Jump to: navigation, search

Real-Time Search using Lookup Tables

Overview

The first way to read data in the tables is to load a row by its PRIMARY KEY. In Cassandra Query Language (CQL), secondary indices exist but are not a best practice because of their cost.

Because we need more ways to find the stored data, we create Lookup tables that allow us to retrieve the objects we store, in the way we want.

For instance, in addition to the table Contact, we create a table named ContactEmailAddress_lookup that enables us to retrieve a Contact by using their email address.

Lookup types

Depending on the type of data, there are three different types of lookups:

  • Unique lookups
  • Multiple lookups
  • Regular lookups

Unique lookups

When we want to ensure that the data is unique, we define a unique lookup:

  • The json file describing the lookup is defined with "lookup_type":is_unique: true.
  • The column that stores the object ID is defined with columntype: regular which is the default column type.
  • The other columns are defined with columntype: partitionKey.

Inserting into unique lookups is done with INSERT IF NOT EXISTS statements, so there is only one record with this key. Any attempt to insert another row with the same key will fail.

Multiple lookups

When we don't need to ensure that the data is unique, we define a multiple lookup:

  • The json file describing the lookup is defined with "lookup_type": multiple.
  • The column that stores the object ID is defined with columntype: clusteringKey so that it is part of the primary key.
  • The other columns are defined with columntype: partitionKey.

Inserting into mutliple lookups is done with a regular INSERT statement, because there is no need to fail if another record exists with the same key. In any case, the partition key contains the object ID, so there is no possibility of duplication.

Regular lookups

When the data is usually unique but can be duplicated (such as a Phone Number, which can be shared by several persons), we define a regular lookup:

  • The json file describing the lookup is defined with "lookup_type": "regular"
  • The column that stores the object ID is defined with "columntype": "regular" which is the default type.
  • One or more columns are defined with "columntype": "partitionKey" and one or more columns are defined with "columntype": "clusteringKey".

Inserting into regular lookups is done with several INSERT IF NOT EXISTS statements, but the insertion of the contact itself does not fail if a record already exists. When further searching with the lookup, this new Contact or an existing one will be returned according to the available parameters.

Lookup behavior

For each operation on the object table, there are corresponding operations to conduct on the lookup tables. For instance, when inserting a contact into the Contact table with an email address, we need to insert the corresponding row in the ContactEmailAddress lookup table, and delete it when deleting the contact.

Inserting lookup records

The names of the columns in the lookup table correspond to the names of the columns in the main table. Some columns in the lookup tables can correspond to inner values; for instance, the EmailAddress attribute in the Contact table is not directly stored in a column, but is in the Attribute field.

A row must be created in the lookup table if the object has a value for each field of the lookup table. If a contact has no email address, then we do not need to create any record in the lookup table.

Creating an object

The first thing to do when inserting an object is to create the corresponding rows in the unique lookups. This is done with INSERT IF NOT EXISTS statements so that inserting the object fails if there is a conflict in one of the unique lookups.

Because these IF NOT EXISTS statements correspond to different tables, these requests cannot be done by a BATCH statement—they have to be done one by one.

When dealing with several unique lookups, we may face the situation where we successfully inserted a record in one lookup table, but failed to insert the record in the second table. This is why the LookupManager creates insert/delete pairs of statements. When the second insert fails, we are able to rollback the insert in the first table.

If all the unique lookups rows were successfully created, then we insert them into the multiple lookups and the main object table, this time in a BATCH.

Deleting an object

All the DELETE statements (from unique and multiple lookups and from the main table) are done in BATCH statements.

Modifying an object

An object is a collection of attributes. Modifying an object can be considered as deleting and adding attributes. The modification of an attribute is processed as deleting the previous value and inserting the new value.

Deleting an attribute

When deleting an attribute, the lookup tables that contain this attribute are affected, and the corresponding rows must be deleted. If the deletion succeeds, then we delete the attribute from the main table.

Adding an attribute

When adding an attribute to an object, we follow the same algorithm as the one for creating an object, because the insertion into the unique lookups can possibly fail.

Feedback

Comment on this article:

blog comments powered by Disqus
This page was last modified on September 14, 2018, at 08:14.