
Migrate data from Cassandra database versions

Cassandra versions 2.x and later are not backward compatible with Cassandra 1.x, so data migration is required when upgrading Feature Server's Cassandra database backend from version 1.x to version 2.x or 3.x. Feature Server release 8.1.202.02 includes the following Python scripts for migrating data from Cassandra 1.x to 2.x or 3.x:

  • copyKeyspaceSchema.py—Creates a keyspace and its column families in the destination Cassandra cluster.
  • copyKeyspaceColumnFamilies.py—Copies content of source keyspace column families to the destination keyspace column families.


Pre-requisites for data migration

The following are the prerequisites for migrating data from version 1.x to version 2.x or 3.x:

  • The destination Cassandra cluster must be deployed, and all of its nodes must be up and running.
  • From the Feature Server deployment point of view, the destination Cassandra cluster must be deployed in external mode.
  • The destination Cassandra cluster must not have any Feature Servers assigned to it until the data has been completely copied from the source Cassandra cluster.
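
As a rough pre-flight check for the first prerequisite, the sketch below tests whether a node's Thrift port accepts TCP connections. The helpers `split_host_port` and `thrift_port_open` are hypothetical and not part of the Feature Server tooling. A successful connection shows only that something is listening on the port, not that Cassandra itself is healthy.

```python
import socket


def split_host_port(host_port):
    """Split a 'host:port' string such as 'CassNode01:9160'."""
    host, _, port = host_port.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % host_port)
    return host, int(port)


def thrift_port_open(host_port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    This is only a reachability check (hypothetical helper); it does not
    speak the Thrift protocol or query the cluster state.
    """
    host, port = split_host_port(host_port)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage, with the node name taken from the sample input file:
#   thrift_port_open("CassNode01:9160")
```

Running the check against every node of the destination cluster before starting the copy may catch a stopped node early. It does not replace the cluster's own status tools.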


Run Cassandra database migration scripts

The following steps show how to migrate data from Cassandra v1.x to v2.x or v3.x:

  1. Deploy the Python scripts.
  2. Run the Python scripts.
  3. Connect Feature Server nodes to the migrated Cassandra cluster.

Deploy the Python scripts

  1. Install the 32-bit version of Python 2.7.5 and the Pycassa library on the destination Cassandra host where the scripts will be run.
  2. The Python scripts copyKeyspaceSchema.py and copyKeyspaceColumnFamilies.py, and the sample JSON input file copyKeyspaceInput.json, are located in the Python utilities folder of the Feature Server deployment: <FS installation path>/Python/util/. Copy these files to a directory on the destination Cassandra host.
  3. Navigate to that directory to run the scripts.

Run the Python scripts

The following is a sample copyKeyspaceInput.json input file:

{"sourceHostPort": "FsNode01:9160",
 "destinationHostPort": "CassNode01:9160",
 "replicationStrategyClassName": "NetworkTopologyStrategy", 
 "replicationOptions": {"DC1": "2", "DC2": "2"},
 "sourceKeyspace": "sipfs",
 "destinationKeyspace": "sipfs",
 "excludedCFs": [],
 "includedCFs": [] }

Copy keyspace schema

The following steps show the procedure to copy the keyspace schema:

  1. Verify that the input JSON file has the following parameters:

    Parameter | Description | Sample | Mandatory
    ----------|-------------|--------|----------
    sourceHostPort | Host and Thrift port of the source Cassandra database, in the format host:port | FsNode01:9160 | Yes
    destinationHostPort | Host and Thrift port of the destination Cassandra database, in the format host:port | CassNode01:9160 | Yes
    sourceKeyspace | Name of the source keyspace | sipfs | Yes
    destinationKeyspace | Name of the destination keyspace | sipfs | Yes
    replicationStrategyClassName | Replication strategy class name | NetworkTopologyStrategy | Yes
    replicationOptions | Replication options for the destination keyspace. Configure this value according to the cassandra-topology.properties file. | {"DC1": "2", "DC2": "2"} | Yes

  2. Run the copyKeyspaceSchema.py script. Sample command line:

    python ./copyKeyspaceSchema.py -i ./copyKeyspaceInput.json -o ./copyKeyspaceSchema_`date +%y%m%d-%H:%M`.log
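
Before running the script, it can help to confirm that the input file contains every mandatory parameter listed above. The following is a minimal sketch of such a check. The helpers `missing_parameters` and `load_input` are hypothetical and not shipped with Feature Server. The check treats an empty value as missing, which is an assumption about what the script expects.

```python
import json

# Mandatory parameters for copyKeyspaceSchema.py, as listed above.
MANDATORY_SCHEMA_KEYS = (
    "sourceHostPort",
    "destinationHostPort",
    "sourceKeyspace",
    "destinationKeyspace",
    "replicationStrategyClassName",
    "replicationOptions",
)


def missing_parameters(config, required=MANDATORY_SCHEMA_KEYS):
    """Return the mandatory keys that are absent or empty in config."""
    return [key for key in required if not config.get(key)]


def load_input(path):
    """Load a copyKeyspaceInput.json-style file and check mandatory keys."""
    with open(path) as handle:
        config = json.load(handle)
    missing = missing_parameters(config)
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    return config
```

Calling `load_input("./copyKeyspaceInput.json")` either returns the parsed parameters or raises an error naming what is missing.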

Copy keyspace column families

  1. Verify that the input JSON file has the following parameters:

    Parameter | Description | Sample | Mandatory
    ----------|-------------|--------|----------
    sourceHostPort | Host and Thrift port of the source Cassandra database, in the format host:port | FsNode01:9160 | Yes
    destinationHostPort | Host and Thrift port of the destination Cassandra database, in the format host:port | CassNode01:9160 | Yes
    sourceKeyspace | Name of the source keyspace | sipfs | Yes
    destinationKeyspace | Name of the destination keyspace | sipfs | Yes
    excludedCFs | List of column family names to exclude from copying when running the copyKeyspaceColumnFamilies.py script | ["message_bytes", "device"] | No
    includedCFs | List of column family names to copy when running the copyKeyspaceColumnFamilies.py script | ["message_bytes", "device"] | No

    If one or more source column families contain very large volumes of data, run the copyKeyspaceColumnFamilies.py script for those column families separately from the rest of the source column families. Use the excludedCFs and includedCFs parameters to exclude or include specific column families. When the includedCFs list is not empty, the excludedCFs parameter is ignored and only the column families in the includedCFs list are copied.

    For example, to copy the content of all column families except message_bytes, provide the following JSON file as input to the copyKeyspaceColumnFamilies.py script:

    {"sourceHostPort": "FsNode01:9160",
     "destinationHostPort": "CassNode01:9160",
     "replicationStrategyClassName": "NetworkTopologyStrategy",
     "replicationOptions": {"DC1": "2", "DC2": "2"},
     "sourceKeyspace": "sipfs",
     "destinationKeyspace": "sipfs",
     "excludedCFs": ["message_bytes"],
     "includedCFs": []}

    To copy the content of only the message_bytes column family, provide the following JSON file as input to the copyKeyspaceColumnFamilies.py script:

    {"sourceHostPort": "FsNode01:9160",
     "destinationHostPort": "CassNode01:9160",
     "replicationStrategyClassName": "NetworkTopologyStrategy",
     "replicationOptions": {"DC1": "2", "DC2": "2"},
     "sourceKeyspace": "sipfs",
     "destinationKeyspace": "sipfs",
     "excludedCFs": [],
     "includedCFs": ["message_bytes"]}

  2. Run the copyKeyspaceColumnFamilies.py script. Sample command line:

    python ./copyKeyspaceColumnFamilies.py -i ./copyKeyspaceInput.json -o ./copyKeyspaceContent_`date +%y%m%d-%H:%M`.log
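
The precedence between includedCFs and excludedCFs described above can be sketched as a small function. This is an illustration of the documented rule only, not the script's actual code. The function name and the `mailbox` column family are hypothetical, and the real script may treat names that do not exist in the source keyspace differently.

```python
def select_column_families(all_cfs, included_cfs=(), excluded_cfs=()):
    """Pick the column families to copy, following the documented rule.

    A non-empty included_cfs list wins and excluded_cfs is ignored;
    otherwise everything except excluded_cfs is copied.
    """
    if included_cfs:
        wanted = set(included_cfs)
        return [cf for cf in all_cfs if cf in wanted]
    skipped = set(excluded_cfs)
    return [cf for cf in all_cfs if cf not in skipped]


# "mailbox" is a made-up column family name used only for illustration.
source_cfs = ["device", "mailbox", "message_bytes"]

# Copy everything except message_bytes.
print(select_column_families(source_cfs, excluded_cfs=["message_bytes"]))
# prints ['device', 'mailbox']

# includedCFs is not empty, so excludedCFs is ignored.
print(select_column_families(source_cfs,
                             included_cfs=["message_bytes"],
                             excluded_cfs=["message_bytes"]))
# prints ['message_bytes']
```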

    Important
    If regional keyspaces must be copied, run both scripts once for each keyspace, one keyspace after the other: first for the global keyspace and then for each regional keyspace.
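
When several keyspaces have to be copied, one input file per keyspace keeps the runs separate. The sketch below writes those files and builds the matching command lines, using the `-i` and `-o` options from the sample commands. The helper `build_runs`, the file naming, and the keyspace name `sipfs_region1` are hypothetical. It also assumes that each keyspace keeps the same name on the source and destination clusters.

```python
import copy
import json
import os
import time

SCRIPTS = (
    ("copyKeyspaceSchema.py", "copyKeyspaceSchema"),
    ("copyKeyspaceColumnFamilies.py", "copyKeyspaceContent"),
)


def build_runs(base_config, keyspaces, work_dir="."):
    """Write one input file per keyspace and return the commands to run.

    keyspaces maps each keyspace name to its replicationOptions, because
    regional keyspaces may use different options than the global one.
    """
    # Same timestamp format as the sample command lines.
    stamp = time.strftime("%y%m%d-%H:%M")
    commands = []
    for keyspace, replication_options in keyspaces.items():
        config = copy.deepcopy(base_config)
        config["sourceKeyspace"] = keyspace
        config["destinationKeyspace"] = keyspace
        config["replicationOptions"] = replication_options
        input_path = os.path.join(work_dir, "copyKeyspaceInput_%s.json" % keyspace)
        with open(input_path, "w") as handle:
            json.dump(config, handle, indent=1)
        # Schema first, then column family content, for each keyspace.
        for script, log_prefix in SCRIPTS:
            log_path = os.path.join(
                work_dir, "%s_%s_%s.log" % (log_prefix, keyspace, stamp))
            commands.append(
                ["python", "./" + script, "-i", input_path, "-o", log_path])
    return commands
```

Each returned command is an argument list that can be reviewed and then passed to `subprocess.check_call`, in order, from the directory that holds the scripts.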

Connect Feature Server nodes to the migrated Cassandra cluster

Perform the following steps on every Feature Server node involved:

  1. Stop the Feature Server node.
  2. Edit the <FS installation path>\launcher.xml file and set the startCassandra parameter to false:

    <parameter name="startCassandra" displayName="com.genesyslab.common.application.cassandraServer" hidden="true" mandatory="false">
    <description><![CDATA[ Start Cassandra Server]]></description>
    <valid-description><![CDATA[]]></valid-description>
    <effective-description/>
    <format type="string" default="false"/>
    <validation>
    </validation>
    </parameter>

  3. Update the [Cassandra] section of the Feature Server application as shown in the following table:

    Option | Default Value | Feature Server Application Value | Mandatory
    -------|---------------|----------------------------------|----------
    nodes | NA | IP addresses of all the Cassandra nodes that belong to the data center where Feature Server is installed. | Yes
    nodeFailureTolerance | Replication factor of the Feature Server data center minus 1. If a regional keyspace is used, the lower of the global keyspace and regional keyspace replication_factor values for that data center, minus 1. For example, if DC1 contains 4 nodes and the replication_factor is 3 for the global keyspace and 2 for the regional keyspace, then the value is 1. | | No
    keyspace | sipfs | Name of the global keyspace. This option must have the same value as the keyspace name parameter for the copyKeyspaceSchema.py script when copying the global keyspace. | No
    replicationStrategyClassName | NA | This option must have the same value as the replication strategy class name parameter for the copyKeyspaceSchema.py script when copying both the global keyspace and the regional keyspace. | Yes
    replicationOptions | NA | This option must have the same value as the replication options parameter for the copyKeyspaceSchema.py script. | Yes
    regionalKeyspace | sipfs_<region> | Name of the regional keyspace. This option must have the same value as the keyspace name parameter for the copyKeyspaceSchema.py script when copying the regional keyspace. | Mandatory if regional keyspaces are enabled
    regionalReplicationOptions | NA | This option must have the same value as the replication options parameter for the copyKeyspaceSchema.py script. | Mandatory if regional keyspaces are enabled
    username | cassandra | Cassandra user name | Mandatory if authentication is enabled in the Cassandra cluster
    password | cassandra | Cassandra password | Mandatory if authentication is enabled in the Cassandra cluster

  4. Start the Feature Server node.
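
The nodeFailureTolerance example above (replication factors 3 and 2 giving a value of 1) can be read as "lowest replication factor for the data center, minus one". The sketch below works through that arithmetic under this reading, which is an interpretation of the example rather than a confirmed formula. The function name is hypothetical, and the replication options use the same dictionary shape as the input JSON file.

```python
def node_failure_tolerance(data_center, replication_options,
                           regional_replication_options=None):
    """Lowest replication factor for data_center, minus one.

    Interpretation of the documented example, not a confirmed formula.
    Replication factors are given as strings, as in the input JSON file.
    """
    factors = [int(replication_options[data_center])]
    if regional_replication_options is not None:
        factors.append(int(regional_replication_options[data_center]))
    return min(factors) - 1


# The documented example: global keyspace RF 3, regional keyspace RF 2.
print(node_failure_tolerance("DC1", {"DC1": "3"}, {"DC1": "2"}))
# prints 1
```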

This page was last modified on October 25, 2017, at 20:07.