Cassandra Node Maintenance
You can upgrade your Cassandra version without interrupting service if:
- The version you are upgrading to is in the same stream (for example, from version 3.11.2 to 3.11.3).
- You are not changing your database schema.
Use the following steps for this task:
1. Stop the first Cassandra seed node.
2. Back up your database storage files.
3. Upgrade your Cassandra version, following the instructions in the Release Notes for the new version.
4. Start the first Cassandra seed node.
5. Repeat steps 1 through 4 for each of the other seed nodes.
6. Repeat steps 1 through 4 for each non-seed node.
7. Verify that the Cassandra cluster is working.
8. Optionally, delete the backup files.
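The ordering of the steps above can be sketched as a small shell helper that only prints a plan, with seed nodes first and non-seed nodes after them. This is a dry-run illustration, not a supported procedure: the ssh access, the systemd service name cassandra, the data directory, the backup path, and the function names are all assumptions made for the example.

```shell
#!/bin/sh
# Dry-run sketch: prints the per-node commands for a rolling upgrade, executes nothing.
# Assumptions (hypothetical, adjust to your site): ssh access to each node, a systemd
# service named "cassandra", and the two paths below.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"
BACKUP_FILE="${BACKUP_FILE:-/var/backups/cassandra-pre-upgrade.tar.gz}"

# Print the stop, back up, upgrade, start sequence for a single node ($1 = host).
plan_node() {
    echo "ssh $1 sudo systemctl stop cassandra"
    echo "ssh $1 sudo tar -czf $BACKUP_FILE $DATA_DIR"
    echo "# upgrade Cassandra on $1 as described in the Release Notes"
    echo "ssh $1 sudo systemctl start cassandra"
}

# Print the whole plan: seed nodes first, then non-seed nodes, then verification.
# $1 = space-separated seed hosts, $2 = space-separated non-seed hosts
plan_rolling_upgrade() {
    for plan_host in $1 $2; do
        plan_node "$plan_host"
    done
    echo "# verify the cluster, for example: nodetool -h <HOST_NAME> -p <JMX_PORT> status"
}
```

For example, plan_rolling_upgrade "seed1 seed2" "node3 node4" prints the commands for seed1, seed2, node3, and node4 in that order, so the plan can be reviewed before anything is run by hand.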
Genesys recommends that you use the nodetool utility that is bundled with your Cassandra installation package and that you make a habit of using the following nodetool commands to monitor the state of your Cassandra cluster.
The ring command displays node status and information about the cluster, as determined by the node being queried. This can give you an idea of the load balance and whether any nodes are down. If your cluster is not properly configured, different nodes may report different views of the cluster, so running this command against each node is a good way to check that every node views the cluster the same way.
nodetool -h <HOST_NAME> -p <JMX_PORT> ring
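One way to check that every node views the cluster the same way is to reduce each node's ring output to a sorted list of member addresses and compare the lists. A minimal sketch follows. The function names are hypothetical, and the parser assumes the usual ring layout, in which each data row has the address in the first column and Up or Down in the third.

```shell
#!/bin/sh
# Reduce "nodetool ring" output (read from stdin) to a sorted, de-duplicated list of
# member addresses. Assumes data rows look like: <address> <rack> Up|Down <state> ...
ring_members() {
    awk '$3 == "Up" || $3 == "Down" { print $1 }' | sort -u
}

# Compare two saved ring outputs; succeeds only if both list the same members.
# $1 and $2 are files holding the ring output captured from two different nodes.
same_ring_view() {
    [ "$(ring_members < "$1")" = "$(ring_members < "$2")" ]
}
```

To use it, save the ring output of two nodes to files and run same_ring_view a.txt b.txt. A non-zero exit status means the two nodes disagree about cluster membership.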
The status command displays cluster information, including the state and load of each node.
nodetool -h <HOST_NAME> -p <JMX_PORT> status
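For routine monitoring, the status output can be filtered down to the nodes that are reported as down. The helper below is a sketch with a hypothetical function name. It relies on node rows starting with a two-letter code, where the first letter is U or D (up or down) and the second is N, L, J, or M (normal, leaving, joining, or moving).

```shell
#!/bin/sh
# List the addresses of nodes that "nodetool status" reports as down.
# Reads the status output on stdin and prints one address per line.
down_nodes() {
    awk '$1 ~ /^D[NLJM]$/ { print $2 }'
}
```

A typical call pipes the command into the helper, for example nodetool -h <HOST_NAME> -p <JMX_PORT> status | down_nodes. Empty output means that the queried node sees no other node as down.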
The compactionstats command displays compaction statistics.
nodetool -h <HOST_NAME> -p <JMX_PORT> compactionstats
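A number that is often worth tracking from this output is the count of pending compaction tasks. The sketch below extracts it. The function name is hypothetical, and the parser assumes that the summary line has the form "pending tasks: N".

```shell
#!/bin/sh
# Print the number of pending compaction tasks from "nodetool compactionstats"
# output read on stdin. Prints nothing if no "pending tasks:" line is found.
pending_compactions() {
    awk -F': *' '/^pending tasks:/ { print $2; exit }'
}
```

For example, nodetool -h <HOST_NAME> -p <JMX_PORT> compactionstats | pending_compactions prints a single number that can be logged or compared against a threshold of your choosing.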
getcompactionthroughput / setcompactionthroughput
Displays the compaction throughput on the selected Cassandra instance. By default it is 32 MB/s.
You can increase this parameter if the database keeps growing after the TTL and grace periods have passed. Note that increasing compaction throughput also increases memory and CPU consumption, so make sure that your hardware can sustain the rate that you select.
nodetool -h <HOST_NAME> -p <JMX_PORT> getcompactionthroughput
To increase compaction throughput to 64 MB/s, for example, use the following command:
nodetool -h <HOST_NAME> -p <JMX_PORT> setcompactionthroughput 64
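The two commands can be combined so that the throughput is raised only when the current value is lower than the target. The following is a sketch with hypothetical function names. It assumes that getcompactionthroughput prints a line such as "Current compaction throughput: 32 MB/s", and it leaves a value of 0 alone, because 0 means that throttling is disabled.

```shell
#!/bin/sh
NODETOOL="${NODETOOL:-nodetool}"   # overridable so the helper can be tried without a cluster

# Extract the numeric rate from getcompactionthroughput output read on stdin.
current_throughput() {
    sed -n 's/.*throughput: *\([0-9][0-9]*\).*/\1/p'
}

# Raise the throughput to $3 MB/s on host $1, JMX port $2, only if it is currently lower.
raise_throughput() {
    rt_now=$("$NODETOOL" -h "$1" -p "$2" getcompactionthroughput | current_throughput)
    case "$rt_now" in
        ''|0) return 0 ;;   # unparsable output, or 0, which means throttling is disabled
    esac
    if [ "$rt_now" -lt "$3" ]; then
        "$NODETOOL" -h "$1" -p "$2" setcompactionthroughput "$3"
    fi
}
```

For example, raise_throughput <HOST_NAME> <JMX_PORT> 64 changes nothing on a node that already runs at 64 MB/s or higher.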
With its distributed architecture, Cassandra is fault-tolerant, and the integrity of data is guaranteed, even in the event of an entire site failure.
Cassandra also provides the ability to take snapshots. Snapshots are not designed for data integrity, but they can be used as a backup, for example, to retrieve data that was accidentally deleted.
Instead of dedicating a machine to snapshot storage, consider using that machine to run one more redundant Cassandra node.
See Backing up and restoring data in the Cassandra documentation.
Cassandra also supports incremental backups, which are quicker than a full backup. Best-practice documentation recommended by DataStax Enterprise is available for download.
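Snapshots are taken per node with the nodetool snapshot command and removed with nodetool clearsnapshot. The wrappers below are a minimal sketch; the function names, the keyspace, and the tag are placeholders chosen for illustration, and the same call has to be repeated against every node that should be covered.

```shell
#!/bin/sh
NODETOOL="${NODETOOL:-nodetool}"   # overridable so the helpers can be tried without a cluster

# Take a tagged snapshot of one keyspace on one node.
# $1 = host, $2 = JMX port, $3 = keyspace, $4 = snapshot tag
take_snapshot() {
    "$NODETOOL" -h "$1" -p "$2" snapshot -t "$4" "$3"
}

# Remove the snapshot with the given tag once it is no longer needed.
# $1 = host, $2 = JMX port, $3 = snapshot tag
clear_snapshot() {
    "$NODETOOL" -h "$1" -p "$2" clearsnapshot -t "$3"
}
```

For example, take_snapshot <HOST_NAME> <JMX_PORT> <KEYSPACE> before-cleanup creates a snapshot tagged before-cleanup, and clear_snapshot <HOST_NAME> <JMX_PORT> before-cleanup frees the disk space afterwards.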
Depending on the replication factor and consistency levels of a Cassandra cluster configuration, the UCS Cluster can handle the failure of one or more Cassandra nodes in the data center without any special recovery procedures and without interrupting service or losing functionality. When the failed node is back up, the UCS Cluster automatically reconnects to it.
- Therefore, if the number of failed nodes is within what your replication factor and consistency levels tolerate, simply restart the failed nodes.
However, if too many of the Cassandra nodes in your cluster have failed or stopped, you will lose functionality. To ensure a successful recovery from failure of multiple nodes, Genesys recommends that you:
- Stop every node, one at a time, with at least two minutes between operations.
- Then restart the nodes one at a time, with at least two minutes between operations.
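The stop-then-restart sequence above, with its two-minute pauses, can be sketched as follows. The script is a dry run by default and only prints the commands. The ssh access, the systemd service name cassandra, and all function and variable names are assumptions made for the example.

```shell
#!/bin/sh
# Assumptions (hypothetical): ssh access to each node and a systemd service named "cassandra".
PAUSE_SECONDS="${PAUSE_SECONDS:-120}"   # at least two minutes between operations
RUN="${RUN:-echo}"                      # "echo" prints each command; RUN=env executes it

# Apply one action (stop or start) to each host in turn, pausing between hosts.
# $1 = action, remaining arguments = hosts
staggered() {
    st_action=$1
    shift
    st_first=1
    for st_host in "$@"; do
        if [ "$st_first" -eq 0 ]; then sleep "$PAUSE_SECONDS"; fi
        st_first=0
        $RUN ssh "$st_host" sudo systemctl "$st_action" cassandra
    done
}

# Stop every node one at a time, then restart them one at a time.
recover_cluster() {
    staggered stop "$@"
    sleep "$PAUSE_SECONDS"
    staggered start "$@"
}
```

For example, recover_cluster node1 node2 node3 prints three stop commands followed by three start commands, waiting PAUSE_SECONDS between consecutive operations.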