Migrate from Apache Cassandra to Amazon Keyspaces

Introduction

In this lesson, you migrate a self-managed Apache Cassandra cluster to a fully managed cluster by using Amazon Keyspaces (for Apache Cassandra). First you learn why you would want to use Amazon Keyspaces to manage your Cassandra cluster. Then you work through the steps to migrate an existing Cassandra cluster to Amazon Keyspaces. At the end of this lesson, you should feel confident in your ability to migrate an existing Cassandra database to Amazon Keyspaces.

Time to complete: 30–45 minutes

Why use Amazon Keyspaces?

Amazon Keyspaces is a fully managed service for running Cassandra-compatible databases on Amazon Web Services (AWS). Cassandra is a popular option for high-scale applications that need top-tier performance.

With Amazon Keyspaces, your database operations are managed by AWS, leaving your team free to focus on innovation. Amazon Keyspaces handles cluster scaling, instance failover, data backups, and software updates. Rely on the efficiencies of the AWS Cloud to use a faster, cheaper, and more reliable database option.

Lesson contents

In this lesson, you learn how to migrate a self-managed Cassandra cluster to a fully managed cluster on Amazon Keyspaces. This lesson has four modules.

  • 1. (Optional) Create a source Cassandra cluster in Amazon Elastic Compute Cloud (Amazon EC2)

    In this module, you create a self-managed Cassandra database in Amazon EC2. This will serve as a source database for performing a migration to Amazon Keyspaces. You can use it to walk through the steps required to perform a migration to Amazon Keyspaces.

    If you already have a source Cassandra database that you want to migrate, you can skip this module and move on to the next module.


    If you do need to create a source Cassandra database for the migration walkthrough, go to the Amazon EC2 console. Choose Launch instance to start the Amazon EC2 instance creation wizard.

    The first step in the Amazon EC2 instance creation wizard is to choose your Amazon Machine Image (AMI). Use the Amazon Linux 2 AMI with the default x86 architecture and choose Select.

    On the next page, choose the instance type for your Amazon EC2 instance. For this walkthrough, use at least a t2.large instance because Cassandra requires a significant amount of memory. Choose Review and Launch to continue.

    The next page shows the default options for the rest of your Amazon EC2 settings. The default options should be fine for this use case. Choose Launch to continue.

    Now choose a key pair to allow SSH access to your new Amazon EC2 instance. From the dropdown, choose Create a new key pair. Then name your key pair cassandra-migration and choose Download Key Pair to download the key pair to your machine. Finally, choose Launch Instances to create your instance.

    A confirmation page shows that your instance is launching. Choose View Instances to see your Amazon EC2 instance.

    While your instance is initializing, it shows an Instance State of pending. Wait until the Instance State shows running.

    When the Instance State shows running, you can SSH into your instance.

    Copy the IPv4 Public IP value for your instance, and then run the following commands in your terminal to SSH into your instance.

    chmod 600 /path/to/cassandra-migration.pem
    ssh -i /path/to/cassandra-migration.pem ec2-user@<IPv4PublicIP>

    Be sure to use the proper values for the path to the cassandra-migration.pem file that you downloaded and use the correct IPv4 Public IP for your instance.

    If you have difficulty connecting to your Amazon EC2 instance, see Connecting to your Linux instance using SSH.

    If you connected successfully, your terminal shows a shell prompt on your Amazon EC2 instance.

    To install Cassandra, you first need to install Java. To install Java, execute the following commands in your terminal.

    sudo amazon-linux-extras enable corretto8
    sudo yum -y install java-1.8.0-amazon-corretto-devel

    After you have installed Java, execute the following commands to install and start Cassandra.

    echo '[cassandra]
    name=Apache Cassandra
    baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://www.apache.org/dist/cassandra/KEYS' | sudo tee -a /etc/yum.repos.d/cassandra.repo > /dev/null
    sudo yum -y install cassandra
    sudo systemctl daemon-reload
    sudo service cassandra start

    After you have installed and started Cassandra, use the nodetool CLI to check the health of your Cassandra cluster. In your terminal, execute the following command.

    nodetool status

    You should see a single node in your Cassandra cluster, reported with a status of UN (Up and Normal).
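    To check this from a script rather than by eye, one option is to count the nodes that nodetool reports in the UN (Up and Normal) state. The following is a minimal sketch; the function name count_up_normal_nodes is hypothetical and is not part of nodetool.

```shell
# Count the nodes that `nodetool status` reports as Up and Normal ("UN").
# Reads the nodetool output on standard input and prints the count.
# count_up_normal_nodes is a hypothetical helper name used for illustration.
count_up_normal_nodes() {
  # Node rows start with a two-letter state code followed by whitespace.
  # grep -c exits non-zero when it counts 0 matches, so mask that status.
  grep -c '^UN[[:space:]]' || true
}
```

    For example, piping nodetool status into count_up_normal_nodes should print 1 for the single-node cluster in this walkthrough.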

    tlp-stress is an open-source tool for load testing and benchmarking Cassandra clusters. It includes a simple way to load your cluster with data that you can use to confirm that your migration works.

    To install tlp-stress, run the following commands in your terminal.

    sudo yum install -y git
    git clone https://github.com/thelastpickle/tlp-stress.git
    cd tlp-stress
    ./gradlew shadowJar

    Next, use tlp-stress to load your cluster with 10,000 time-series records for the migration. To do so, execute the following command in your terminal.

    bin/tlp-stress run BasicTimeSeries -i 10k

    You should see some output in your terminal as the command is executed. When the command has run, it will show the results of your operations.

    In this module, you created a self-managed source Cassandra cluster that you can use to test a migration to Amazon Keyspaces. In the next module, you will create a fully managed Amazon Keyspaces cluster.

  • 2. Create an Amazon Keyspaces cluster

    In this module, you create an Amazon Keyspaces cluster. This cluster will be used as your primary database after you copy your existing data into it.


    To get started, navigate to the Amazon Keyspaces console. On the Keyspaces page, choose Create keyspace to create a new keyspace.

    In the keyspace creation wizard, give your keyspace a name.

    You can attach tags to your keyspace to help with access control or to track billing. Then choose Create keyspace to create your keyspace.

    At this point, you should see your keyspace in the Amazon Keyspaces console. Choose your keyspace to view more details about it.

    Currently, your keyspace does not have any tables. Let's create the first table in your keyspace that will hold your migrated data.

    Choose Create table to open the table creation wizard.

    Choose a name for your table. The tlp-stress tool gave your table a name of sensor_data, so use the same name here.

    Now you need to declare the schema for your table. The tlp-stress tool added three columns to the table: sensor_id, timestamp, and data. The sensor_id and data columns are of type text, and the timestamp column is of type timeuuid. Additionally, the primary key is a combination of sensor_id, which is the partition key, and timestamp, which is a clustering column.

    Enter this schema in the Amazon Keyspaces table creation wizard.

    Finally, you can choose the Capacity mode and add any required tags. With Amazon Keyspaces provisioned capacity billing mode, you declare the amount of reads and writes you want to provision. With on-demand billing mode, you don't need to plan for the capacity required by your table. Amazon Keyspaces bills you directly for the reads and writes you consume. As part of the AWS Free Tier, you can use 30 million on-demand read units and 30 million on-demand write units per month for the first three months after you create an Amazon Keyspaces resource.
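    To get a rough sense of how a migration fits within those limits, you can estimate the write units that a load consumes. The sketch below assumes that each row write of up to 1 KB consumes one write unit, which is how the Amazon Keyspaces pricing documentation describes on-demand writes at LOCAL_QUORUM consistency; check current pricing before relying on it. The function name estimate_write_units is hypothetical.

```shell
# Estimate the write units consumed by loading rows of a given size.
# Assumption (not stated in this lesson): one write of up to 1 KB costs one unit,
# and larger rows cost one unit per started KB.
# Usage: estimate_write_units <rows> <row_bytes>
estimate_write_units() {
  rows=$1
  row_bytes=$2
  # Round the row size up to whole kilobytes.
  units_per_row=$(( (row_bytes + 1023) / 1024 ))
  echo $(( rows * units_per_row ))
}
```

    Under that assumption, loading 10,000 rows of a few hundred bytes each would consume about 10,000 write units, a small fraction of the 30 million units in the Free Tier allowance.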

    Notice that when creating your Amazon Keyspaces table, you did not need to configure a compaction strategy, bloom filters, caches, or other common tuning parameters as you might in Cassandra. This is handled for you by Amazon Keyspaces so that your developers can focus on the quality of the data model and your required access patterns.

    The table creation wizard shows you the Cassandra command that will be executed to create your table. When you are ready, choose Create table to create your table.
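    For reference, the same table can be described in plain CQL. The following sketch writes such a statement to a file. It assumes that the keyspace is named fully_managed_keyspace (the name used later in this lesson), and that sensor_id is the partition key and timestamp is a clustering column, as in the tlp-stress BasicTimeSeries workload. The statement that the console displays may include additional Amazon Keyspaces options, and the file name create_sensor_data.cql is hypothetical.

```shell
# Write a CQL sketch of the sensor_data table to a file.
# Assumptions: keyspace name fully_managed_keyspace; sensor_id is the
# partition key and timestamp is a clustering column.
cat > create_sensor_data.cql <<'EOF'
CREATE TABLE fully_managed_keyspace.sensor_data (
    sensor_id text,
    timestamp timeuuid,
    data text,
    PRIMARY KEY ((sensor_id), timestamp)
);
EOF
```

    If you prefer the command line to the console, a file like this could be passed to cqlsh with its -f option after cqlsh is configured to connect to Amazon Keyspaces.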

    The Amazon Keyspaces console shows your table being created. While it is being created, your table has a Status of Creating.

    When your table is ready to use, its Status is Active.

    In this module, you created a keyspace and table in Amazon Keyspaces. This table is fully managed and compatible with Cassandra.

    In the next module, you will perform a migration of your existing Cassandra table to your fully managed table in Amazon Keyspaces.

  • 3. Perform a migration from an existing Cassandra table to an Amazon Keyspaces table

    In this module, you perform a migration of an existing Cassandra table to the Amazon Keyspaces table that you created in the previous module.


    Use cqlsh, the command-line tool for working with Cassandra, to assist with the migration. First, export the data from your existing table in Cassandra. Then, load the data into your new table in Amazon Keyspaces.

    Before you begin, generate service-specific credentials to connect to Amazon Keyspaces by using cqlsh. These service-specific credentials are one of the two ways you can authenticate to your Amazon Keyspaces table. Service-specific credentials are credentials tied to a specific AWS Identity and Access Management (IAM) user that are used to authenticate for a service.

    To generate service-specific credentials, navigate to the IAM console. Find the IAM user to whom you want to grant service-specific credentials and choose that user.

    On the IAM user's page, choose the Security credentials tab.

    Then navigate to the bottom of the page. In the section for Amazon Keyspaces, choose Generate credentials to create Amazon Keyspaces credentials for your IAM user.

    A window is displayed with your service-specific credentials. Download these credentials and make sure you have them available because you will need them later in this module.

    After you have downloaded your service-specific credentials, you are ready to start the migration. You will perform the migration from the Amazon EC2 instance that is hosting your self-managed Cassandra database.

    Navigate to the Amazon EC2 console, and find the Amazon EC2 instance you created in the first module in this lesson.

    Copy the IPv4 Public IP address of your instance.

    When you have the IPv4 Public IP address, run the same command you ran in the first module to connect to your instance.

    ssh -i /path/to/cassandra-migration.pem ec2-user@<IPv4PublicIP>

    After you have connected to your instance, enter cqlsh in your terminal to enter the CQL shell.

    You can view some of your sample data by using the following command in cqlsh.

    SELECT * FROM tlp_stress.sensor_data LIMIT 5;

    The query returns up to five rows, each with sensor_id, timestamp, and data values.

    In the cqlsh tool, enter the following command to export your table to a .csv file on your Amazon EC2 instance.

    COPY tlp_stress.sensor_data TO 'sensor_data_export.csv' WITH HEADER=true;

    It should take a few seconds to complete the command. Exit the CQL shell by typing exit.

    There is a file in your current directory called sensor_data_export.csv, which contains the contents of your table. You can view some of the contents by running the following command in your terminal.

    head -n5 sensor_data_export.csv

    This prints out the header and the first four rows of data.
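    Before loading the data, it can help to record how many rows the export contains so that you have a number to compare against later. The following is a minimal sketch; the function name csv_data_rows is hypothetical, and it assumes that the file has exactly one header line and that no field contains an embedded newline.

```shell
# Print the number of data rows in a CSV export that has one header line.
# Assumption: no field contains an embedded newline, so one line is one row.
# Usage: csv_data_rows <file>
csv_data_rows() {
  total=$(wc -l < "$1")
  # Subtract the header line written by COPY ... WITH HEADER=true.
  echo $(( total - 1 ))
}
```

    For example, csv_data_rows sensor_data_export.csv prints the number of rows that were exported from the source table.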

    Now, use cqlsh to connect to your Amazon Keyspaces table by using the service-specific credentials you created.

    First, download the Amazon digital certificate with the following command.

    curl https://www.amazontrust.com/repository/AmazonRootCA1.pem -o /home/ec2-user/.cassandra/AmazonRootCA1.pem

    Then run the following command to configure cqlsh to connect to Amazon Keyspaces.

    echo '[connection]
    port = 9142
    factory = cqlshlib.ssl.ssl_transport_factory

    [ssl]
    validate = true
    certfile = /home/ec2-user/.cassandra/AmazonRootCA1.pem' >> /home/ec2-user/.cassandra/cqlshrc

    With cqlsh configured, run the following command to connect to your keyspace by using cqlsh.

    cqlsh cassandra.us-east-1.amazonaws.com 9142 -u <user> -p <password> --ssl

    Replace <user> and <password> with the user name and password from your service-specific credentials.
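    The host name in the command above targets the us-east-1 Region. If your keyspace is in a different Region, the host name changes accordingly. The sketch below builds the host name from a Region code; it assumes the cassandra.<region>.amazonaws.com pattern seen in the command above, which applies to standard commercial Regions, and the function name keyspaces_endpoint is hypothetical.

```shell
# Build the Amazon Keyspaces service endpoint host name for a Region.
# Assumption: the endpoint follows the cassandra.<region>.amazonaws.com pattern.
# Usage: keyspaces_endpoint <region-code>
keyspaces_endpoint() {
  echo "cassandra.$1.amazonaws.com"
}
```

    You could then pass the result to cqlsh in place of the literal host name, keeping port 9142 and the --ssl flag unchanged.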

    If the command was successful, cqlsh is now connected to Amazon Keyspaces.

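    In cqlsh, use the USE command to switch to the keyspace you created, and then set the write consistency level to LOCAL_QUORUM. If you gave your keyspace a different name, use that name in place of fully_managed_keyspace.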

    USE fully_managed_keyspace;
    CONSISTENCY LOCAL_QUORUM;

    Then load your data into your table.

    COPY "sensor_data" FROM './sensor_data_export.csv' WITH HEADER=true AND INGESTRATE=1000;
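    The INGESTRATE option sets the approximate maximum number of rows that cqlsh sends per second, so it also gives a lower bound on how long a load takes. The following sketch works through that arithmetic; the function name estimate_load_seconds is hypothetical, and the result ignores network latency, retries, and throttling.

```shell
# Estimate the minimum number of seconds needed to load rows at a given ingest rate.
# Usage: estimate_load_seconds <rows> <rows_per_second>
estimate_load_seconds() {
  rows=$1
  rate=$2
  # Integer ceiling division: a partial final second still counts as a second.
  echo $(( (rows + rate - 1) / rate ))
}
```

    For example, estimate_load_seconds 10000 1000 prints 10, so a file of 10,000 rows needs at least about ten seconds at INGESTRATE=1000.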

    After a few seconds, all the records from your .csv file should be loaded into your Amazon Keyspaces table. You can view a few records with the following command.

    SELECT * FROM sensor_data LIMIT 5;

    Just like in your source Cassandra database, this command prints out some of the records.
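    Viewing a few records confirms that data arrived, but not that all of it did. One possible check, sketched below, compares two .csv exports while ignoring row order. It assumes that you have a second export taken from the Amazon Keyspaces table with the same cqlsh version (whether COPY ... TO works against your table is something to verify), and the function name same_rows is hypothetical.

```shell
# Succeed (exit status 0) when two CSV exports contain the same lines,
# regardless of row order. Assumption: both files were written by the same
# cqlsh version, so identical values are formatted identically.
# Usage: same_rows <export_a.csv> <export_b.csv>
same_rows() {
  a_sorted=$(mktemp) && b_sorted=$(mktemp) || return 2
  sort "$1" > "$a_sorted"
  sort "$2" > "$b_sorted"
  # cmp -s is silent and reports only through its exit status.
  cmp -s "$a_sorted" "$b_sorted"
  result=$?
  rm -f "$a_sorted" "$b_sorted"
  return "$result"
}
```

    A non-zero exit status means the exports differ, in which case comparing row counts is a reasonable first step in finding out why.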

    For additional details and configuration options for using cqlsh to load data into your Amazon Keyspaces table, see Loading data into Amazon Keyspaces with cqlsh.


    In this module, you exported data from a self-managed Cassandra cluster running in Amazon EC2 and imported the data into a fully managed Amazon Keyspaces table. To do this, you created service-specific credentials to be used by cqlsh, and then you executed cqlsh commands against your source database and your target Amazon Keyspaces table.

    In the next module, you will clean up your resources and learn about next steps.

  • 4. Complete the migration and clean up resources

    If you have followed all the steps in this lesson, you have created a new, fully managed Amazon Keyspaces table and migrated your existing data from your self-managed Cassandra cluster to your Amazon Keyspaces table. In this final module, you complete the migration and clean up your resources.


    When your initial migration is complete and all data is synced to your new Amazon Keyspaces table, you are ready to use your new Amazon Keyspaces table in your application. In your application, change the configuration to use your Amazon Keyspaces table rather than your existing Cassandra database. For more information about connecting to Amazon Keyspaces using a Cassandra client, see Using a Cassandra Client Driver to Access Amazon Keyspaces Programmatically.

    After you have switched the configuration to your new Amazon Keyspaces table and are confident in the migration, you can delete your existing self-managed Cassandra database on Amazon EC2.

    To do that, navigate to the Amazon EC2 console. Find the instance that is used to host your Cassandra database cluster. Choose the instance, and then choose Instance State > Terminate in the Actions dropdown.

    If you no longer need the keyspace and table that you created in this lesson, you should delete those as well. To do so, navigate to the Amazon Keyspaces console. Choose the keyspace you created, and then choose Delete.

    A confirmation window is displayed before you delete the keyspace. Type "Delete" in the box, and then choose Delete keyspace to delete your keyspace.

    The Amazon Keyspaces page shows that your keyspace is being deleted.

    In this module, you learned how to migrate your application to use your new fully managed Amazon Keyspaces table. You also learned how to clean up the Amazon EC2 instance and the Amazon Keyspaces resources that you created in this lesson.

In this lesson, you migrated an existing, self-managed Apache Cassandra database running on Amazon EC2 to a fully managed Amazon Keyspaces table. You used tools such as cqlsh to easily migrate your data from an existing instance to your Amazon Keyspaces table.