This Guidance shows how to migrate self-managed Apache Cassandra clusters to the fully managed Amazon Keyspaces service using CQLReplicator, an open-source tool developed by AWS Solutions Architects. CQLReplicator enables near real-time data migration by initiating two AWS Glue jobs: a Discovery job and a Replicator job. The Discovery job collects and stores the latest primary keys from the Cassandra source. The Replicator job scans the Amazon Keyspaces ledger, queries the Cassandra source, and inserts the latest data into the Amazon Keyspaces table. With this tool, you can reduce operational overhead by offloading your Cassandra clusters to AWS, centralize monitoring through integration with Amazon CloudWatch, and simplify migration through the automation that CQLReplicator provides.
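The division of work between the two jobs can be illustrated with a small in-memory sketch. This is not the CQLReplicator implementation: plain Python dictionaries stand in for the Cassandra source, the Amazon Keyspaces target, and the ledger, and the names `Ledger`, `discover`, and `replicate`, as well as the use of a per-row write time to detect changes, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical ledger: primary key -> write time, per stage."""
    discovered: dict = field(default_factory=dict)
    replicated: dict = field(default_factory=dict)


def discover(source: dict, ledger: Ledger) -> None:
    """Discovery pass: record each primary key and its latest write time."""
    for key, row in source.items():
        ledger.discovered[key] = row["writetime"]


def replicate(source: dict, target: dict, ledger: Ledger) -> int:
    """Replicator pass: scan the ledger, re-read changed rows from the
    source, and write them to the target. Returns the number of rows copied."""
    copied = 0
    for key, writetime in ledger.discovered.items():
        if ledger.replicated.get(key, -1) >= writetime:
            continue  # target already holds this version of the row
        row = source.get(key)
        if row is None:
            continue  # row vanished from the source since discovery
        target[key] = dict(row)
        ledger.replicated[key] = writetime
        copied += 1
    return copied


# Stand-in "tables": primary key tuple -> row with a write time.
source = {
    ("user", 1): {"value": "alice", "writetime": 100},
    ("user", 2): {"value": "bob", "writetime": 105},
}
target = {}
ledger = Ledger()

discover(source, ledger)
print("rows copied on first pass:", replicate(source, target, ledger))

# A later update in the source is picked up by the next discovery pass.
source[("user", 1)] = {"value": "alice-v2", "writetime": 200}
discover(source, ledger)
print("rows copied after update:", replicate(source, target, ledger))
```

Running the two passes repeatedly is what makes the sketch "near real-time": each cycle copies only rows whose recorded write time has advanced, so unchanged rows are not rewritten.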
Architecture Diagram
Step 1
Initiate the CQLReplicator in AWS CloudShell, which creates two AWS Glue jobs called Discovery and Replicator.
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
AWS Glue automates your extract, transform, and load (ETL) processes, reducing the need for manual setup and management, while Amazon Keyspaces offloads database administration tasks, allowing your users to focus on application development. Integrated logging and monitoring capabilities in both services support efficient troubleshooting and issue resolution, enhancing operational excellence by streamlining operations and improving reliability.
Security
AWS Glue uses AWS Key Management Service (AWS KMS) to encrypt data at rest and TLS to secure data in transit. AWS Identity and Access Management (IAM) policies enable granular access control, so that only authorized users can access your jobs and data. AWS CloudTrail and CloudWatch provide logging and monitoring that give you visibility into activities and resource usage, which aids compliance and auditing. Together, these features help secure your ETL processes.
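As an illustration of granular access control, the sketch below builds an IAM policy document that limits a migration role to reading and writing a single Amazon Keyspaces table. The helper name `keyspaces_table_policy` and the Region, account ID, keyspace, and table values are placeholders, not values from this Guidance; treat the policy as a starting point to review against your own requirements rather than the policy the tool creates.

```python
import json


def keyspaces_table_policy(region: str, account_id: str,
                           keyspace: str, table: str) -> dict:
    """Build a least-privilege policy for one Keyspaces table (sketch)."""
    table_arn = (
        f"arn:aws:cassandra:{region}:{account_id}:"
        f"/keyspace/{keyspace}/table/{table}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Select covers reads; Modify covers inserts, updates, deletes.
                "Action": ["cassandra:Select", "cassandra:Modify"],
                "Resource": [table_arn],
            }
        ],
    }


# Placeholder values for illustration only.
policy = keyspaces_table_policy(
    "us-east-1", "111122223333", "target_keyspace", "target_table"
)
print(json.dumps(policy, indent=2))
```

Depending on the driver, clients may also need read access to the Keyspaces system keyspaces; check the Amazon Keyspaces IAM documentation before relying on a policy this narrow.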
Reliability
Amazon Keyspaces is a fully managed, highly available NoSQL database service. It eliminates the need to manage infrastructure and cross-Region replication manually, and it provides built-in security features such as encryption and continuous backups. These features support seamless and secure operations for your users without the complexity of managing Apache Cassandra.
Performance Efficiency
Amazon Keyspaces delivers low-latency, single-digit millisecond response times with tunable consistency levels and optimized Cassandra Query Language (CQL) capabilities. AWS Glue automates data preparation and integration tasks, dynamically scales resources for ETL jobs, and offers a serverless architecture with a built-in data catalog for expedited dataset discovery. Collectively, these services streamline data workflows for efficient, high-performing operations without the need for extensive manual intervention.
Cost Optimization
Amazon S3 and Amazon Keyspaces use pay-as-you-go pricing, so you pay only for the storage and throughput you consume. Amazon S3 storage classes such as S3 Intelligent-Tiering can automatically move data to lower-cost storage based on access patterns, reducing expenses for infrequently accessed data. The serverless architecture of Amazon Keyspaces also eliminates the need to provision and manage servers, further lowering operational costs. Together, these services provide a cost-effective approach to scalable storage and data management without the overhead of maintaining hardware infrastructure.
Sustainability
AWS Lambda functions are architected upon a serverless model, thereby optimizing resource allocation and reducing the need to maintain physical hardware infrastructure. Furthermore, Lambda is only triggered in response to changes in the data of the base table, minimizing the compute resource run times.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.