Guidance for Near Real-Time Data Migration from Apache Cassandra to Amazon Keyspaces
Overview
How it works
The architecture diagram for this guidance shows the key components and how they interact, walking through the solution's structure and function step by step.
Deploy with confidence
Ready to deploy? The sample code on GitHub includes detailed deployment instructions. Deploy it as-is or customize it to fit your needs.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
AWS Glue automates your extract, transform, and load (ETL) processes, reducing the need for manual setup and management, while Amazon Keyspaces offloads database administration tasks, allowing your users to focus on application development. Integrated logging and monitoring capabilities in both services support efficient troubleshooting and issue resolution, enhancing operational excellence by streamlining operations and improving reliability.
Security
AWS Glue uses AWS Key Management Service (AWS KMS) to encrypt data at rest and TLS to secure data in transit. AWS Identity and Access Management (IAM) policies provide granular access control so that only authorized users can reach your resources. AWS CloudTrail and Amazon CloudWatch provide logging and monitoring that give visibility into activity and resource usage, which aids compliance and auditing. Together, these features help secure your ETL processes.
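As an illustration of the granular access control described above, the following sketch builds a least-privilege IAM policy document for a single Amazon Keyspaces table. The Region, account ID, keyspace, and table names are placeholders, and `build_keyspaces_policy` is a hypothetical helper, not part of the published sample code. The `cassandra:Select` and `cassandra:Modify` actions and the ARN layout are taken from the Amazon Keyspaces IAM documentation as understood here; verify them against your own environment before use.

```python
import json


def build_keyspaces_policy(region, account_id, keyspace, table, read_only=False):
    """Return an IAM policy document scoped to one Keyspaces table.

    All argument values are caller-supplied placeholders, not names
    taken from the guidance.
    """
    table_arn = (
        f"arn:aws:cassandra:{region}:{account_id}:"
        f"/keyspace/{keyspace}/table/{table}"
    )
    # Reads are always allowed; writes are added only when requested.
    actions = ["cassandra:Select"]
    if not read_only:
        actions.append("cassandra:Modify")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": [table_arn],
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_keyspaces_policy(
    "us-east-1", "111122223333", "my_keyspace", "my_table"
)
print(json.dumps(policy, indent=2))
```

Cassandra drivers typically read system keyspaces when they connect, so a working policy may also need `cassandra:Select` on the system keyspace resources. Treat the policy above as a starting point rather than a complete one.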
Reliability
Amazon Keyspaces is a fully managed and highly available NoSQL database service. It eliminates the need to manually manage infrastructure and cross-Region replication, and it provides built-in features such as encryption and continuous backups. These features allow for seamless and secure operations for your users without the complexity of managing Apache Cassandra.
Performance Efficiency
Amazon Keyspaces delivers low-latency, single-digit millisecond response times with tunable consistency levels and optimized Cassandra Query Language (CQL) capabilities. AWS Glue automates data preparation and integration tasks, dynamically scales resources for ETL jobs, and offers a serverless architecture with a built-in data catalog for expedited dataset discovery. Collectively, these services streamline data workflows for efficient, high-performing operations without the need for extensive manual intervention.
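To make the tunable consistency point concrete, here is a minimal sketch of a guard that checks a requested consistency level before a statement is sent. It assumes the levels that the Amazon Keyspaces documentation lists at the time of writing (`ONE`, `LOCAL_ONE`, and `LOCAL_QUORUM` for reads; `LOCAL_QUORUM` for writes). The function name and the two sets are illustrative and should be checked against the current documentation.

```python
# Assumed from the Amazon Keyspaces documentation; confirm before relying on it.
READ_LEVELS = {"ONE", "LOCAL_ONE", "LOCAL_QUORUM"}
WRITE_LEVELS = {"LOCAL_QUORUM"}


def check_consistency(operation, level):
    """Return the normalized level, or raise ValueError if unsupported.

    `operation` is "read" or "write"; `level` is a consistency level name.
    """
    allowed = {"read": READ_LEVELS, "write": WRITE_LEVELS}.get(operation)
    if allowed is None:
        raise ValueError(f"unknown operation: {operation!r}")
    normalized = level.strip().upper()
    if normalized not in allowed:
        raise ValueError(
            f"{normalized} is not supported for {operation} operations"
        )
    return normalized


print(check_consistency("read", "local_one"))
```

Validating the level in application code surfaces a misconfiguration carried over from a self-managed Cassandra cluster (for example, writes at `ONE`) before the request reaches the service.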
Cost Optimization
Amazon S3 and Amazon Keyspaces follow a pay-as-you-go pricing model, so you pay only for the storage and throughput you consume. Amazon S3 storage classes such as S3 Intelligent-Tiering can automatically move data to lower-cost tiers based on access patterns, reducing the cost of infrequently accessed data. Because Amazon Keyspaces is serverless, there are no servers to provision or manage, which further lowers operational costs. Together, these services provide cost-effective, scalable storage and data management without the overhead of maintaining hardware.
Sustainability
AWS Lambda functions run on a serverless model, which optimizes resource allocation and removes the need to maintain physical hardware. In addition, Lambda is triggered only when data in the base table changes, which minimizes compute run time.
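The event-driven behavior described above can be sketched as a Lambda handler that does work only for the change records it receives. The event shape (`Records`, each with `operation`, `key`, and `values`) is a hypothetical stand-in, because the guidance does not specify the payload. The table name and the `to_cql` helper are likewise illustrative and are not part of the published sample code.

```python
TABLE = "my_keyspace.my_table"  # placeholder table name


def to_cql(table, record):
    """Translate one hypothetical change record into (statement, params).

    Column names are interpolated into the statement, so they must come
    from a trusted source; values are passed as bound parameters.
    """
    key = record["key"]
    if record["operation"] == "DELETE":
        where = " AND ".join(f"{col} = %s" for col in key)
        return f"DELETE FROM {table} WHERE {where}", list(key.values())
    # CQL INSERT behaves as an upsert, so it covers inserts and updates.
    row = {**key, **record.get("values", {})}
    columns = ", ".join(row)
    placeholders = ", ".join(["%s"] * len(row))
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, list(row.values())


def handler(event, context):
    """Lambda entry point: build one statement per change record."""
    records = event.get("Records", [])
    statements = [to_cql(TABLE, record) for record in records]
    # A real function would execute these against Amazon Keyspaces here.
    return {"processed": len(statements), "statements": statements}
```

Because the function runs only when invoked with change records, an idle base table results in no invocations and no compute time.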