Guidance for Continuous Data Migration from Apache Cassandra to Amazon Keyspaces
Overview
How it works
This section features an architecture diagram that illustrates how to use this Guidance effectively. The diagram shows the key components and their interactions, giving a step-by-step overview of the architecture's structure and functionality.
Deploy with confidence
Ready to deploy? Review the sample code on GitHub for detailed instructions to deploy the Guidance as-is or customize it to fit your needs.
Well-Architected Pillars
The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon MSK is a fully managed Apache Kafka service that automates complex administrative tasks such as setup, scaling, and patching. Because Kafka Connect connectors are managed directly within the Amazon MSK service, connector operations are automated and optimized for handling high-volume data streams with minimal downtime, supporting continuous improvement and operational resilience. In addition, you can use Amazon CloudWatch to monitor the metrics that Amazon MSK publishes. This lets you quickly detect anomalies and performance bottlenecks and troubleshoot issues as they arise, making it easier to maintain system reliability and meet your service level agreements.
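As a minimal sketch of the CloudWatch monitoring described above, the following Python function builds the parameters for an alarm on consumer lag for the topic that a Kafka Connect sink connector reads from. It assumes the `AWS/Kafka` namespace, the `MaxOffsetLag` metric, and the `Cluster Name`, `Consumer Group`, and `Topic` dimensions. Verify these against the Amazon MSK monitoring documentation for your cluster. The cluster name, consumer group, topic, threshold, and SNS topic ARN are hypothetical. The function only builds a dictionary, so it runs without AWS credentials. You could pass the result to the boto3 CloudWatch client's `put_metric_alarm` method.

```python
"""Hypothetical helper: build CloudWatch alarm parameters for MSK consumer lag.

Metric and dimension names are assumptions to verify against the Amazon MSK
monitoring documentation. Nothing here calls AWS.
"""


def build_consumer_lag_alarm(cluster_name: str, consumer_group: str, topic: str,
                             max_lag: int, sns_topic_arn: str) -> dict:
    """Return keyword arguments suitable for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{cluster_name}-{consumer_group}-{topic}-offset-lag",
        "Namespace": "AWS/Kafka",          # assumed MSK metric namespace
        "MetricName": "MaxOffsetLag",      # assumed consumer-lag metric
        "Dimensions": [
            {"Name": "Cluster Name", "Value": cluster_name},
            {"Name": "Consumer Group", "Value": consumer_group},
            {"Name": "Topic", "Value": topic},
        ],
        "Statistic": "Maximum",
        "Period": 60,                      # evaluate one-minute data points
        "EvaluationPeriods": 5,            # alarm after five consecutive breaching points
        "Threshold": float(max_lag),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],   # notify operators through SNS
    }


if __name__ == "__main__":
    params = build_consumer_lag_alarm(
        cluster_name="migration-cluster",          # hypothetical
        consumer_group="connect-keyspaces-sink",   # hypothetical
        topic="cassandra-cdc",                     # hypothetical
        max_lag=10_000,
        sns_topic_arn="arn:aws:sns:us-east-1:111122223333:ops-alerts",  # hypothetical
    )
    print(params["AlarmName"])
    # With boto3 installed and credentials configured, you could then call:
    # boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the alarm definition as plain data makes it easy to review, to test, and to reuse in infrastructure-as-code tooling.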
Security
By combining AWS PrivateLink, Amazon VPC, and Amazon VPC endpoints, you establish a set of services that work in tandem to help ensure that all data transfers stay within the private AWS network. This setup minimizes potential attack vectors by keeping critical infrastructure off the public internet and restricting access to trusted entities only. Specifically, PrivateLink facilitates secure data transmission within AWS, while Amazon VPC helps ensure that both the Amazon MSK clusters and the Apache Cassandra Amazon EC2 instances operate in a secure, isolated network environment that is accessible only through specific, controlled points. In addition, Amazon VPC endpoints for Amazon Keyspaces provide secure, private connectivity between your VPC and Amazon Keyspaces, removing the need to use public endpoints. Lastly, AWS Identity and Access Management (IAM) roles provide fine-grained access control so that only authorized users and systems can access specific AWS resources.
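To illustrate the fine-grained access control described above, the following Python sketch generates an IAM policy document for a hypothetical role used by the Kafka Connect sink connector. It assumes the Amazon Keyspaces resource ARN format `arn:aws:cassandra:<region>:<account>:/keyspace/<keyspace>/table/<table>`, the `cassandra:Select` and `cassandra:Modify` actions, and the `aws:SourceVpce` condition key for limiting requests to a single VPC endpoint. The Region, account ID, keyspace, table, and endpoint ID are placeholders.

```python
"""Hypothetical least-privilege IAM policy for a connector role writing to Amazon Keyspaces."""
import json


def keyspaces_writer_policy(region: str, account_id: str, keyspace: str,
                            table: str, vpce_id: str) -> dict:
    """Allow writes to one table and reads of system tables, only through one VPC endpoint."""
    keyspace_arn = f"arn:aws:cassandra:{region}:{account_id}:/keyspace"
    # Requests that do not arrive through this endpoint match no Allow statement,
    # so IAM implicitly denies them.
    via_endpoint = {"StringEquals": {"aws:SourceVpce": vpce_id}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteMigratedTable",
                "Effect": "Allow",
                "Action": ["cassandra:Select", "cassandra:Modify"],
                "Resource": [f"{keyspace_arn}/{keyspace}/table/{table}"],
                "Condition": via_endpoint,
            },
            {
                # Cassandra drivers read system tables (for example, system.peers) on connect.
                "Sid": "ReadSystemTables",
                "Effect": "Allow",
                "Action": ["cassandra:Select"],
                "Resource": [f"{keyspace_arn}/system*"],
                "Condition": via_endpoint,
            },
        ],
    }


if __name__ == "__main__":
    policy = keyspaces_writer_policy(
        "us-east-1", "111122223333",          # placeholder Region and account
        "migration_ks", "orders",             # hypothetical keyspace and table
        "vpce-0123456789abcdef0",             # placeholder endpoint ID
    )
    print(json.dumps(policy, indent=2))
```

Scoping the write statement to a single table, rather than the whole keyspace, keeps the connector's blast radius small if its credentials are ever misused.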
Reliability
Amazon MSK is a resilient streaming service that automatically manages data replication and failover across Kafka brokers deployed in multiple Availability Zones (AZs). Additionally, Amazon Keyspaces improves data availability by automatically replicating data three times across three AZs within an AWS Region. The Amazon EC2 instances hosting Apache Cassandra are deployed in private subnets in different AZs within the Amazon VPC, distributing resources to reduce the risk of single points of failure. Lastly, PrivateLink secures data transfers to Amazon Keyspaces, providing a reliable, protected data flow without exposure to the public internet.
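The availability benefit of replicating across three AZs can be checked with simple quorum arithmetic. The sketch below makes several assumptions. It assumes that Amazon Keyspaces acknowledges writes at LOCAL_QUORUM consistency across its three replicas. For the Kafka side, it assumes topics use a replication factor of 3 with `min.insync.replicas=2` and producers configured with `acks=all`. These Kafka settings are illustrative and are not specified by this Guidance.

```python
"""Sketch: how many replica losses each replicated store can absorb (assumed settings)."""


def quorum(replicas: int) -> int:
    """Smallest majority of `replicas` copies (the Cassandra QUORUM rule)."""
    return replicas // 2 + 1


def keyspaces_tolerated_replica_losses(replicas: int = 3) -> int:
    """Replicas that can be lost while LOCAL_QUORUM writes still succeed."""
    return replicas - quorum(replicas)


def kafka_tolerated_broker_losses(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas of a partition that can be lost while acks=all producers keep writing."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    # Three copies, one per AZ, as described for Amazon Keyspaces.
    print("Keyspaces quorum:", quorum(3),
          "tolerated AZ losses:", keyspaces_tolerated_replica_losses(3))
    # An assumed, commonly used Kafka topic configuration spread across three AZs.
    print("Kafka tolerated broker losses:", kafka_tolerated_broker_losses(3, 2))
```

With one replica per AZ, both stores in this sketch can lose an entire AZ and still accept writes, which is the property the paragraph above relies on.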
Performance Efficiency
Amazon Keyspaces is a managed, serverless database service that automatically provides the capacity to match the demand of incoming writes from Amazon MSK, so those writes are processed efficiently without added latency. This automation supports consistent performance even during high write volumes. Amazon Keyspaces also offers workload isolation at the table level, so the workload on one table does not affect the performance of another. Because resources are dedicated to each table, performance remains predictable across tables.
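To reason about the write demand that Amazon MSK places on each table, you can estimate write capacity table by table. The following Python sketch assumes the Amazon Keyspaces sizing rule that one write unit covers one write of a row up to 1 KB, with larger rows consuming one unit per additional KB, rounded up. It also treats 1 KB as 1,024 bytes, which is an assumption of this sketch. Because capacity is tracked per table, the estimate is computed per table. The table names and traffic figures are hypothetical.

```python
"""Sketch: estimate per-table write units for a migration stream (assumed 1 KB sizing rule)."""
import math

KB = 1024  # assumption: treat 1 KB as 1,024 bytes when rounding row sizes


def write_units_per_second(writes_per_second: float, avg_row_bytes: int) -> int:
    """Units per second needed if every write consumes ceil(row size / 1 KB) units."""
    units_per_write = max(1, math.ceil(avg_row_bytes / KB))
    return math.ceil(writes_per_second * units_per_write)


def per_table_write_units(tables: dict) -> dict:
    """Map table name -> units/s, given table name -> (writes/s, average row bytes)."""
    return {name: write_units_per_second(rate, size) for name, (rate, size) in tables.items()}


if __name__ == "__main__":
    workload = {                      # hypothetical migration traffic
        "orders": (5_000, 2_560),     # 2.5 KB rows -> 3 units per write
        "customers": (200, 600),      # sub-1 KB rows -> 1 unit per write
    }
    print(per_table_write_units(workload))  # {'orders': 15000, 'customers': 200}
```

An estimate like this is mainly useful for sizing provisioned capacity or sanity-checking on-demand costs. In on-demand mode, Amazon Keyspaces absorbs the traffic without you setting these numbers.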
Cost Optimization
Amazon MSK is a fully managed Kafka service that removes the need to manually provision and manage Kafka clusters, minimizing operational overhead and reducing resource waste. Amazon Keyspaces eliminates the need to invest in hardware upfront. You can offload essential operational tasks such as provisioning, patching, and managing servers, as well as installing, maintaining, and operating database software, to AWS.
Sustainability
With Amazon Keyspaces, you can choose on-demand or provisioned capacity mode to match read and write capacity to your traffic patterns and avoid over-provisioning resources. This efficient use of infrastructure conserves resources and reduces energy waste.
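The capacity-mode choice described above is a per-table setting. The following Python sketch builds the CQL `ALTER TABLE` statement that switches a table between modes. It assumes the Amazon Keyspaces `CUSTOM_PROPERTIES` syntax, with `throughput_mode` set to `PAY_PER_REQUEST` for on-demand mode or to `PROVISIONED` with explicit read and write capacity units. The keyspace, table, and unit values are hypothetical. You would run the resulting statement with a CQL client such as cqlsh.

```python
"""Sketch: build Amazon Keyspaces CQL to set a table's capacity mode (assumed syntax)."""
from typing import Optional


def capacity_mode_cql(keyspace: str, table: str, mode: str,
                      read_units: Optional[int] = None,
                      write_units: Optional[int] = None) -> str:
    """Return an ALTER TABLE statement for on-demand or provisioned capacity."""
    if mode == "PAY_PER_REQUEST":
        # On-demand: billed per request, with no capacity to size up front.
        capacity = "{'throughput_mode': 'PAY_PER_REQUEST'}"
    elif mode == "PROVISIONED":
        # Provisioned: you declare read and write capacity units per second.
        if read_units is None or write_units is None:
            raise ValueError("PROVISIONED mode requires read_units and write_units")
        capacity = (
            "{'throughput_mode': 'PROVISIONED', "
            f"'read_capacity_units': {read_units}, "
            f"'write_capacity_units': {write_units}}}"
        )
    else:
        raise ValueError(f"unknown capacity mode: {mode!r}")
    return (f"ALTER TABLE {keyspace}.{table} "
            f"WITH CUSTOM_PROPERTIES = {{'capacity_mode': {capacity}}};")


if __name__ == "__main__":
    print(capacity_mode_cql("migration_ks", "orders", "PAY_PER_REQUEST"))
    print(capacity_mode_cql("migration_ks", "orders", "PROVISIONED",
                            read_units=100, write_units=15_000))
```

Generating the statement from one function keeps the two modes' property layouts consistent and rejects a provisioned request that is missing capacity values before it reaches the database.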