This Guidance helps credit unions back up the application and database layers of core banking platforms for disaster recovery. Financial institutions that store data must be prepared not only for natural disasters, but also for cyberattacks, ransomware, and data breaches. To keep customers' personally identifiable information (PII) safe, credit unions must prioritize and plan for efficient disaster recovery so they can quickly restore business operations. This Guidance helps credit unions prepare for disaster recovery while staying compliant with requirements such as the General Data Protection Regulation (GDPR), the Gramm-Leach-Bliley Act (GLBA), and Federal Financial Institutions Examination Council (FFIEC) guidelines.
Architecture Diagram
Step 1
AWS Direct Connect provides dedicated, resilient connectivity between your on-premises data centers and the AWS Cloud.
Step 2
AWS Database Migration Service (AWS DMS) migrates and replicates data from the on-premises data center to Amazon Relational Database Service (Amazon RDS). The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.
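The step above can be sketched as an AWS DMS replication task definition. This is a minimal illustration, not deployment-ready configuration: the ARNs and schema name are hypothetical, and the field names mirror the shape of the `create_replication_task` API. The `full-load-and-cdc` migration type is what keeps the source database operational, because ongoing changes are captured and applied after the initial copy.

```python
import json

# Sketch of an AWS DMS replication task definition (hypothetical ARNs).
# "full-load-and-cdc" performs the initial full copy, then applies ongoing
# changes (CDC) so the on-premises source stays fully operational.
replication_task = {
    "ReplicationTaskIdentifier": "core-banking-to-rds",
    "SourceEndpointArn": "arn:aws:dms:us-east-1:111122223333:endpoint:source",    # hypothetical
    "TargetEndpointArn": "arn:aws:dms:us-east-1:111122223333:endpoint:target",    # hypothetical
    "ReplicationInstanceArn": "arn:aws:dms:us-east-1:111122223333:rep:instance",  # hypothetical
    "MigrationType": "full-load-and-cdc",  # full load plus change data capture
    "TableMappings": json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-core-schema",
            "object-locator": {"schema-name": "corebanking", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
}

print(replication_task["MigrationType"])
```

With credentials configured, these keyword arguments could be passed to `boto3.client("dms").create_replication_task(**replication_task)`.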
Step 3
Set up AWS Elastic Disaster Recovery (AWS DRS) Agents on your source servers to initiate secure data replication. With AWS DRS, you can recover your applications on AWS from your existing infrastructure.
Step 4
Use the AWS Management Console to configure replication and launch settings, monitor data replication, and launch instances for drills or recovery.
Step 5
Data is replicated to a staging area subnet in your AWS account in the AWS Region you select. The staging area design reduces costs by using Amazon Elastic Block Store (Amazon EBS) and minimal Amazon Elastic Compute Cloud (Amazon EC2) resources to maintain ongoing replication.
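The staging area design above can be sketched as an AWS DRS replication configuration. The subnet ID is hypothetical and the field names follow the shape of the DRS replication configuration template; treat this as an illustrative sketch. The small replication server instance type and low-cost EBS volume type are what keep ongoing replication inexpensive.

```python
# Sketch of an AWS DRS replication configuration (hypothetical subnet ID).
# Costs stay low because replication runs on minimal EC2 instances and
# low-cost EBS volumes in the staging area subnet.
replication_config = {
    "stagingAreaSubnetId": "subnet-0abc1234",      # hypothetical staging subnet
    "replicationServerInstanceType": "t3.small",   # minimal EC2 for ongoing replication
    "defaultLargeStagingDiskType": "GP3",          # low-cost EBS for staging volumes
    "ebsEncryption": "DEFAULT",                    # encrypt staging EBS volumes at rest
    "useDedicatedReplicationServer": False,
    "dataPlaneRouting": "PRIVATE_IP",              # replicate over private connectivity
    "bandwidthThrottling": 0,                      # Mbps; 0 disables throttling
}
```

In a real deployment these settings would be applied through the AWS DRS console or API when setting up the replication configuration template.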
Step 6
AWS DRS automatically converts your servers to boot and run natively on AWS when you launch instances for drills or recovery. If you need to recover applications, you can launch recovery instances on AWS within minutes, using the most up-to-date server state or a previous point in time.
You can choose to keep your applications running on AWS or initiate replication to your primary site once the issue is resolved. You can fail back to your primary site whenever you’re ready.
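The drill-or-recovery choice above can be sketched as a DRS recovery launch request. The source server ID is hypothetical; the request shape follows the DRS `StartRecovery` API as a sketch. Setting `isDrill` to true launches non-disruptive drill instances, and omitting a snapshot ID recovers the most up-to-date server state.

```python
# Sketch of an AWS DRS recovery launch request (hypothetical server ID).
# isDrill=True launches non-disruptive drill instances; set it to False
# for an actual recovery during a disaster.
recovery_request = {
    "isDrill": True,
    "sourceServers": [
        {
            "sourceServerID": "s-1234567890abcdef0",  # hypothetical
            # "recoverySnapshotID": "pit-...",  # optional: a previous point in time;
            # omitted here to use the most up-to-date server state
        }
    ],
}
```

With credentials configured, this request could be passed to `boto3.client("drs").start_recovery(**recovery_request)`.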
Step 7
Use Amazon CloudWatch to capture, react to, and display application health. You can monitor changes to application infrastructure by using AWS CloudTrail and AWS Config. These services monitor activity within your AWS account. For application-level insights, use AWS X-Ray to monitor your application.
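As a concrete example of capturing and reacting to application health, the following sketches CloudWatch alarm parameters; the metric, threshold, and SNS topic ARN are hypothetical choices, and the keys mirror the shape of the `put_metric_alarm` API.

```python
# Sketch of CloudWatch alarm parameters for application health
# (hypothetical alarm name, threshold, and SNS topic).
alarm_params = {
    "AlarmName": "core-banking-5xx-errors",          # hypothetical
    "Namespace": "AWS/ApplicationELB",
    "MetricName": "HTTPCode_Target_5XX_Count",
    "Statistic": "Sum",
    "Period": 60,                                    # evaluate in 60-second windows
    "EvaluationPeriods": 3,                          # 3 consecutive breaching periods
    "Threshold": 10,                                 # more than 10 errors per period
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:ops-alerts"],  # hypothetical
}
```

With credentials configured, these keyword arguments could be passed to `boto3.client("cloudwatch").put_metric_alarm(**alarm_params)`.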
Step 8
With AWS Identity and Access Management (IAM), you can specify who or what can access services and resources in AWS. AWS Key Management Service (AWS KMS) lets you create, manage, and control cryptographic keys across your applications and other AWS services.
AWS Secrets Manager helps you manage, retrieve, and rotate database credentials, API keys, and other secrets throughout their lifecycles.
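The access controls above can be sketched as a least-privilege IAM policy that allows retrieving one database secret and decrypting with one KMS key. The ARNs are hypothetical placeholders; scope real policies to your own resources.

```python
import json

# Sketch of a least-privilege IAM policy (hypothetical ARNs): read one
# database secret from Secrets Manager and decrypt with one KMS key.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadDbSecret",
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:core-db-*",
        },
        {
            "Sid": "DecryptWithKmsKey",
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:DescribeKey"],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-example",
        },
    ],
}

print(json.dumps(policy, indent=2))
```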
Step 9
Amazon Route 53 monitors the health of your application endpoints and directs traffic to your primary site. When Route 53 detects a failure, it automatically fails over and routes traffic to your recovered application running in AWS.
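The failover behavior above can be sketched with Route 53 failover record sets; the domain, health check ID, and load balancer DNS names are hypothetical. Route 53 serves the PRIMARY record while its health check passes and shifts traffic to the SECONDARY record when the check fails.

```python
# Sketch of Route 53 DNS failover records (hypothetical domain, health
# check ID, and load balancer endpoints). Traffic goes to PRIMARY while
# healthy, and to SECONDARY after a detected failure.
failover_records = [
    {
        "Name": "app.example-cu.org",                  # hypothetical domain
        "Type": "A",
        "SetIdentifier": "primary",
        "Failover": "PRIMARY",
        "HealthCheckId": "hc-primary-1234",            # hypothetical health check
        "AliasTarget": {"DNSName": "primary-alb.us-east-1.elb.amazonaws.com"},
    },
    {
        "Name": "app.example-cu.org",
        "Type": "A",
        "SetIdentifier": "secondary",
        "Failover": "SECONDARY",
        "AliasTarget": {"DNSName": "recovery-alb.us-west-2.elb.amazonaws.com"},
    },
]
```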
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
To maintain operational health, you must detect failures and quickly recover from them. You should configure applications to emit the relevant telemetry for detecting issues and establish processes to capture and react to events. CloudWatch provides useful tools to capture, react to, and display application health. Drift between primary and secondary sites can lead to failure in recovery during a disaster. Financial institutions can monitor changes to their application infrastructure by using CloudTrail and AWS Config. Once drift is detected, institutions can automate a response through Amazon EventBridge.
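The drift-response automation above can be sketched as an EventBridge event pattern that matches AWS Config compliance changes. The rule target (for example, an SNS topic or Lambda function) is left out as a deployment choice.

```python
import json

# Sketch of an EventBridge event pattern that matches AWS Config
# compliance changes, so detected drift between primary and recovery
# sites can trigger an automated response.
event_pattern = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}
    },
}

print(json.dumps(event_pattern))
```

In a real deployment, this pattern would be attached to an EventBridge rule whose target notifies operators or runs remediation.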
Security
Use IAM policies to assign permissions that determine who is allowed to manage AWS resources, especially AWS DRS and AWS DMS. Communication between AWS DRS Agents and the replication server is based on Transport Layer Security (TLS) 1.2. These requests are signed using an access key ID and a secret access key associated with an IAM principal. As an additional step, we recommend encrypting Amazon EBS volumes using AWS KMS.
AWS DMS encrypts endpoint connections using Transport Layer Security (TLS). You can also configure AWS DMS to use AWS KMS to encrypt the storage used by the replication instance and its endpoint connection information. Further, we recommend encrypting the Amazon RDS database using AWS KMS and storing database credentials in Secrets Manager.
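The encrypted-endpoint guidance above can be sketched as DMS endpoint parameters; the server name and KMS key ARN are hypothetical, and the keys mirror the shape of the `create_endpoint` API. `SslMode` of `verify-full` both requires TLS and verifies the server certificate.

```python
# Sketch of an encrypted AWS DMS target endpoint (hypothetical server
# name and KMS key). SslMode enforces TLS on the connection; KmsKeyId
# encrypts the stored endpoint connection information.
endpoint_params = {
    "EndpointIdentifier": "rds-target",
    "EndpointType": "target",
    "EngineName": "postgres",
    "ServerName": "core-db.xxxxxxxx.us-east-1.rds.amazonaws.com",  # hypothetical
    "Port": 5432,
    "SslMode": "verify-full",  # require TLS and verify the server certificate
    "KmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-example",  # hypothetical
}
```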
Reliability
AWS DRS continuously replicates machines into a staging area within the target AWS account and your preferred Region. In case of a disaster, AWS DRS automates the conversion of replicated servers into fully provisioned workloads in the recovery Region. For disruptions caused by ransomware, data corruption, accidental user error, or bad patches, you can use AWS DRS to recover these servers on AWS from a previous point in time. Further, AWS DRS provides continuous block-level replication, recovery orchestration, and automated server conversion capabilities. These allow customers to achieve a crash-consistent recovery point objective (RPO) of seconds and a recovery time objective (RTO) typically ranging from 5 to 20 minutes.
Performance Efficiency
AWS DRS is highly automated, eliminating time-consuming and manual tasks. Server conversion technology makes relevant changes to the boot volume of the recovered server so that it can boot in AWS. This includes injecting appropriate hypervisor drivers and networking changes. As a managed service, AWS DMS takes care of assessing, converting, and migrating database instances into AWS. To speed up the full load and improve the change data capture (CDC) process, we recommend creating separate AWS DMS tasks for tables with a high number of records or heavy data manipulation language (DML) activity, so that these large tables do not slow down the migration of smaller tables.
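The separate-task recommendation above can be sketched as two DMS table-mapping documents; the schema and table names are hypothetical. One task is dedicated to a high-volume table, and a second task includes everything else while excluding that table, so the large table cannot slow the smaller tables down.

```python
# Sketch of table mappings for two separate AWS DMS tasks (hypothetical
# schema and table names). Task 1 handles only the high-volume table;
# task 2 handles all remaining tables and excludes the large one.
large_table_task = {
    "rules": [{
        "rule-type": "selection", "rule-id": "1", "rule-name": "large-only",
        "object-locator": {"schema-name": "corebanking", "table-name": "transactions"},
        "rule-action": "include",
    }]
}

remaining_tables_task = {
    "rules": [
        {"rule-type": "selection", "rule-id": "1", "rule-name": "all-tables",
         "object-locator": {"schema-name": "corebanking", "table-name": "%"},
         "rule-action": "include"},
        {"rule-type": "selection", "rule-id": "2", "rule-name": "exclude-large",
         "object-locator": {"schema-name": "corebanking", "table-name": "transactions"},
         "rule-action": "exclude"},
    ]
}
```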
Cost Optimization
The AWS DRS staging area design reduces costs by using affordable storage and minimal compute resources to maintain ongoing replication. You can achieve further price reductions through Savings Plans for Amazon EC2 and Reserved Instances for Amazon RDS. These discounts provide considerable cost savings when compared to On-Demand pricing. To provide the capacity guarantee that financial institutions need for regulatory requirements, customers can purchase zonal Reserved Instances (RIs). Zonal RIs are specific to an instance type and assigned to a specific Availability Zone. Zonal RIs increase availability, regardless of other customer demands for capacity.
Sustainability
Through the use of managed services, this architecture minimizes the environmental impact from backend resources. The AWS DRS staging area design reduces the infrastructure carbon footprint by provisioning minimal compute resources to maintain ongoing replication while still achieving RTOs of minutes and RPOs of seconds. To further reduce environmental impact, continuously monitor CloudWatch metrics during disaster events to help ensure that the scaled environment is not overprovisioned.
Implementation Resources
A detailed guide is provided to experiment and use within your AWS account. It walks through each stage of the Guidance, including deployment, usage, and cleanup.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.