This Guidance helps you modernize your record retention so you can extract value from your data while staying compliant with record-keeping rules from the U.S. Securities and Exchange Commission (SEC), the Commodity Futures Trading Commission (CFTC), and the Financial Industry Regulatory Authority (FINRA). Financial services institutions (FSIs) are expected to retain records compliantly, and they often do so with on-premises legacy storage solutions that do not scale, require constant hardware and software refreshes, and make it difficult for end customers to access the data. With this Guidance, you can use cloud-native services for storing, processing, and monitoring access to data, so analysts, data scientists, and other stakeholders can work with the data while staying in compliance with regulators.
Please note: the Disclaimer at the end of this Guidance applies to the sample code and deployment resources described here.
Architecture Diagram
Step 1
Transaction data is created in line-of-business applications.
Step 2
AWS DataSync, AWS Transfer Family, or AWS Snowball transfers the data to an AWS Region.
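If DataSync is your transfer mechanism, each run of an existing task can be started programmatically. The sketch below is a minimal example using boto3; the task ARN is a placeholder, and the task itself (pairing the on-premises source location with the destination S3 bucket) is assumed to exist already.

```python
import boto3

# Placeholder ARN for a pre-existing DataSync task that links an
# on-premises source location to an S3 destination location.
TASK_ARN = "arn:aws:datasync:us-east-1:111122223333:task/task-EXAMPLE"

datasync = boto3.client("datasync")

# Start one transfer run; DataSync verifies the copied data with
# checksums (see the Reliability pillar below).
response = datasync.start_task_execution(TaskArn=TASK_ARN)
print("Started execution:", response["TaskExecutionArn"])
```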
Step 3
Amazon Simple Storage Service (Amazon S3) stores data in its raw form.
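As a minimal illustration, a line-of-business application or transfer process might land a raw file like this (the bucket name and key are hypothetical); date-based key prefixes simplify the crawling and partitioning steps that follow.

```python
import boto3

s3 = boto3.client("s3")

# Upload one raw file; bucket and key names are illustrative.
with open("trades.csv", "rb") as body:
    s3.put_object(
        Bucket="example-fsi-raw-data",
        Key="trades/2024/01/15/trades.csv",
        Body=body,
        ServerSideEncryption="aws:kms",  # encrypt at rest with AWS KMS
    )
```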
Step 4
AWS Glue crawlers discover and catalog the raw data.
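A crawler for the raw bucket can be defined and run with a few API calls. In this hedged sketch, the crawler name, IAM role, database, and S3 path are all illustrative assumptions:

```python
import boto3

glue = boto3.client("glue")

# Define a crawler over the raw prefix; all names are hypothetical.
glue.create_crawler(
    Name="raw-trades-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="fsi_raw",
    Targets={"S3Targets": [{"Path": "s3://example-fsi-raw-data/trades/"}]},
)

# Run it; discovered tables and partitions land in the Data Catalog.
glue.start_crawler(Name="raw-trades-crawler")
```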
Step 5
Customers can process the raw data using AWS Glue Studio jobs or Amazon EMR.
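A Glue Studio job, once authored, is started like any other AWS Glue job. The job name and arguments below are hypothetical; Glue Studio generates the underlying script, and this call simply triggers a run:

```python
import boto3

glue = boto3.client("glue")

# "curate-trades" is an assumed job authored in AWS Glue Studio; the
# source and target paths are passed through as job arguments.
run = glue.start_job_run(
    JobName="curate-trades",
    Arguments={
        "--source_path": "s3://example-fsi-raw-data/trades/",
        "--target_path": "s3://example-fsi-processed-data/trades/",
    },
)
print("Job run id:", run["JobRunId"])
```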
Step 6
Amazon DynamoDB stores job details, results, and other metadata for auditing purposes.
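One lightweight pattern is to write a record per job run into a DynamoDB table that auditors can query later. The table name and attributes below are illustrative, not prescriptive:

```python
import boto3
from datetime import datetime, timezone

# "job-audit-log" is a hypothetical table keyed on job_run_id.
table = boto3.resource("dynamodb").Table("job-audit-log")

# Record one processing run for audit purposes.
table.put_item(
    Item={
        "job_run_id": "jr_0123456789abcdef",
        "job_name": "curate-trades",
        "status": "SUCCEEDED",
        "records_written": 125000,
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }
)
```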
Step 7
AWS Glue Data Catalog stores processed data schema and partition information.
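The catalog entries written in the previous steps can be read back to verify the schema and partitions registered for the processed data. The database and table names here are the same hypothetical ones used above:

```python
import boto3

glue = boto3.client("glue")

# Inspect the cataloged schema for the processed table.
table = glue.get_table(DatabaseName="fsi_processed", Name="trades")
for col in table["Table"]["StorageDescriptor"]["Columns"]:
    print(col["Name"], col["Type"])

# Confirm the partitions the crawler or job registered.
partitions = glue.get_partitions(DatabaseName="fsi_processed", TableName="trades")
print(len(partitions["Partitions"]), "partitions cataloged")
```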
Step 8
S3 buckets store processed data for retention, configured with S3 Object Lock in Compliance Mode, with a default retention period that matches compliance requirements.
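The sketch below shows one way to set this up with boto3. Note that Object Lock is enabled when the bucket is created, and that the seven-year compliance-mode default retention is purely illustrative; confirm the period your regulator requires before applying it, because compliance-mode retention cannot be shortened or removed once set.

```python
import boto3

s3 = boto3.client("s3")

# Enable Object Lock at creation time (this also enables S3 Versioning).
# In Regions other than us-east-1, also pass CreateBucketConfiguration.
s3.create_bucket(
    Bucket="example-fsi-processed-data",  # hypothetical name
    ObjectLockEnabledForBucket=True,
)

# Apply a bucket-wide default retention in compliance mode. The
# seven-year period is illustrative; set the period your regulator
# actually requires.
s3.put_object_lock_configuration(
    Bucket="example-fsi-processed-data",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}},
    },
)
```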
Step 9
AWS Lake Formation provides access control and governance, enabling granular access control at the database, table, or column level.
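For example, a hypothetical analyst role can be granted SELECT on a specific set of columns only, so sensitive fields in the same table stay hidden:

```python
import boto3

lf = boto3.client("lakeformation")

# Grant column-level read access; the role ARN, database, table, and
# column names are all illustrative assumptions.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalystRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "fsi_processed",
            "Name": "trades",
            "ColumnNames": ["trade_id", "symbol", "quantity", "trade_date"],
        }
    },
    Permissions=["SELECT"],
)
```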
Step 10
End users, such as the record management team, data science teams, auditors, and designated third parties (D3Ps), access the data through services such as Amazon Athena, Amazon Redshift Spectrum, and Amazon SageMaker.
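As one example of end-user access, the following sketch runs an Athena query against the governed table and polls for the result. The database, table, query, and results location are assumptions carried over from the earlier steps:

```python
import time

import boto3

athena = boto3.client("athena")

# Submit a query; the results bucket is a hypothetical name.
qid = athena.start_query_execution(
    QueryString="SELECT symbol, SUM(quantity) FROM trades GROUP BY symbol",
    QueryExecutionContext={"Database": "fsi_processed"},
    ResultConfiguration={"OutputLocation": "s3://example-fsi-athena-results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    print(f"{len(rows) - 1} rows returned")  # first row is the header
```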
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
This Guidance uses fully managed services, such as Amazon S3, DataSync, Transfer Family, AWS Glue, Lake Formation, and Athena. These services eliminate the need to administer data processing, data storage, and data warehousing systems, so you can focus on building your applications.
Security
End users authenticate through AWS Identity and Access Management (IAM) single sign-on, which authorizes access to Amazon QuickSight dashboards and the Athena user interface, as well as the Amazon Redshift query user interface (UI) for ad hoc queries and SageMaker for machine learning (ML) projects. DataSync uses HTTPS for encryption in transit. Transfer Family supports SSH File Transfer Protocol (SFTP) and File Transfer Protocol over SSL (FTPS), which are secured by the underlying Secure Shell (SSH) and Transport Layer Security (TLS) cryptographic protocols. Snowball supports server-side encryption at rest, and Amazon S3 supports both server-side and client-side encryption.
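For example, default server-side encryption with AWS KMS can be enforced at the bucket level, so every object written to the bucket is encrypted at rest regardless of how it arrives. The bucket name and key alias are illustrative:

```python
import boto3

s3 = boto3.client("s3")

# Make SSE-KMS the default for all new objects in the bucket.
s3.put_bucket_encryption(
    Bucket="example-fsi-raw-data",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/fsi-data-key",  # hypothetical alias
                },
                "BucketKeyEnabled": True,  # reduces KMS request costs
            }
        ]
    },
)
```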
Reliability
Serverless capabilities such as Athena, AWS Glue, Lake Formation, DynamoDB, Amazon Redshift Serverless, and Amazon EMR Serverless scale with demand. Transfer Family supports up to three Availability Zones to minimize network latency. Amazon EMR supports multi-master deployments in the same Availability Zone, while Amazon Redshift uses a relocation capability that allows you to move a cluster to another Availability Zone with minimal changes to your application. DataSync recovers from network path failures and uses integrity checks and full checksums to ensure correct transfer of data.
Performance Efficiency
Serverless services scale automatically with demand and release resources when they are idle, so each task consumes only the minimum capacity it requires.
Cost Optimization
In this Guidance, we use serverless services that scale automatically with demand so that you pay only for the resources you use. For example, AWS Glue and Amazon EMR Serverless consume resources only while jobs are running. Users pay only for the Athena queries they run, and Amazon Redshift Serverless scales with demand. Additionally, DataSync transfers data to AWS efficiently to minimize costs. Amazon EMR can make use of transient clusters and Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, which provide up to a 90% discount compared to On-Demand prices.
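A hedged sketch of a transient cluster with a Spot task group follows; the cluster terminates itself when the step finishes, so you pay nothing between runs. All names, roles, and paths are illustrative:

```python
import boto3

emr = boto3.client("emr")

# Transient cluster: KeepJobFlowAliveWhenNoSteps=False terminates it
# once the step completes. The TASK group uses Spot pricing.
emr.run_job_flow(
    Name="nightly-curation",
    ReleaseLabel="emr-7.1.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            {
                "InstanceRole": "TASK",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 4,
                "Market": "SPOT",  # up to ~90% cheaper than On-Demand
            },
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[
        {
            "Name": "curate",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-fsi-scripts/curate.py"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```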
Sustainability
Because this Guidance relies extensively on serverless services and dynamic scaling, resources are consumed only when they are needed. You do not have to maintain peak capacity, and you avoid the costly application failures that can occur when scaling resources manually.
Implementation Resources
A detailed guide is provided for you to experiment with and use within your own AWS account. It walks through each stage of the Guidance, including deployment, usage, and cleanup, to prepare it for production use.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
How financial institutions modernize record retention on AWS
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.