Guidance for Optimizing Data Architecture for Sustainability on AWS

Overview

This Guidance demonstrates how to optimize a data architecture for sustainability on AWS, helping you maximize efficiency and reduce waste. It includes curated data services and best practices that help you identify the right solution for your workloads, so you can build a more efficient, end-to-end modern data architecture in the cloud. With a comprehensive set of data and analytics capabilities, this Guidance helps you design a data strategy that grows with your business.

How it works

These steps provide an overview of this architecture. The sections below include diagrams highlighting its different aspects.

Data ingestion

This diagram shows a real-time and batch data ingestion pattern, and a database replication pattern with recommended AWS services that serve these capabilities.
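As an illustration of the real-time ingestion pattern, the following sketch sends a record to an Amazon Kinesis data stream with boto3. The stream name and payload fields are hypothetical, and the stream is assumed to already exist.

```python
import json

import boto3

kinesis = boto3.client("kinesis")

def ingest_record(payload: dict, partition_key: str) -> None:
    """Send one record to the stream for near-real-time consumers."""
    kinesis.put_record(
        StreamName="sensor-events",  # hypothetical stream name
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=partition_key,  # determines shard placement
    )

ingest_record({"device_id": "d-42", "temp_c": 21.7}, partition_key="d-42")
```

For the batch pattern, the same payloads would instead be accumulated and written to object storage on a schedule, as in the data storage sketch below.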

Data storage

This diagram shows the storage layer with frequently accessed data stores for operational use, and two popular storage patterns for analytics use – the data lake and the data warehouse.
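As a sketch of the data lake side of this layer, the following writes a batch file to Amazon S3 under a date-partitioned prefix, a common layout that lets analytics engines prune the data they scan. The bucket, dataset, and key names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

def land_batch(body: bytes, dataset: str, dt: str) -> None:
    # A dataset/dt=YYYY-MM-DD/ prefix acts as a partition, so query
    # engines can skip partitions that a query does not touch.
    key = f"{dataset}/dt={dt}/part-0000.json"
    s3.put_object(Bucket="example-data-lake", Key=key, Body=body)

land_batch(b'{"order_id": 1, "total": 42.5}', dataset="orders", dt="2023-06-01")
```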

Data processing

Managed services

This diagram shows the data processing layers with different AWS services that could be used to process data in real-time or in batch processing mode. Use either managed services (option 1) or self-managed (option 2).

Diagram illustrating an option for a sustainable data architecture using AWS managed services. The diagram shows the flow from data streams and data storage through managed data processing services (Amazon Kinesis Data Analytics, AWS Lambda, AWS Glue, Amazon EMR), leading to query, visualization, and data consumers.
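As a minimal sketch of option 1, the following Lambda handler processes records delivered by a Kinesis event source mapping. The payload fields are hypothetical; the event shape follows the standard Kinesis-to-Lambda record format.

```python
import base64
import json

def handler(event, context):
    """Process a batch of Kinesis records delivered to Lambda."""
    processed = 0
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("temp_c", 0) > 30:  # hypothetical business rule
            print(f"High temperature reading from {payload.get('device_id')}")
        processed += 1
    return {"processed": processed}
```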

Self-managed

This diagram shows the self-managed alternative (option 2), in which the same real-time and batch processing layers run on infrastructure you operate yourself.

Diagram illustrating a self-managed data processing workflow using Amazon EC2 with auto-scaling, showing data streams and storage feeding into processing, followed by query and visualization for data consumers.
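As a sketch of option 2, the following attaches a target-tracking scaling policy to an existing EC2 Auto Scaling group so the self-managed fleet grows and shrinks with load rather than running at a fixed size. The group name and target value are hypothetical.

```python
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="stream-processors",  # hypothetical group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,  # scale to keep average CPU near 60 percent
    },
)
```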

Data consumers

This diagram shows the data query and visualization layer with different AWS services that help users query and visualize data.
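As a sketch of this layer for the data analyst persona, the following submits a SQL query to Amazon Athena over a data lake table. The database, table, and results bucket are hypothetical.

```python
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT dt, COUNT(*) AS orders FROM orders GROUP BY dt",
    QueryExecutionContext={"Database": "analytics"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Started query:", response["QueryExecutionId"])
```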

Well-Architected Pillars

The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

To swiftly respond to incidents and events, customize Amazon CloudWatch metrics, alarms, and dashboards. This service allows you to monitor the operational health of the Guidance and notify operators of faults.
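As a minimal sketch of this practice, the following creates a CloudWatch alarm that notifies operators through an SNS topic when a processing Lambda function reports errors. The function name and topic ARN are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ingest-fn-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "ingest-fn"}],  # hypothetical
    Statistic="Sum",
    Period=300,  # evaluate in five-minute windows
    EvaluationPeriods=1,
    Threshold=1.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # hypothetical
)
```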

Read the Operational Excellence whitepaper

Resources deployed by this Guidance are protected by AWS Identity and Access Management (IAM) policies. For example, authentication to services like Aurora, Timestream, AWS IoT SiteWise, Amazon S3, and Amazon Redshift is managed by IAM. With IAM identity-based policies, administrators can set which actions users can perform, on which resources, and under what conditions.
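As a sketch of an identity-based policy of this kind, the following allows reads and writes only on one bucket, and only over TLS. The bucket and policy names are hypothetical.

```python
import json

import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],      # which actions
        "Resource": "arn:aws:s3:::example-data-lake/*",  # on which resources
        "Condition": {                                   # under what conditions
            "Bool": {"aws:SecureTransport": "true"}      # require TLS
        },
    }],
}

iam.create_policy(
    PolicyName="DataLakeReadWrite",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
```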

Read the Security whitepaper

Amazon S3, Aurora, DynamoDB, and Amazon Redshift are built for data storage, backup, and recovery. We recommend using AWS Backup to back up Timestream tables, and AWS IoT SiteWise uses the highly available and durable Amazon S3 for backups.
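As a sketch of the Timestream recommendation, the following starts an on-demand AWS Backup job for a table. The vault name, table ARN, and role ARN are hypothetical.

```python
import boto3

backup = boto3.client("backup")

backup.start_backup_job(
    BackupVaultName="default",
    ResourceArn=(
        "arn:aws:timestream:us-east-1:123456789012:"
        "database/iot-db/table/sensor-data"  # hypothetical Timestream table
    ),
    IamRoleArn="arn:aws:iam::123456789012:role/BackupServiceRole",  # hypothetical
)
```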

Read the Reliability whitepaper

This Guidance uses purpose-built services for each layer of its data architecture. For storage, it selects services based on access patterns (transactional, analytical) and frequency of access (hot, cold, archival). For data ingestion, it selects services based on data velocity (streaming, batch). For data processing, it selects services based on consumption patterns (real-time, batch). And for query and visualization, it selects services based on personas (business insights consumers, data analysts, data engineers, and data scientists).

You can use proxy metrics, which are the metrics that best quantify the effect of any changes you make to the associated resources. Examples of proxy metrics include CPU utilization, memory utilization, and storage utilization, which you can use to measure and optimize this Guidance based on the changes you make.
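As a sketch of tracking one such proxy metric, the following pulls a day of average CPU utilization for an EC2 instance from CloudWatch. The instance id is hypothetical.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical
    StartTime=now - timedelta(days=1),
    EndTime=now,
    Period=3600,  # one data point per hour
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```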

Read the Performance Efficiency whitepaper

This Guidance uses serverless services that reduce compute costs for data ingestion and data processing by provisioning the appropriate resources and releasing them when processes are not running. For storage, this Guidance recommends serverless services such as Aurora for hot data, as well as cost-effective and scalable services, like Amazon S3, for colder layers.

Read the Cost Optimization whitepaper

This Guidance uses technologies based on data access and storage patterns. For frequently accessed data, it guides you to use hot storage layers supported by Aurora, Timestream, DynamoDB, and AWS IoT SiteWise. For lower-frequency or batch consumption, it guides you to use services for colder storage layers, like Amazon S3. For specialized access patterns, like aggregations on normalized tables, it uses Amazon Redshift.
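As a sketch of moving data toward those colder layers, the following S3 lifecycle rule transitions objects to an infrequent-access class after 30 days and to archival storage after 180. The bucket name, prefix, and day counts are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-colder-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "orders/"},  # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 180, "StorageClass": "GLACIER"},     # archival
            ],
        }]
    },
)
```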

This Guidance recommends that you select serverless services to reduce the chance of overprovisioning your resources. In addition, Lambda functions powered by Graviton2 are designed to deliver up to 19 percent better performance at 20 percent lower cost, and that efficiency gain carries the additional benefit of improved environmental sustainability. We also recommend that you review your delivery SLA and choose patterns that reduce resource consumption when resources are not needed; for example, move from a real-time streaming pattern to a batch ingestion pattern when real-time consumption is not required. Finally, this Guidance helps you implement automation to terminate resources when they are not in use.
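As a sketch of the Graviton2 recommendation, the following creates a Lambda function on the arm64 architecture. The function name, role ARN, and deployment bundle are hypothetical.

```python
import boto3

lam = boto3.client("lambda")

with open("function.zip", "rb") as bundle:  # hypothetical deployment bundle
    lam.create_function(
        FunctionName="ingest-fn",  # hypothetical function name
        Runtime="python3.12",
        Handler="app.handler",
        Role="arn:aws:iam::123456789012:role/ingest-fn-role",  # hypothetical
        Code={"ZipFile": bundle.read()},
        Architectures=["arm64"],  # Graviton2-based execution environment
    )
```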

Read the Sustainability whitepaper

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.