This Guidance demonstrates how to optimize a data architecture for sustainability on AWS, helping you maximize efficiency and reduce waste. It includes curated data services and best practices that help you identify the right solution for your workloads, so you can build a more efficient, end-to-end modern data architecture in the cloud. With a comprehensive set of data and analytics capabilities, this Guidance helps you design a data strategy that grows with your business.
Please note: See the Disclaimer section at the end of this Guidance.
Architecture Diagram

Overview
These steps provide an overview of this architecture. The sections that follow highlight different aspects of this architecture: data ingestion, data storage, data processing, and data consumers.
Step 1
Organizations ingest data from streaming sources like sensors, devices, social media, or web applications, and in batches from database and file systems.
Data Ingestion
This diagram shows a real-time and batch data ingestion pattern, and a database replication pattern with recommended AWS services that serve these capabilities.
Steps
Follow the steps in this diagram to deploy this Guidance.
Step 1
Use managed services such as Amazon Kinesis Data Streams for streaming data, such as clickstream data from web applications. Because managed services run on multi-tenant control planes, their environmental impact is distributed across many users.
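As an illustration only, the following minimal boto3 sketch shows how a web application might publish a clickstream event to an existing Kinesis data stream; the stream name, partition key, and event fields are hypothetical.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical clickstream event emitted by a web application.
event = {"user_id": "u-123", "page": "/checkout", "timestamp": "2024-05-01T12:00:00Z"}

# Publish the event; the partition key controls how records are spread across shards.
kinesis.put_record(
    StreamName="clickstream-events",          # hypothetical stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],
)
```

For higher-volume producers, batching records (for example with put_records) reduces per-request overhead and the resources consumed per event.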
Additional considerations
Consider the following key components when deploying this Guidance.
Consideration A
Evaluate the service-level agreement (SLA) for data delivery and assess whether a continuous stream is necessary.
Data Storage
This diagram shows the storage layer with frequently accessed data stores for operational use, and two popular storage patterns for analytics use – the data lake and the data warehouse.
Steps
Follow the steps in this diagram to deploy this Guidance.
Step 1
Use the AWS IoT SiteWise managed database to store data from industrial equipment at scale. Set a retention period that controls how long data is kept in the hot tier before it's deleted, and move historical data to colder tiers in Amazon Simple Storage Service (Amazon S3).
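As a hedged sketch of this step, the boto3 call below configures IoT SiteWise multi-layer storage with a 30-day hot-tier retention period and a customer-managed S3 bucket for the cold tier; the bucket and role ARNs are placeholders, and the settings should be verified against the current IoT SiteWise storage configuration API.

```python
import boto3

sitewise = boto3.client("iotsitewise")

# Keep 30 days of data in the hot tier and move older data to a cold tier
# backed by a customer-managed S3 bucket (the ARNs below are placeholders).
sitewise.put_storage_configuration(
    storageType="MULTI_LAYER_STORAGE",
    multiLayerStorage={
        "customerManagedS3Storage": {
            "s3ResourceArn": "arn:aws:s3:::example-sitewise-cold-tier/raw/",
            "roleArn": "arn:aws:iam::123456789012:role/SiteWiseColdTierRole",
        }
    },
    retentionPeriod={"numberOfDays": 30},
)
```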
Additional considerations
Consider the following key components when deploying this Guidance.
Consideration A
For hot data stores where data is queried frequently, choose the right database service for the right purpose to improve resource efficiency of your workloads.
Data Processing
This diagram shows the data processing layer with different AWS services that can be used to process data in real-time or batch mode. Use either managed services (option 1) or self-managed services (option 2), as shown in the following sections.
Managed Services
Follow the steps in this diagram to deploy this Guidance.
Step 1
Use managed services like Amazon Managed Service for Apache Flink to reduce the overhead of managing infrastructure and the risk of overprovisioning resources. Select an appropriate length for time-based windowing operations in streaming applications to reduce wasted resources.
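Window sizing can be illustrated with a minimal PyFlink sketch. It assumes the Flink Kinesis SQL connector is available to the application, and the stream name, fields, and one-minute window length are illustrative rather than prescriptive.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source table over a Kinesis stream (connector options are illustrative).
table_env.execute_sql("""
    CREATE TABLE clickstream (
        user_id STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'clickstream-events',
        'aws.region' = 'us-east-1',
        'format' = 'json'
    )
""")

# A one-minute tumbling window keeps operator state small and emits compact
# aggregates; pick the shortest window that still satisfies downstream consumers.
per_minute_counts = table_env.sql_query("""
    SELECT
        TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
        user_id,
        COUNT(*) AS events
    FROM clickstream
    GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE), user_id
""")
```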
Managed Services - Additional considerations
Consider the following key components when deploying this Guidance.
Consideration A
Use predicate pushdown to reduce the amount of data moved between layers during data processing. Implement an event-driven architecture to maximize overall resource utilization for asynchronous workloads.
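As one hedged illustration of predicate pushdown, the PySpark snippet below filters on a partition column while reading Parquet data from Amazon S3, so only matching partitions and row groups are scanned; the bucket, path, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("predicate-pushdown-example").getOrCreate()

# Filtering on the partition column (order_date) and a regular column lets Spark
# push the predicates down to the Parquet reader, so only the matching data is
# read from Amazon S3 instead of the full dataset.
orders = (
    spark.read.parquet("s3://example-data-lake/curated/orders/")
    .filter("order_date >= '2024-01-01' AND region = 'EU'")
    .select("order_id", "order_date", "region", "amount")
)

# PushedFilters and PartitionFilters in the physical plan confirm the pushdown.
orders.explain()
```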
Self-Managed
Follow the steps in this diagram to deploy this Guidance.
Step 1
Use Amazon EC2 instances to build your own analytics applications best suited to your business requirements. Run petabyte-scale data processing jobs on open-source analytics frameworks such as Apache Airflow, Apache Hive, Apache Kafka, Apache Flink, Apache Spark, Presto, and Trino.
Data Consumers
This diagram shows the data query and visualization layer with different AWS services that help users query and visualize data.
Steps
Follow the steps in this diagram to deploy this Guidance.
Step 1
Use Amazon OpenSearch Service for real-time visualization. Amazon OpenSearch Serverless removes much of the complexity of managing OpenSearch clusters and capacity.
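As a minimal sketch, assuming the opensearch-py client and an existing OpenSearch Serverless collection, the snippet below signs requests for the serverless service ("aoss") and indexes a document that a dashboard could visualize; the collection endpoint, index name, and document fields are placeholders.

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, "aoss")  # "aoss" = OpenSearch Serverless

client = OpenSearch(
    hosts=[{"host": "abc123.us-east-1.aoss.amazonaws.com", "port": 443}],  # placeholder endpoint
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Index a metric document that an OpenSearch Dashboards visualization can query.
client.index(
    index="sensor-metrics",
    body={"device_id": "pump-42", "temperature_c": 71.3, "timestamp": "2024-05-01T12:00:00Z"},
)
```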
Additional considerations
Consider the following key components when deploying this Guidance.
Consideration A
Consider reviewing reports and dashboard usage at regular intervals. Remove unused, redundant reports from dashboards.
Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
To swiftly respond to incidents and events, customize Amazon CloudWatch metrics, alarms, and dashboards. This service allows you to monitor the operational health of the Guidance and notify operators of faults.
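As one hedged example of such an alarm, the boto3 call below alarms on a Kinesis consumer-lag metric and notifies an SNS topic when processing falls behind; the stream name, threshold, and topic ARN are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the oldest unread record in the stream is more than a minute old,
# which signals that consumers are falling behind (names and ARNs are placeholders).
cloudwatch.put_metric_alarm(
    AlarmName="clickstream-iterator-age-high",
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "clickstream-events"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=3,
    Threshold=60000,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```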
Security
Resources deployed by this Guidance are protected by AWS Identity and Access Management (IAM) policies and principles. For example, authentication to services like Aurora, Amazon Timestream, AWS IoT SiteWise, Amazon S3, and Amazon Redshift is managed by IAM. With IAM identity-based policies, administrators can set what actions users can perform, on which resources, and under what conditions.
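A minimal sketch, assuming a hypothetical data lake bucket, of an identity-based policy that scopes actions, resources, and conditions to read-only access on a curated prefix:

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical policy limiting an analytics role to read-only access on one prefix.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-data-lake/curated/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-data-lake",
            "Condition": {"StringLike": {"s3:prefix": ["curated/*"]}},
        },
    ],
}

iam.create_policy(
    PolicyName="AnalyticsCuratedReadOnly",
    PolicyDocument=json.dumps(policy_document),
)
```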
Reliability
Amazon S3, Aurora, DynamoDB, and Amazon Redshift are built for data storage, backup, and recovery. We recommend using AWS Backup to back up Timestream tables. AWS IoT SiteWise uses the highly available and durable Amazon S3 for backups.
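As an illustrative sketch, assuming an existing backup vault and service role, the boto3 call below starts an on-demand AWS Backup job for a Timestream table; the vault name, table ARN, and role ARN are placeholders.

```python
import boto3

backup = boto3.client("backup")

# On-demand backup of a Timestream table into an existing vault (ARNs are placeholders).
backup.start_backup_job(
    BackupVaultName="analytics-backups",
    ResourceArn="arn:aws:timestream:us-east-1:123456789012:database/sensors/table/readings",
    IamRoleArn="arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
)
```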
Performance Efficiency
This Guidance uses purpose-built services for each layer of its data architecture. For storage, it selects services based on access patterns (transactional, analytical), and frequency of access (hot, cold, archival). For data ingestion, it selects services based on data velocity (data streaming services, batch data ingestions). And for data processing, it selects services based on consumption patterns (real-time, batch). For query and visualization, it selects services based on personas (business insights consumers, data analysts, data engineers, and data scientists).
You can use proxy metrics, which are the metrics that best quantify the effect of changes you make to the associated resources. Examples of proxy metrics include CPU utilization, memory utilization, and storage utilization, which you can use to measure and optimize this Guidance as you make changes.
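As a minimal sketch of gathering a proxy metric, the snippet below pulls average CPU utilization for a hypothetical processing instance over the last 24 hours; the instance ID is a placeholder, and the same approach applies to memory or storage metrics published to CloudWatch.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

# Average hourly CPU utilization over the last 24 hours for a placeholder instance,
# used as a proxy metric when right-sizing compute resources.
end = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=end - timedelta(hours=24),
    EndTime=end,
    Period=3600,
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```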
Cost Optimization
This Guidance uses serverless services that reduce compute costs on data ingestion and data processing by provisioning the appropriate resources and disposing of them when processes are not running. For storage, this Guidance recommends using serverless services such as Aurora for hot data storage, as well as cost-effective and scalable services for colder layers, like Amazon S3.
Sustainability
This Guidance uses technologies based on data access and storage patterns. For frequently accessed data, it guides you to use hot storage layers supported by Aurora, Amazon Timestream, DynamoDB, and AWS IoT SiteWise. For lower frequency or batch consumption, it guides you to use services for colder storage layers, like Amazon S3. For specialized access patterns, like aggregations on normalized tables, it uses Amazon Redshift.
This Guidance recommends you select serverless services to reduce the chances of overprovisioning your resources. In addition, Lambda functions powered by Graviton2 are designed to deliver up to 19 percent better performance at 20 percent lower cost, and that improved efficiency also benefits environmental sustainability. We also recommend you review the delivery SLA to choose patterns that reduce the consumption of resources when they are not needed; for example, move from a real-time streaming ingestion pattern to a batch ingestion pattern when real-time consumption is not required. Finally, this Guidance helps you implement automation to terminate resources when they are not in use.
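As a hedged sketch of the Graviton2 recommendation, the boto3 call below deploys a Lambda function on the arm64 architecture; the function name, role, and deployment package location are placeholders.

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical transformation function deployed on a Graviton2 (arm64) execution environment.
lambda_client.create_function(
    FunctionName="clickstream-transform",
    Runtime="python3.12",
    Role="arn:aws:iam::123456789012:role/lambda-exec-role",
    Handler="app.handler",
    Code={"S3Bucket": "example-deploy-bucket", "S3Key": "clickstream-transform.zip"},
    Architectures=["arm64"],  # Graviton2-based architecture
    MemorySize=256,
    Timeout=60,
)
```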
Implementation Resources

A detailed guide is provided to experiment and use within your AWS account. Each stage of building the Guidance, including deployment, usage, and cleanup, is examined to prepare it for deployment.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content

Optimize Data Pattern using Amazon Redshift Data Sharing
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.