This Guidance demonstrates how to optimize a data architecture on AWS for sustainability by maximizing efficiency and reducing waste. It includes curated data services and best practices that help you identify the right solution for your workloads, so you can build a more efficient, end-to-end modern data architecture in the cloud. With a comprehensive set of data and analytics capabilities, this Guidance helps you design a data strategy that grows with your business.

Please note: See the Disclaimer at the end of this page.

Architecture Diagram

Overview

These steps provide an overview of this architecture. The diagrams described below highlight different aspects of it: data ingestion, storage, data processing, and query and visualization.

  • This diagram shows a real-time and batch data ingestion pattern, and a database replication pattern with recommended AWS services that serve these capabilities.

    • Steps
    • Follow the steps in this diagram to deploy this Guidance.

    • Additional considerations
    • Consider the following key components when deploying this Guidance.

  • This diagram shows the storage layer with frequently accessed data stores for operational use, and two popular storage patterns for analytics use – the data lake and the data warehouse.

    • Steps
    • Follow the steps in this diagram to deploy this Guidance.

    • Additional considerations
    • Consider the following key components when deploying this Guidance.

  • This diagram shows the data processing layers with different AWS services that can be used to process data in real time or in batch mode. Use either managed services (option 1) or self-managed services (option 2), as shown in the subsequent slides.

    • Managed Services
    • Follow the steps in this diagram to deploy this Guidance.

    • Managed Services - Additional considerations
    • Consider the following key components when deploying this Guidance.

      Consideration A
      Use predicate pushdown to reduce the amount of data moved between different layers during data processing. Implement an event-driven architecture to maximize overall resource utilization for asynchronous workloads.

    • Self-Managed
    • Consider the following key components when deploying this Guidance.

  • This diagram shows the data query and visualization layer with different AWS services that help users query and visualize data.

    • Steps
    • Follow the steps in this diagram to deploy this Guidance.

    • Additional considerations
    • Consider the following key components when deploying this Guidance.
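
The predicate pushdown consideration noted for the data processing layer can be illustrated with a minimal sketch. It is not part of this Guidance's sample code: the in-memory `STORAGE` list, the `scan` function, and the `is_hot_eu` filter are hypothetical stand-ins for a storage layer and a query predicate. The sketch only compares how many rows cross the layer boundary with and without pushing the filter down to the source.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical in-memory stand-in for a storage layer (assumption, not an AWS API).
STORAGE: List[Row] = [
    {"device": "pump-1", "region": "eu", "temp": 71},
    {"device": "pump-2", "region": "us", "temp": 64},
    {"device": "pump-3", "region": "eu", "temp": 90},
    {"device": "pump-4", "region": "us", "temp": 88},
]


def scan(predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Return rows from storage; if a predicate is given, filter at the source."""
    if predicate is None:
        return list(STORAGE)
    return [row for row in STORAGE if predicate(row)]


def is_hot_eu(row: Row) -> bool:
    """Hypothetical filter: EU devices reporting a temperature above 80."""
    return row["region"] == "eu" and row["temp"] > 80


# Without pushdown: every row moves to the processing layer, then is filtered.
moved_without = scan()
result_without = [row for row in moved_without if is_hot_eu(row)]

# With pushdown: the filter runs at the storage layer, so only matches move.
moved_with = scan(is_hot_eu)

print(f"rows moved without pushdown: {len(moved_without)}")
print(f"rows moved with pushdown: {len(moved_with)}")
```

Both paths produce the same result set; the difference is the volume of data transferred between layers, which is the quantity the consideration aims to reduce.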

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • To swiftly respond to incidents and events, customize Amazon CloudWatch metrics, alarms, and dashboards. This service allows you to monitor the operational health of the Guidance and notify operators of faults.

    Read the Operational Excellence whitepaper 
  • Resources deployed by this Guidance are protected by AWS Identity and Access Management (IAM) policies and principles. For example, authentication to services like Amazon Aurora, Amazon Timestream, AWS IoT SiteWise, Amazon S3, and Amazon Redshift is managed by IAM. With IAM identity-based policies, administrators can set what actions users can perform, on which resources, and under what conditions.

    Read the Security whitepaper 
  • Amazon S3, Aurora, DynamoDB, and Amazon Redshift are built for data storage, backup, and recovery. We recommend using AWS Backup to back up Timestream tables. AWS IoT SiteWise uses the highly available and durable Amazon S3 for backups.

    Read the Reliability whitepaper 
  • This Guidance uses purpose-built services for each layer of its data architecture. For storage, it selects services based on access patterns (transactional, analytical) and frequency of access (hot, cold, archival). For data ingestion, it selects services based on data velocity (data streaming, batch data ingestion). For data processing, it selects services based on consumption patterns (real-time, batch). For query and visualization, it selects services based on personas (business insights consumers, data analysts, data engineers, and data scientists).

    You can use proxy metrics, which are metrics that best quantify the effect of any changes you make to the associated resources. Examples include CPU utilization, memory utilization, and storage utilization, which you can use to measure and optimize this Guidance as you make changes.

    Read the Performance Efficiency whitepaper 
  • This Guidance uses serverless services that reduce compute costs for data ingestion and data processing by provisioning the appropriate resources and releasing them when processes are not running. For storage, this Guidance recommends using serverless services such as Aurora for hot data storage, as well as cost-effective and scalable services like Amazon S3 for colder layers.

    Read the Cost Optimization whitepaper 
  • This Guidance selects technologies based on data access and storage patterns. For frequently accessed data, it guides you to use hot storage layers supported by Aurora, Timestream, DynamoDB, and AWS IoT SiteWise. For lower-frequency or batch consumption, it guides you to use services for colder storage layers, like Amazon S3. For specialized access patterns, like aggregations on normalized tables, it uses Amazon Redshift.

    This Guidance recommends that you select serverless services to reduce the chance of overprovisioning your resources. In addition, AWS Lambda functions powered by Graviton2 are designed to deliver up to 19 percent better performance at 20 percent lower cost, and the potential performance gain can also improve environmental sustainability. We also recommend that you review the delivery SLA and choose patterns that reduce resource consumption when resources are not needed, for example, by moving from a real-time streaming pattern to a batch ingestion pattern when real-time consumption is not required. Finally, this Guidance helps you implement automation to terminate resources when they are not in use.

    Read the Sustainability whitepaper 
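
The recommendation to review the delivery SLA before choosing between streaming and batch ingestion can be sketched as a simple decision rule. This is an illustrative assumption, not logic taken from this Guidance: the `Workload` class, the `choose_ingestion_pattern` function, and the 60-second threshold are all hypothetical, and a real review would weigh more factors than a single delay value.

```python
from dataclasses import dataclass

# Hypothetical cutoff (assumption): below this delay, consumers need streaming.
STREAMING_THRESHOLD_SECONDS = 60


@dataclass(frozen=True)
class Workload:
    name: str
    max_delivery_delay_seconds: int  # delivery SLA agreed with data consumers


def choose_ingestion_pattern(
    workload: Workload, threshold: int = STREAMING_THRESHOLD_SECONDS
) -> str:
    """Pick streaming only when the SLA requires near-real-time delivery."""
    if workload.max_delivery_delay_seconds < threshold:
        return "streaming"
    # A looser SLA lets resources run only during scheduled batch windows.
    return "batch"


workloads = [
    Workload("equipment-alerts", max_delivery_delay_seconds=5),
    Workload("daily-sales-report", max_delivery_delay_seconds=24 * 60 * 60),
]

for w in workloads:
    print(f"{w.name}: {choose_ingestion_pattern(w)}")
```

Making the SLA an explicit input keeps the choice reviewable: when consumers relax a requirement, the same rule shows which workloads can move off always-on streaming resources.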

Implementation Resources

A detailed guide is provided for you to experiment with and use within your AWS account. It examines each stage of building the Guidance, including deployment, usage, and cleanup.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Workshop

Optimize Data Pattern using Amazon Redshift Data Sharing

This workshop helps you optimize data patterns for sustainability, specifically focused on removing unneeded or redundant data, and minimizing data movement across networks.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
