This Guidance, with its sample code, deploys a carbon data lake to the AWS Cloud using the AWS Cloud Development Kit (AWS CDK). It provides customers and partners with foundational infrastructure that can be extended to support use cases including monitoring, tracking, reporting, and impact verification of greenhouse gas emissions. The sample code deploys a data lake and processing pipeline that assists with data ingestion, aggregation, automated processing, and CO2 equivalent (CO2e) calculation based on the ingested greenhouse gas emissions data.

Please note: By itself, this Guidance does not make a customer compliant with any end-to-end carbon accounting solution. It provides the foundational infrastructure with which additional complementary solutions can be integrated.
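
The CO2 equivalent calculation mentioned above can be illustrated with a minimal sketch. It assumes the common approach of multiplying the mass of each greenhouse gas by its 100-year global warming potential (GWP) and summing the results. The GWP values below are the IPCC Fifth Assessment Report figures; the record layout, the gas keys, and the function name `co2e_kg` are hypothetical and are not taken from the sample code, which may use different factors and a different data model.

```python
from typing import Dict, Mapping

# Assumption: 100-year GWP values from the IPCC Fifth Assessment Report (AR5).
# The Guidance's own emissions factor tables may use other values.
GWP_100_AR5: Dict[str, float] = {
    "co2": 1.0,
    "ch4": 28.0,
    "n2o": 265.0,
}


def co2e_kg(gas_mass_kg: Mapping[str, float],
            gwp: Mapping[str, float] = GWP_100_AR5) -> float:
    """Convert per-gas emissions (kg) into a single CO2e figure (kg).

    Raises ValueError if a gas has no GWP entry, so that unknown gases
    are not silently dropped from the total.
    """
    unknown = sorted(set(gas_mass_kg) - set(gwp))
    if unknown:
        raise ValueError(f"no GWP value for: {', '.join(unknown)}")
    return sum(mass * gwp[gas] for gas, mass in gas_mass_kg.items())


# Hypothetical ingested record: masses of each gas for one activity.
sample_record = {"co2": 100.0, "ch4": 2.0, "n2o": 0.5}
total = co2e_kg(sample_record)
print(f"{total} kg CO2e")
```

Rejecting unknown gases rather than ignoring them is a deliberate choice in this sketch: an emissions total that silently omits a gas is harder to detect than a failed record.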

Architecture Diagram

[Architecture diagram description]

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • If any changes are required for the Guidance, they can be implemented and deployed using GitHub Issues. All new deployments are tested through unit, security, infrastructure, and deployment testing. 

    Feedback is available through the AWS Step Functions workflow view. If data is not processed through the pipeline, the workflow view shows the step at which processing failed. Amazon Simple Notification Service (Amazon SNS) notifications are also sent when a pipeline run fails for any reason. Together, the Step Functions workflow view and the Amazon SNS notifications let users isolate the component that caused a problem and examine the data that was submitted to the pipeline at that step.

    Read the Operational Excellence whitepaper 
  • This Guidance applies a zero-trust model for authentication and authorization. All users of the web application are authenticated using Amazon Cognito user pools. All other resources are granted least-privilege access, and all access patterns are evaluated with cdk-nag, a utility that checks AWS CDK applications.

    All data is encrypted at rest and in transit using AWS Key Management Service (AWS KMS) together with the encryption features of Amazon S3, AWS Lambda, AWS Glue DataBrew, and Amazon DynamoDB.

    Read the Security whitepaper 
  • The services in this Guidance are AWS managed services and are highly available by default. When you deploy the provided sample code, all Amazon S3 bucket access is logged by default. Managed services such as Lambda and Step Functions emit Amazon CloudWatch metrics, and you can configure alarms to notify users about threshold breaches.

    All deployment and configuration changes are managed using AWS CDK, reducing the possibility of human error.

    Read the Reliability whitepaper 
  • The README file contains specific directions to extend, modify, or add to the Guidance. AWS customers or partners can extend the Guidance by adding ingestion APIs, building custom emissions factor libraries, performing custom calculations, and creating custom visualizations, forecasting, or AI/ML tools.

    To decrease latency and improve performance, this Guidance is designed for deployment in any major AWS Region using AWS CDK regional context.

    Read the Performance Efficiency whitepaper 
  • The services in this Guidance are managed by AWS and are serverless. They were selected to meet demand with only the minimum resources required. We evaluated and tested the Guidance with synthetic data sources, selecting services that optimize performance while reducing cost and carbon footprint.
    Read the Cost Optimization whitepaper 
  • By using an on-demand serverless architecture and Step Functions, this Guidance can continuously scale to match the load with only the minimum resources. All processed data is compressed, and each layer of the architecture deploys to a single Region by default.

    Read the Sustainability whitepaper 
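
As a rough illustration of the Operational Excellence point about locating where a pipeline run failed, the sketch below scans a Step Functions execution history and reports the state that was active when the first failure event appeared. It assumes the event list has the shape returned by the Step Functions GetExecutionHistory API (each event has an `id` and a `type`, and state-entered events carry `stateEnteredEventDetails` with a `name`). The function name `find_failed_state` and the state names in the sample history are hypothetical and do not come from the sample code.

```python
from typing import Any, Dict, List, Optional


def find_failed_state(events: List[Dict[str, Any]]) -> Optional[str]:
    """Return the name of the state active at the first failure event.

    Returns None if the history contains no failure or timeout event.
    Note: this reports the first failure even if a Retry or Catch later
    recovered from it; a stricter check is left out of this sketch.
    """
    current_state: Optional[str] = None
    # History events carry increasing integer ids; sort defensively.
    for event in sorted(events, key=lambda e: e["id"]):
        entered = event.get("stateEnteredEventDetails")
        if entered is not None:
            current_state = entered["name"]
        elif event["type"].endswith(("Failed", "TimedOut")):
            return current_state
    return None


# Hypothetical history for a run that fails in its second step.
sample_history = [
    {"id": 1, "type": "ExecutionStarted"},
    {"id": 2, "type": "TaskStateEntered",
     "stateEnteredEventDetails": {"name": "ValidateInput"}},
    {"id": 3, "type": "TaskStateExited",
     "stateExitedEventDetails": {"name": "ValidateInput"}},
    {"id": 4, "type": "TaskStateEntered",
     "stateEnteredEventDetails": {"name": "CalculateEmissions"}},
    {"id": 5, "type": "TaskFailed"},
    {"id": 6, "type": "ExecutionFailed"},
]

print(find_failed_state(sample_history))
```

In practice the event list would come from the Step Functions API or console export rather than a literal, and the resulting state name could be included in the Amazon SNS failure notification so that the recipient knows which step to inspect.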

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin. 

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
