This Guidance helps you extract, transform, and load (ETL) blockchain data into a column-oriented storage format that allows for easy access and faster analysis. It combines an open-source architecture for running cross-chain analytics on public blockchain data with the Bitcoin and Ethereum public datasets available through Open Data on AWS. This Guidance pulls data from the public Bitcoin and Ethereum blockchains and normalizes it into tables for blocks and transactions, plus additional tables for other data inside a block.
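The normalization step described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical block record (the field names `number`, `hash`, `timestamp`, `transactions`, `from`, `to`, and `value` are illustrative and are not taken from this Guidance's actual schema) and shows how one nested block might be split into a block row and transaction rows, then pivoted into a column-oriented layout.

```python
from datetime import datetime, timezone


def normalize_block(raw_block):
    """Split one nested block record into a block row and its transaction rows."""
    ts = datetime.fromtimestamp(raw_block["timestamp"], tz=timezone.utc)
    day = ts.strftime("%Y-%m-%d")  # hypothetical partition column
    block_row = {
        "date": day,
        "number": raw_block["number"],
        "hash": raw_block["hash"],
        "transaction_count": len(raw_block["transactions"]),
    }
    tx_rows = [
        {
            "date": day,
            "block_number": raw_block["number"],
            "hash": tx["hash"],
            "from_address": tx.get("from"),
            "to_address": tx.get("to"),
            "value": tx.get("value"),
        }
        for tx in raw_block["transactions"]
    ]
    return block_row, tx_rows


def to_columns(rows):
    """Pivot a list of row dicts (same keys) into a dict of column lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Hypothetical sample record, for illustration only.
sample_block = {
    "number": 100,
    "hash": "0xabc",
    "timestamp": 1681862400,
    "transactions": [
        {"hash": "0x01", "from": "0xaa", "to": "0xbb", "value": 5},
        {"hash": "0x02", "from": "0xcc", "to": "0xdd", "value": 7},
    ],
}

block_row, tx_rows = normalize_block(sample_block)
tx_columns = to_columns(tx_rows)
```

In a real pipeline, the column lists would typically be written to a columnar file format rather than kept in memory; the sketch only shows the shape of the transformation.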

Architecture Diagram


Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • With Managed Blockchain, you can deploy Ethereum full nodes that connect to public testnets and the Ethereum mainnet in a matter of minutes. In contrast, self-hosted Ethereum nodes can take 24-36 hours to deploy and sync. We have built observability into the architecture with process-level metrics, logs, and dashboards. Extend these mechanisms to your needs, and create alarms in Amazon CloudWatch to inform your on-call team of any issues. Finally, you can automate the deployment of this Guidance with infrastructure as code frameworks such as AWS Cloud Development Kit (CDK) or AWS CloudFormation.

    Read the Operational Excellence whitepaper 
  • This Guidance uses role-based access with AWS Identity and Access Management (IAM). The Amazon S3 bucket has encryption enabled, is private, and blocks public access. All roles are defined with least-privilege access, and all communications between services stay within the customer account. Administrators can control access to the Jupyter notebook, SageMaker, Amazon Redshift, Athena, and QuickSight through IAM roles.

    Read the Security whitepaper 
  • Various components in the architecture are deployed across multiple Availability Zones, such as the Managed Blockchain Ethereum nodes. By nature, all the serverless components, such as Fargate, are highly available and automatically scale to accommodate demand.

    Read the Reliability whitepaper 
  • This Guidance uses serverless technologies, which provide built-in fault tolerance and continuous scaling. Serverless services also allow for comparative testing against varying load levels and minimize undifferentiated tasks like capacity provisioning and patching, so you can focus on business needs rather than server management. Further, you can enable auto scaling for AWS Glue, which automatically adds and removes workers from the cluster depending on the parallelism at each stage of the job run. Similarly, Amazon S3 automatically scales to meet high request rates. There are no limits to the number of prefixes in a bucket, and you can increase read or write performance by parallelizing requests across prefixes.

    Read the Performance Efficiency whitepaper 
  • By using the AWS Glue serverless computing platform for ETL and Athena for serverless queries, you pay only for the resources you use. To further optimize cost, you can use the Amazon S3 Intelligent-Tiering storage class, which automatically moves your content to the most cost-effective storage tier based on how frequently it is accessed.

    Read the Cost Optimization whitepaper 
  • By using managed services such as Fargate and AWS Glue, we minimize the environmental impact of the backend services. Furthermore, the public Ethereum blockchain shifted from the proof-of-work to the proof-of-stake consensus mechanism in September 2022, reducing Ethereum’s energy consumption by ~99.5 percent.*

    *The Merge, Ethereum, April 19, 2023. 

    Read the Sustainability whitepaper 
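The point above about Amazon S3 prefixes and parallelization can be sketched as follows. This is a minimal illustration under stated assumptions: the `chain/table/date=YYYY-MM-DD/` key layout is hypothetical, and `read_prefix` is a stand-in for whatever function actually lists or reads the objects under a prefix. No AWS calls are made here.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def partition_prefixes(chain, table, start, end):
    """Return one date-partitioned key prefix per day in [start, end]."""
    days = (end - start).days + 1
    return [
        f"{chain}/{table}/date={(start + timedelta(days=i)).isoformat()}/"
        for i in range(days)
    ]


def read_in_parallel(prefixes, read_prefix, max_workers=8):
    """Apply read_prefix to each prefix concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_prefix, prefixes))


def fake_read_prefix(prefix):
    """Placeholder reader (hypothetical); a real one would fetch objects from S3."""
    return {"prefix": prefix, "objects_read": 0}


prefixes = partition_prefixes("eth", "blocks", date(2023, 4, 18), date(2023, 4, 20))
results = read_in_parallel(prefixes, fake_read_prefix)
```

Because each day maps to its own prefix, requests for different days can be issued independently, which is what lets read throughput grow with the number of workers.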

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin. 

AWS Architecture

Access Bitcoin and Ethereum open datasets for cross-chain analytics

This post shares an open-source solution for running cross-chain analytics on public blockchain data along with public datasets for Bitcoin and Ethereum available through AWS Open Data.


The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
