AWS Solutions Library

Guidance for Cross-Chain Analytics using Bitcoin and Ethereum Open Data on AWS

Overview

This Guidance helps you extract, transform, and load (ETL) blockchain data into a column-oriented storage format that allows for easy access and faster analysis. It pairs an open-source architecture for running cross-chain analytics on public blockchain data with the Bitcoin and Ethereum public datasets available through Open Data on AWS. The Guidance pulls data from the public Bitcoin and Ethereum blockchains and normalizes it into tables for blocks and transactions, plus additional tables for other data inside a block.
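The normalization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Guidance's actual ETL code: the input shape is loosely modeled on an Ethereum JSON-RPC block response, and the function names, field names, and sample values (`normalize_block`, `to_columns`, `sample_block`) are assumptions made for this example.

```python
from __future__ import annotations

from typing import Any


def normalize_block(block: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split one nested block record into a block row and its transaction rows."""
    block_row = {
        "number": block["number"],
        "hash": block["hash"],
        "timestamp": block["timestamp"],
        "transaction_count": len(block["transactions"]),
    }
    tx_rows = [
        {
            # block_number lets the transactions table join back to the blocks table
            "block_number": block["number"],
            "hash": tx["hash"],
            "from_address": tx["from"],
            "to_address": tx["to"],
            # assumed here: value arrives as a hex string and is stored as an integer
            "value": int(tx["value"], 16),
        }
        for tx in block["transactions"]
    ]
    return block_row, tx_rows


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dictionaries into one list per column (a column-oriented layout)."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


# Hypothetical sample data, for illustration only.
sample_block = {
    "number": 100,
    "hash": "0xblock",
    "timestamp": 1672531200,
    "transactions": [
        {"hash": "0xtx1", "from": "0xa", "to": "0xb", "value": "0xde0b6b3a7640000"},
        {"hash": "0xtx2", "from": "0xb", "to": "0xc", "value": "0x0"},
    ],
}

block_row, tx_rows = normalize_block(sample_block)
tx_columns = to_columns(tx_rows)
```

Storing each column contiguously is what lets a query engine read only the columns a query touches, which is the main reason a column-oriented format speeds up analysis.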

How it works

These technical details feature an architecture diagram to illustrate how to effectively use this solution. The architecture diagram shows the key components and their interactions, providing an overview of the architecture's structure and functionality step-by-step.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

With Amazon Managed Blockchain, you can deploy Ethereum full nodes that connect to public testnets and the Ethereum mainnet in a matter of minutes. By contrast, self-hosted Ethereum nodes can take 24-36 hours to deploy and sync. The architecture has built-in observability through process-level metrics, logs, and dashboards. Extend these mechanisms to fit your needs, and create alarms in Amazon CloudWatch to notify your on-call team of any issues. Finally, you can automate the deployment of this Guidance with infrastructure-as-code frameworks such as AWS Cloud Development Kit (AWS CDK) or AWS CloudFormation.
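As one way to extend the alarms mentioned above, the following sketch builds the parameters for a CloudWatch alarm that fires when the ETL process stops reporting progress. The parameter keys are those accepted by the CloudWatch `PutMetricAlarm` API, but the namespace, metric name, alarm name, thresholds, and SNS topic ARN are all hypothetical; substitute the metrics your deployment actually publishes.

```python
from typing import Any, Dict


def build_stalled_ingest_alarm(sns_topic_arn: str) -> Dict[str, Any]:
    """Return PutMetricAlarm parameters for a 'no blocks processed' alarm."""
    return {
        "AlarmName": "etl-block-ingest-stalled",   # hypothetical name
        "Namespace": "CrossChainAnalytics/ETL",    # hypothetical custom namespace
        "MetricName": "BlocksProcessed",           # hypothetical metric
        "Statistic": "Sum",
        "Period": 300,                # evaluate in 5-minute windows
        "EvaluationPeriods": 3,       # alarm after 3 consecutive breaching windows
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # a silent process counts as a failure
        "AlarmActions": [sns_topic_arn],  # for example, a topic that pages on-call
    }


# Placeholder ARN, for illustration only.
alarm_params = build_stalled_ingest_alarm("arn:aws:sns:us-east-1:123456789012:on-call")

# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```

Treating missing data as breaching is a deliberate choice here: if the process crashes and emits no metric at all, the alarm still fires.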

Read the Operational Excellence whitepaper

Security

This Guidance uses role-based access with AWS Identity and Access Management (IAM). The Amazon S3 bucket has encryption enabled, is private, and blocks public access. All roles are defined with least-privilege access, and all communications between services stay within the customer account. Administrators can control access to the Jupyter notebook, SageMaker, Amazon Redshift, Athena, and QuickSight through IAM roles.
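To make "least-privilege" and "blocks public access" concrete, the sketch below builds a read-only IAM policy scoped to a single prefix, along with an S3 public access block configuration. The bucket name and prefix are hypothetical, and the policy is an illustration of the pattern rather than the policy shipped with the sample code.

```python
import json
from typing import Any, Dict


def read_only_prefix_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """Build an IAM policy that allows reading objects under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                # Restrict listing to keys under the same prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


# The four settings of an S3 public access block, all switched on.
public_access_block = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# Hypothetical bucket and prefix, for illustration only.
policy = read_only_prefix_policy("example-blockchain-data", "v1.0/eth/")
print(json.dumps(policy, indent=2))
```

A role for an analyst who only queries Ethereum tables would attach a policy like this one, with no write or delete actions and no access to other prefixes.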

Read the Security whitepaper

Reliability

Various components in the architecture are deployed across multiple Availability Zones, such as the Managed Blockchain Ethereum nodes. By nature, all the serverless components, such as Fargate, are highly available and automatically scale to accommodate demand.

Read the Reliability whitepaper

Performance Efficiency

This Guidance uses serverless technologies, which provide built-in fault tolerance and continuous scaling. Serverless services also allow for comparative testing against varying load levels and minimize undifferentiated tasks like capacity provisioning and patching, so you can focus on business needs rather than server management. Further, you can enable auto scaling for AWS Glue, which automatically adds and removes workers from the cluster depending on the parallelism at each stage of the job run. Similarly, Amazon S3 automatically scales to meet high request rates. There are no limits to the number of prefixes in a bucket, and you can increase read or write performance through parallelization.
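The prefix parallelization mentioned above can be sketched as follows. The example assumes a date-partitioned key layout of the form `v1.0/<chain>/<table>/date=YYYY-MM-DD/`; treat that layout as an assumption and check it against your bucket. `read_partition` is a placeholder for whatever work you do per prefix (listing objects, reading Parquet files, and so on).

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import List


def daily_prefixes(chain: str, table: str, start: date, end: date) -> List[str]:
    """Return one key prefix per day in the inclusive range [start, end]."""
    days = (end - start).days + 1
    return [
        f"v1.0/{chain}/{table}/date={(start + timedelta(days=i)).isoformat()}/"
        for i in range(days)
    ]


def read_partition(prefix: str) -> str:
    """Placeholder: a real implementation would list and read objects under prefix."""
    return f"read {prefix}"


prefixes = daily_prefixes("btc", "blocks", date(2023, 1, 1), date(2023, 1, 3))

# Each prefix is handled by its own worker, so requests are spread across prefixes.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_partition, prefixes))
```

Because `pool.map` preserves input order, `results` lines up with `prefixes` even though the reads run concurrently.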

Read the Performance Efficiency whitepaper

Cost Optimization

By using the AWS Glue serverless computing platform for ETL and Athena for serverless query, you pay only for the resources you use. To further optimize cost, you can use the Amazon S3 Intelligent-Tiering storage class, which automatically selects the most cost-effective storage tier for your content based on its access patterns, such as frequency of access.

Read the Cost Optimization whitepaper

Sustainability

By using managed services such as Fargate and AWS Glue, we minimize the environmental impact of the backend services. Furthermore, the public Ethereum blockchain shifted from the proof-of-work to the proof-of-stake consensus mechanism in September 2022, reducing Ethereum’s energy consumption by ~99.95 percent.*

*The Merge, Ethereum, April 19, 2023.

Read the Sustainability whitepaper

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open sample code on GitHub

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

Related Content

Access Bitcoin and Ethereum open datasets for cross-chain analytics
