This Guidance helps you extract, transform, and load (ETL) blockchain data into a column-oriented storage format that allows for easy access and faster analysis. It consists of an open-source architecture for running cross-chain analytics on public blockchain data, along with Bitcoin and Ethereum public datasets available through Open Data on AWS. This Guidance pulls data from the public Bitcoin and Ethereum blockchains and normalizes it into tabular data structures: tables for blocks and transactions, plus additional tables for other data contained in a block.
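The normalization step can be illustrated with a minimal sketch. It assumes the input is one Bitcoin block decoded from Bitcoin Core's `getblock` RPC at verbosity 2 (fields `hash`, `height`, `time`, `tx`, and per-transaction `txid`, `vin`, `vout`). The output column names are hypothetical and are not the Guidance's actual table schema.

```python
from typing import Any


def normalize_block(block: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split one decoded block into a blocks-table row and transactions-table rows."""
    block_row = {
        "hash": block["hash"],
        "number": block["height"],
        "timestamp": block["time"],  # Unix epoch seconds
        "transaction_count": len(block["tx"]),
    }
    tx_rows = [
        {
            "hash": tx["txid"],
            "block_hash": block["hash"],  # key back to the blocks table
            "block_number": block["height"],
            "index": position,  # position of the transaction inside the block
            "input_count": len(tx["vin"]),
            "output_count": len(tx["vout"]),
            "output_value": sum(out["value"] for out in tx["vout"]),  # in BTC
        }
        for position, tx in enumerate(block["tx"])
    ]
    return block_row, tx_rows
```

Each block yields one blocks row and one row per transaction, linked by `block_hash`. For real data, consider storing amounts as integer satoshis or decimals rather than floats.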
Architecture Diagram
Step 1
To consume Ethereum and Bitcoin data, use Amazon Managed Blockchain for Ethereum, self-hosted Erigon Ethereum nodes, and a self-hosted Bitcoin Core node, supported by Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic File System (Amazon EFS), and Amazon DynamoDB.
Step 2
Use AWS Copilot to deploy the Bitcoin feed and worker services to Amazon Elastic Container Service (Amazon ECS) on AWS Fargate, and subscribe to the Bitcoin Core node to fetch historical and live data.
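A minimal sketch of the feed and worker split, assuming the services talk to Bitcoin Core over its JSON-RPC interface. `getblockcount`, `getblockhash`, and `getblock` are real Bitcoin Core RPC methods; the `RpcCall` transport type and the `feed` and `worker` functions are hypothetical, and HTTP, authentication, and the way the Guidance actually hands heights from feed to worker are left out.

```python
import itertools
from typing import Any, Callable, Iterator

# Hypothetical transport: sends a JSON-RPC request body, returns the decoded "result".
RpcCall = Callable[[dict[str, Any]], Any]

_request_ids = itertools.count(1)


def rpc_request(method: str, params: list[Any]) -> dict[str, Any]:
    """Build a Bitcoin Core JSON-RPC 1.0 request body."""
    return {"jsonrpc": "1.0", "id": next(_request_ids), "method": method, "params": params}


def feed(rpc: RpcCall, last_processed: int) -> Iterator[int]:
    """Yield every block height after last_processed, up to the current chain tip."""
    tip = rpc(rpc_request("getblockcount", []))
    yield from range(last_processed + 1, tip + 1)


def worker(rpc: RpcCall, height: int) -> dict[str, Any]:
    """Fetch one block by height; verbosity 2 includes decoded transactions."""
    block_hash = rpc(rpc_request("getblockhash", [height]))
    return rpc(rpc_request("getblock", [block_hash, 2]))
```

Calling `feed` again with the highest height already written covers both the historical backfill (on the first run) and newly mined blocks (on later runs).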
Step 3
Use AWS Copilot to deploy the Ethereum feed and worker services to Amazon ECS on Fargate, and subscribe to the Managed Blockchain Ethereum node and the Erigon Ethereum node to fetch historical and live data.
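Ethereum nodes expose a different JSON-RPC API. A minimal sketch, assuming both the Managed Blockchain node and the Erigon node are queried through the standard `eth_blockNumber` and `eth_getBlockByNumber` methods. The helper names are hypothetical, and transport and authentication (for example, request signing for Managed Blockchain) are omitted.

```python
import itertools
from typing import Any

_request_ids = itertools.count(1)


def eth_request(method: str, params: list[Any]) -> dict[str, Any]:
    """Build an Ethereum JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def get_block_request(number: int, full_transactions: bool = True) -> dict[str, Any]:
    """Request one block; block numbers are sent as 0x-prefixed hex quantities."""
    return eth_request("eth_getBlockByNumber", [hex(number), full_transactions])


def parse_quantity(value: str) -> int:
    """Decode a hex quantity field from a response, such as "number" or "gasUsed"."""
    return int(value, 16)


# Asking for the current tip, e.g. to decide where live ingestion starts.
latest_request = eth_request("eth_blockNumber", [])
```

Passing `True` as the second parameter asks the node to return full transaction objects instead of only transaction hashes.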
Step 4
Amazon Simple Storage Service (Amazon S3) stores data from the feeds as Parquet files. New data is written to Amazon S3 immediately after each new block is created.
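A minimal sketch of how such Parquet files might be keyed in Amazon S3, assuming a Hive-style `date=YYYY-MM-DD` partition derived from the block timestamp in UTC. The layout and file naming are hypothetical, and writing the Parquet bytes themselves is not shown.

```python
from datetime import datetime, timezone


def parquet_key(chain: str, table: str, block_number: int, block_time: int) -> str:
    """Build a date-partitioned S3 key for one block's Parquet file."""
    day = datetime.fromtimestamp(block_time, tz=timezone.utc).strftime("%Y-%m-%d")
    # Zero-padding keeps keys in block order when listed lexicographically.
    return f"{chain}/{table}/date={day}/part-{block_number:012d}.parquet"
```

Date partitions let query engines such as Athena skip files outside the dates a query asks for.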
Step 5
AWS Glue aggregates daily and intraday Parquet files.
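The aggregation is not described in detail here. Assuming it means compacting many small intraday Parquet files into fewer, larger files per date partition, a minimal planning sketch might group object keys by their `date=` partition. The key layout is hypothetical, and the merge itself, which would run inside the AWS Glue job, is not shown.

```python
import re
from collections import defaultdict

_DATE_PARTITION = re.compile(r"/date=(\d{4}-\d{2}-\d{2})/")


def plan_compaction(keys: list[str], min_files: int = 2) -> dict[str, list[str]]:
    """Group Parquet keys by date partition; keep partitions with several files."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        match = _DATE_PARTITION.search(key)
        # Skip marker files and keys that sit outside a date partition.
        if match and key.endswith(".parquet"):
            groups[match.group(1)].append(key)
    # A partition holding a single file gains nothing from being rewritten.
    return {day: sorted(files) for day, files in groups.items() if len(files) >= min_files}
```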
Step 6
With the data cataloged in the AWS Glue Data Catalog, Amazon Athena and Amazon Redshift can query both historical and live data.
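A minimal sketch of querying the cataloged data from Python, assuming a hypothetical Data Catalog database `btc` with a `transactions` table partitioned by a string column `date`. `start_query_execution` and its `QueryString`, `QueryExecutionContext`, and `ResultConfiguration` parameters belong to the boto3 Athena client; the helper names and S3 output location are placeholders.

```python
from datetime import date
from typing import Any


def daily_tx_count_sql(table: str, start: date, end: date) -> str:
    """SQL counting transactions per day, filtered on the date partition column."""
    # Filtering on the partition column lets Athena skip unrelated partitions.
    return (
        f'SELECT "date", COUNT(*) AS tx_count FROM {table} '
        f"WHERE \"date\" BETWEEN '{start.isoformat()}' AND '{end.isoformat()}' "
        'GROUP BY "date" ORDER BY "date"'
    )


def start_query(athena: Any, sql: str, database: str, output_s3: str) -> str:
    """Submit a query with a boto3 Athena client and return its execution ID."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return response["QueryExecutionId"]
```

In practice, `athena` would be `boto3.client("athena")`, and you would poll `get_query_execution` and then read rows with `get_query_results`. Pass only trusted table names, because the SQL is built by string formatting.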
Step 7
Amazon QuickSight visualizes data for business analysts.
Step 8
Researchers and data scientists use Amazon SageMaker to run cross-chain analytics in Jupyter notebooks.
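Cross-chain analysis often starts by lining up per-chain metrics on a shared key such as the date. A minimal, dependency-free sketch of that outer join, assuming each input is a list of hypothetical `(date, transaction_count)` pairs, for example from one Athena query per chain. In a notebook you would more likely use a DataFrame library for this.

```python
from typing import Iterable


def cross_chain_daily(
    btc: Iterable[tuple[str, int]], eth: Iterable[tuple[str, int]]
) -> list[tuple[str, int, int]]:
    """Outer-join per-day transaction counts from two chains on the date."""
    btc_by_day, eth_by_day = dict(btc), dict(eth)
    days = sorted(btc_by_day.keys() | eth_by_day.keys())
    # A day missing on one chain is reported as 0 for that chain.
    return [(day, btc_by_day.get(day, 0), eth_by_day.get(day, 0)) for day in days]
```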
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
With Managed Blockchain, you can deploy Ethereum full nodes that connect to public testnets and the Ethereum mainnet in a matter of minutes, compared with self-hosted Ethereum nodes, whose deployment and sync can take 24 to 36 hours. We have built observability into the architecture with process-level metrics, logs, and dashboards. Extend these mechanisms to suit your needs, and create alarms in Amazon CloudWatch to notify your on-call team of any issues. Finally, you can automate the deployment of this Guidance with infrastructure as code frameworks such as the AWS Cloud Development Kit (AWS CDK) or AWS CloudFormation.
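One alarm worth adding is on ingestion lag. A minimal sketch, assuming the feed publishes a hypothetical custom metric `BlocksBehindTip` (chain tip height minus the last height written to S3) under a hypothetical namespace `BlockchainETL`; the Guidance's own metric names are not given here. `cloudwatch` is expected to be a boto3 CloudWatch client, and the SNS topic that notifies the on-call team is assumed to exist.

```python
from typing import Any

NAMESPACE = "BlockchainETL"  # hypothetical custom namespace
METRIC = "BlocksBehindTip"  # hypothetical metric: tip height minus last height written


def publish_lag(cloudwatch: Any, chain: str, tip_height: int, last_written: int) -> int:
    """Publish how many blocks the pipeline is behind the chain tip."""
    lag = max(tip_height - last_written, 0)
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[{
            "MetricName": METRIC,
            "Dimensions": [{"Name": "Chain", "Value": chain}],
            "Value": float(lag),
            "Unit": "Count",
        }],
    )
    return lag


def create_lag_alarm(cloudwatch: Any, chain: str, topic_arn: str, max_lag: int = 10) -> None:
    """Notify an SNS topic when the lag stays above max_lag for three 5-minute periods."""
    cloudwatch.put_metric_alarm(
        AlarmName=f"{chain}-block-lag",
        Namespace=NAMESPACE,
        MetricName=METRIC,
        Dimensions=[{"Name": "Chain", "Value": chain}],
        Statistic="Maximum",
        Period=300,  # seconds
        EvaluationPeriods=3,
        Threshold=float(max_lag),
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="breaching",  # no data likely means the feed stopped publishing
        AlarmActions=[topic_arn],
    )
```

In an infrastructure-as-code deployment, the alarm would more naturally be declared in AWS CDK or CloudFormation; the boto3 calls here just show which settings matter.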
Security
This Guidance uses role-based access with AWS Identity and Access Management (IAM). The Amazon S3 bucket is private, has encryption enabled, and blocks public access. All roles are defined with least-privilege access, and all communication between services stays within the customer account. Administrators can use IAM roles to control access to the Jupyter notebooks, SageMaker, Amazon Redshift, Athena, and QuickSight.
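The bucket settings described above correspond to two Amazon S3 API calls. A minimal sketch using a boto3 S3 client, where the bucket name and optional AWS KMS key are placeholders. The Guidance may apply these settings through its infrastructure code rather than at runtime.

```python
from typing import Any, Optional


def lock_down_bucket(s3: Any, bucket: str, kms_key_id: Optional[str] = None) -> None:
    """Block all public access and turn on default encryption for a bucket."""
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    # SSE-KMS when a key is supplied, otherwise SSE-S3 (AES256).
    if kms_key_id:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    else:
        default = {"SSEAlgorithm": "AES256"}
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={"Rules": [{"ApplyServerSideEncryptionByDefault": default}]},
    )
```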
Reliability
Several components in the architecture, such as the Managed Blockchain Ethereum nodes, are deployed across multiple Availability Zones. The serverless components, such as Fargate, are highly available by design and can scale automatically to accommodate demand.
Performance Efficiency
This Guidance uses serverless technologies, which provide built-in fault tolerance and continuous scaling. Serverless services also allow for comparative testing against varying load levels and minimize undifferentiated tasks like capacity provisioning and patching, so you can focus on business needs rather than server management. Further, you can enable auto scaling for AWS Glue, which automatically adds and removes workers from the cluster depending on the parallelism at each stage of the job run. Similarly, Amazon S3 automatically scales to meet high request rates: it supports at least 3,500 write and 5,500 read requests per second per prefix, and there is no limit to the number of prefixes in a bucket, so you can increase read or write performance by parallelizing requests across prefixes.
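A minimal sketch of creating an AWS Glue Spark job with auto scaling turned on through the boto3 Glue client. The `--enable-auto-scaling` job argument requires AWS Glue 3.0 or later; the job name, role ARN, script location, worker type, and worker count are placeholders, not values taken from the Guidance.

```python
from typing import Any


def create_glue_job(
    glue: Any, name: str, role_arn: str, script_s3_path: str, max_workers: int = 10
) -> dict[str, Any]:
    """Create an AWS Glue Spark ETL job with auto scaling enabled."""
    job = {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_s3_path, "PythonVersion": "3"},
        "GlueVersion": "4.0",  # auto scaling needs AWS Glue 3.0 or later
        "WorkerType": "G.1X",
        # With auto scaling on, NumberOfWorkers is the upper bound, not a fixed size.
        "NumberOfWorkers": max_workers,
        "DefaultArguments": {"--enable-auto-scaling": "true"},
    }
    glue.create_job(**job)
    return job
```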
Cost Optimization
By using the AWS Glue serverless computing platform for ETL and Athena for serverless queries, you pay only for the resources you use. To further optimize cost, you can use the Amazon S3 Intelligent-Tiering storage class, which automatically moves objects to the most cost-effective access tier based on their access patterns, such as how frequently they are accessed.
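A minimal sketch of writing objects directly into Intelligent-Tiering with the boto3 S3 `put_object` call. Intelligent-Tiering does not auto-tier objects smaller than 128 KB, so this sketch keeps very small files in STANDARD; that cutoff rule and the helper names are illustrative choices, not something the Guidance prescribes.

```python
from typing import Any

# Objects smaller than 128 KB are not auto-tiered by S3 Intelligent-Tiering.
MIN_AUTO_TIER_BYTES = 128 * 1024


def storage_class_for(size: int) -> str:
    """Pick a storage class based on object size."""
    return "INTELLIGENT_TIERING" if size >= MIN_AUTO_TIER_BYTES else "STANDARD"


def upload_parquet(s3: Any, bucket: str, key: str, body: bytes) -> str:
    """Upload one Parquet object and return the storage class used."""
    storage_class = storage_class_for(len(body))
    s3.put_object(Bucket=bucket, Key=key, Body=body, StorageClass=storage_class)
    return storage_class
```

Compacting small intraday files into larger daily files also helps here, because more of the data then qualifies for automatic tiering.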
Sustainability
By using managed services such as Fargate and AWS Glue, we minimize the environmental impact of the backend services. Furthermore, the public Ethereum blockchain shifted from the proof-of-work to the proof-of-stake consensus mechanism in late 2022, reducing Ethereum’s energy consumption by ~99.95 percent.*
*The Merge, Ethereum, April 19, 2023.
Implementation Resources
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
Access Bitcoin and Ethereum open datasets for cross-chain analytics
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.