AWS Database Blog

Retrieve Bitcoin and Ethereum Public Blockchain Data with Amazon Managed Blockchain Query

Over the past two years, public blockchain adoption has been driven by three primary use cases:

  • decentralized finance (DeFi), which provides an open financial system built using smart contracts on public blockchains
  • non-fungible tokens (NFTs) that certify ownership of digital assets and enable broad transferability of those digital assets
  • digital currency payments that enable value transfer without intermediate institutions

In addition to the adoption driven by the early Web3 innovators, regulated enterprises are also increasing their use of public blockchains. For example, financial institutions such as Itaú Bank and Fidelity are entering into the cryptocurrency space as trusted, regulated custodians and exchanges. Other institutions, such as Nasdaq, are enabling cryptocurrency for traditional finance as technology providers. Media companies such as Warner Music Group and consumer companies such as Nike continue to incorporate NFTs into their event ticketing and loyalty programs. Payment processors such as Visa and Mastercard are offering cryptocurrency debit cards and solutions for merchants.

However, this growth has also highlighted one of the persistent challenges of building with blockchain: data. The immutability of blockchain transactions, a strength of the technology, means that the data can only grow with time. And although this data is stored publicly and can be accessed by everyone, transforming it into an accurate format that applications can query efficiently at scale is a challenge for many customers.

In this post, we describe the general challenges of working with blockchain data and introduce how Amazon Managed Blockchain Query, a feature of Amazon Managed Blockchain, can help you access blockchain data quickly so you can focus on building differentiators for mainstream businesses and consumers.

Challenges of blockchain data

At a high level, the challenges of preparing blockchain data are similar to those of many other big data problems: complicated ETL (extract, transform, and load) processes, disparate sources of information, and expensive storage.

First, the way blockchain data is stored poses challenges for application use. Blockchain data is essentially stored as a ledger of state transformations between each block. That is, the information captures what changed from block n to block n+1, such as balance changes in addresses or new smart contracts being deployed. This structure, although well-suited for ensuring data integrity and security on a decentralized network, isn’t ideally suited for tasks like data analysis or developing applications that need to run complex queries. There are no simple data views that can serve commonly requested data with the performance that application builders are used to. Take, for example, a wallet developer who needs to load the historical wallet balance of a user in seconds. To run the queries efficiently, they would first need to connect to blockchain nodes and download the entire ledger (which could be several hundred gigabytes in size) for each of multiple blockchains such as Ethereum, Arbitrum, and Polygon. Following this, the data would need to be indexed in a database system that allows efficient queries, a task that presents a significant computational challenge, especially when dealing with larger blockchains.
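To make the indexing burden concrete, the following is a minimal Python sketch of answering a historical balance question when all you have is per-block changes. The `BalanceDelta` record and the `balance_at_block` helper are hypothetical names used only for illustration; they are not part of any blockchain client or AWS API, and real ledgers are far larger and more complex than this toy list.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BalanceDelta:
    """Hypothetical record: how one address's balance changed in one block."""
    block_number: int
    address: str
    change: int  # signed amount in the chain's smallest unit (for example, wei)


def balance_at_block(deltas: Iterable[BalanceDelta], address: str, block_number: int) -> int:
    """Replay every change up to and including block_number for one address."""
    return sum(
        d.change
        for d in deltas
        if d.address == address and d.block_number <= block_number
    )


# Toy ledger of state changes (illustrative data only).
ledger = [
    BalanceDelta(1, "0xabc", 100),
    BalanceDelta(2, "0xabc", -30),
    BalanceDelta(2, "0xdef", 30),
    BalanceDelta(5, "0xabc", 50),
]

print(balance_at_block(ledger, "0xabc", 2))
```

Without an index, each such query scans the full history, which is why a dedicated indexing layer is typically needed before this data can back a responsive application.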

Second, in order to provide real-time data, developers have to adapt their systems for blockchain finality, which is the assurance that past transactions can never change. Because blockchains are global and decentralized, it takes a certain time (or number of blocks) for the network to reach an agreement, or consensus, about the blockchain’s legitimate state. During this time, which could range from minutes to hours, data received by a blockchain node may be replaced by another dataset that is recognized by the network as legitimate. In busy times with many transactions being submitted to different nodes, blockchains may revert transactions from multiple blocks. This characteristic of eventual finality is challenging for developers because in such cases, they have to correct the data they have already processed and calculated (such as balances), while still keeping up to date with the streaming information from the blockchain. This also causes problems in situations demanding real-time data for decision-making processes. Take, for example, a financial application interacting with public blockchains. A high-value trade could be run based on a transaction that the system treats as confirmed. However, this transaction could later be reversed if the network acknowledges another chain as the true one. The financial implications of such a reversal could be significant, demonstrating the inherent risk and complexity associated with streaming real-time data from the blockchain.
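One common way applications cope with eventual finality is to wait for a number of confirmations before acting on a transaction. The sketch below illustrates that idea only; the `ObservedTransaction` type, the `is_final` helper, and the threshold of 6 are assumptions chosen for illustration, and the right rule depends on the specific blockchain and its consensus mechanism.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObservedTransaction:
    """Hypothetical record of a transaction seen in a block."""
    tx_hash: str
    block_number: int


def is_final(tx: ObservedTransaction, chain_head: int, required_confirmations: int) -> bool:
    """Treat a transaction as final once enough blocks sit on top of it."""
    # The block containing the transaction counts as the first confirmation.
    confirmations = chain_head - tx.block_number + 1
    return confirmations >= required_confirmations


def split_by_finality(
    txs: List[ObservedTransaction], chain_head: int, required_confirmations: int = 6
) -> Tuple[List[ObservedTransaction], List[ObservedTransaction]]:
    """Separate transactions that are safe to act on from those that may still be reverted."""
    final = [t for t in txs if is_final(t, chain_head, required_confirmations)]
    pending = [t for t in txs if not is_final(t, chain_head, required_confirmations)]
    return final, pending


observed = [ObservedTransaction("0x01", 100), ObservedTransaction("0x02", 103)]
final, pending = split_by_finality(observed, chain_head=105)
print([t.tx_hash for t in final], [t.tx_hash for t in pending])
```

Anything left in `pending` is data the application may later have to correct, which is the bookkeeping burden described above.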

Third, developers must assimilate data from multiple sources to construct a complete picture of transactions and interactions in a manner that is easy for humans to understand. With applications now expected to be deployed across multiple blockchains, developers have to process information from different blockchains with potentially different standards. Data stored on the blockchain includes critical information such as the from address, to address, transaction amount, and address of the token smart contract. Bitcoin and Ethereum, for example, store data in completely different structures, so developers who wish to access data from these blockchains must navigate the differences in data structure and JSON-RPC API options to extract that data in a usable manner.

Fourth, developers sometimes have to rerun smart contracts to understand the fine-grained details of their operations. This is one of the most resource-intensive challenges when working with blockchain data. Smart contracts are self-running contracts with the terms of the agreement directly written into lines of code. However, the current flexibility in how different smart contract developers code their functions and emit events on the blockchain makes tracking vital information such as balances or transaction details an arduous task. To interact with distributed applications on the blockchain, such as decentralized lending platforms or decentralized exchanges, transactions often involve calling multiple smart contracts. Smart contracts often interact with each other, and their behavior can vary based on the blockchain’s state. Consequently, obtaining an accurate understanding of what a smart contract was doing at a specific point in time often entails recreating the entire state of the blockchain at that moment and tracing the transactions. For example, consider a music artist who offers a token-gated event where only holders of that artist’s non-fungible token (NFT) collection on a given date are admitted to a live concert. To verify the exact conditions under which an individual may be granted access to the event (such as the token balance of a given user’s wallet at a historical point in time or the exact NFT that user owned), a developer might, depending on smart contract design, have to recreate the entire state of the blockchain at a given time to determine who should and should not be granted access. This process is computationally expensive and time-consuming, presenting considerable challenges in terms of efficiency and scalability.

AMB Query – Key Features

In this section, we discuss key features of Managed Blockchain Query and how it supports builders’ blockchain data needs.

Cost-efficient public blockchain data

Managed Blockchain Query provides standardized APIs to retrieve blockchain data from multiple public blockchains, starting with Bitcoin and Ethereum. You don’t need to provision or manage infrastructure for blockchain nodes, or maintain indexing infrastructure to extract, transform, and load blockchain data for use in your applications. Managed Blockchain Query APIs offer predictable pricing: you pay only for the API requests you make, billed per million requests in pricing buckets that reflect the compute and data resources required to render the result for each API request.

To illustrate the cost-efficiency of Managed Blockchain Query, imagine you operate a custodial digital asset wallet that interacts with the Bitcoin and Ethereum blockchains. Imagine the monthly demand for this wallet requires 20 million requests for token balances for user addresses on Bitcoin and 0.5 million requests to get transaction events (for example, transaction inputs and outputs) for Bitcoin. Additionally, the wallet must accommodate 5 million requests for token balances for Ethereum assets and 2.5 million requests for Ethereum transaction events. The cost for this request traffic is calculated in the following table (in the us-east-1 Region).

Pricing chart illustrating 20 million GetTokenBalance requests on Bitcoin for a total of $140, half a million ListTransactionEvents requests on Bitcoin for $4.50, five million GetTokenBalance requests on Ethereum for $35, and two and a half million ListTransactionEvents requests on Ethereum for $22.50. The total monthly estimate for this example is $202.
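The arithmetic behind this estimate can be reproduced with a few lines of Python. Note that the per-million rates below are an assumption: they are derived by dividing each line total in the example by its request volume ($140 / 20 million and $4.50 / 0.5 million), not taken from a price list, so check the current pricing page before relying on them.

```python
from decimal import Decimal

# Per-million-request rates implied by the example figures (assumption:
# derived from the totals in this example, not from an official price list).
RATE_PER_MILLION = {
    "GetTokenBalance": Decimal("7.00"),
    "ListTransactionEvents": Decimal("9.00"),
}

# (blockchain, API, monthly requests in millions) from the wallet scenario.
monthly_requests_millions = [
    ("Bitcoin", "GetTokenBalance", Decimal("20")),
    ("Bitcoin", "ListTransactionEvents", Decimal("0.5")),
    ("Ethereum", "GetTokenBalance", Decimal("5")),
    ("Ethereum", "ListTransactionEvents", Decimal("2.5")),
]

line_costs = [
    (chain, api, millions * RATE_PER_MILLION[api])
    for chain, api, millions in monthly_requests_millions
]
total = sum(cost for _, _, cost in line_costs)

for chain, api, cost in line_costs:
    print(f"{chain:<9} {api:<22} ${cost:.2f}")
print(f"Total monthly estimate: ${total:.2f}")
```

Using `Decimal` rather than floating point keeps the currency arithmetic exact.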

Reliable and highly available

With REST APIs available in the supported Regions backed by robust data indexing infrastructure and ETL pipelines, Managed Blockchain Query provides reliable and highly available access to public blockchain data across multiple chains. Managed Blockchain Query follows the AWS high standard for reliability, so you can build your application to use Managed Blockchain Query with confidence. Furthermore, you no longer have to guess about capacity, because Managed Blockchain Query APIs scale with your request volume to service your requests at subsecond latency. Managed Blockchain Query also provides high availability, and you can expect 99.9% availability from the service.

Regardless of whether you are building a proof of concept or a high-scale production workload, Managed Blockchain Query can support your blockchain data query needs.

Accelerate development of Web3 applications with developer-friendly REST APIs

Managed Blockchain Query offers APIs that reduce the time and complexity involved in rendering public blockchain data insights to your applications and your users. For example, with a single API call, you can retrieve a list of transactions sent and received from a given wallet address on Ethereum, which makes it easy to populate a user interface with historical transaction details. Furthermore, you can use Managed Blockchain Query to get the current finalized and historical balance for native coins (such as ETH and BTC) and non-native tokens (such as ERC20) with subsecond latency. You can use this API to retrieve individual tokens as well as to retrieve a full list of token balances for a given wallet address. Managed Blockchain Query APIs make common blockchain data insights readily available to developers to use within a myriad of applications, such as web and mobile wallets, analytics pipelines, trading applications, and more, helping developers with dependencies on blockchain data launch products faster.

Use cases for Managed Blockchain Query

Managed Blockchain Query delivers common public blockchain data in a simple way to developers, without requiring custom data indexing and query infrastructure. For example, imagine a digital wallet that needs to display a list of historical transactions and current balances for Ethereum tokens (ERC20, ERC721, ERC1155) and native Ether (ETH). With Managed Blockchain Query APIs, this data is available with a simple REST API call. You can learn more about each API in the Managed Blockchain Query collection in the Managed Blockchain Query Developer Guide.

In this section, we discuss a variety of lower-level use cases that Managed Blockchain Query APIs can address.

Query current and historical token balances on public blockchains

The GetTokenBalance API provides a way to get the balance of native coins (ETH, BTC) and various tokens (ERC20, ERC721, ERC1155), which can be used to get the current balance of an externally owned account (EOA) or a historical balance using a universal timestamp (Unix timestamp, in seconds). For example, you can use the GetTokenBalance API to get an address’s balance of an ERC20 token, USDC, on the Ethereum mainnet. You can also retrieve balances of tokens in a batch by providing a list of coins or tokens for which to get balances using the BatchGetTokenBalance API.
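As a rough illustration, the following Python sketch builds GetTokenBalance request parameters for a current ERC20 balance and a historical ETH balance. The helper name `build_get_token_balance_request` and the placeholder owner address are hypothetical. The parameter names (`tokenIdentifier`, `ownerIdentifier`, `atBlockchainInstant`) and the boto3 client name `managedblockchain-query` reflect our understanding of the API; confirm them against the API reference before use.

```python
from datetime import datetime, timezone
from typing import Optional


def build_get_token_balance_request(
    owner_address: str,
    network: str,
    token_id: Optional[str] = None,
    contract_address: Optional[str] = None,
    at_time: Optional[datetime] = None,
) -> dict:
    """Assemble keyword arguments for a GetTokenBalance call (hypothetical helper)."""
    token_identifier = {"network": network}
    if token_id is not None:
        token_identifier["tokenId"] = token_id  # for example, "eth" for native Ether
    if contract_address is not None:
        token_identifier["contractAddress"] = contract_address
    request = {
        "tokenIdentifier": token_identifier,
        "ownerIdentifier": {"address": owner_address},
    }
    if at_time is not None:
        # Omit this key to ask for the current balance.
        request["atBlockchainInstant"] = {"time": at_time}
    return request


USDC_CONTRACT = "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48"  # USDC on Ethereum mainnet
PLACEHOLDER_OWNER = "0x0000000000000000000000000000000000000000"  # replace with a real EOA

current_usdc = build_get_token_balance_request(
    PLACEHOLDER_OWNER, "ETHEREUM_MAINNET", contract_address=USDC_CONTRACT
)
historical_eth = build_get_token_balance_request(
    PLACEHOLDER_OWNER,
    "ETHEREUM_MAINNET",
    token_id="eth",
    at_time=datetime(2023, 1, 1, tzinfo=timezone.utc),
)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   client = boto3.client("managedblockchain-query")
#   print(client.get_token_balance(**current_usdc)["balance"])
```

Keeping request construction separate from the network call makes the parameters easy to inspect and test without an AWS account.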

Retrieve historical transaction data for a given address

With Managed Blockchain Query, you can easily retrieve historical data from public blockchains such as Ethereum. This enables several use cases, like populating transaction history on a mobile or desktop crypto wallet or providing contextual information about a transaction based on its transaction hash. For example, you can use the ListTransactions and GetTransaction APIs to first get a list of all transactions for a given externally owned account (EOA) on the Ethereum mainnet, and then retrieve transaction details for a single transaction in the list.
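The two-step flow described here might look like the following sketch. It assumes a boto3-style client for `managedblockchain-query` whose `list_transactions` response contains `transactions` and an optional `nextToken`, and whose `get_transaction` response contains `transaction`; the helper names are hypothetical. The client is passed in as an argument so the functions can be exercised with a stub.

```python
from typing import Any, Iterator


def iter_transactions(
    client: Any, address: str, network: str, page_size: int = 100
) -> Iterator[dict]:
    """Yield every transaction summary for an address, following nextToken."""
    kwargs = {"address": address, "network": network, "maxResults": page_size}
    while True:
        page = client.list_transactions(**kwargs)
        yield from page.get("transactions", [])
        next_token = page.get("nextToken")
        if not next_token:
            break
        kwargs["nextToken"] = next_token


def get_transaction_details(client: Any, network: str, transaction_hash: str) -> dict:
    """Fetch the full record for one transaction hash."""
    response = client.get_transaction(network=network, transactionHash=transaction_hash)
    return response["transaction"]


# With AWS credentials configured, usage could look like this:
#   import boto3
#   client = boto3.client("managedblockchain-query")
#   for summary in iter_transactions(client, "<EOA address>", "ETHEREUM_MAINNET"):
#       details = get_transaction_details(
#           client, "ETHEREUM_MAINNET", summary["transactionHash"]
#       )
```

A generator keeps memory use flat for addresses with long histories, because pages are fetched only as the caller consumes them.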

Get all token balances for a given address

The ListTokenBalances API is a powerful tool for wallets, user interfaces, Web3 utilities, and more, because it returns a list of all balances for an address across both tokens (ERC20, ERC721, ERC1155) and native coins (ETH, BTC) on a given public blockchain in a single API call. For example, you can provide an EOA address and specify the Ethereum mainnet as the network, and receive a list of token and native coin balances on Ethereum in the response.
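A wallet might turn that response into a simple lookup table, as in the sketch below. It assumes a boto3-style client whose `list_token_balances` call accepts `ownerFilter` and `tokenFilter` and returns a `tokenBalances` list in which each entry has a `tokenIdentifier` and a `balance`; the helper name is hypothetical, and pagination via `nextToken` is omitted for brevity.

```python
from typing import Any, Dict, Optional, Tuple

TokenKey = Tuple[Optional[str], Optional[str]]  # (contractAddress, tokenId)


def first_page_balances(client: Any, owner_address: str, network: str) -> Dict[TokenKey, str]:
    """Map each token held by an address to its balance (first page only)."""
    response = client.list_token_balances(
        ownerFilter={"address": owner_address},
        tokenFilter={"network": network},
    )
    balances: Dict[TokenKey, str] = {}
    for entry in response.get("tokenBalances", []):
        token = entry.get("tokenIdentifier", {})
        # Key on both fields so NFTs from the same contract do not collide.
        key = (token.get("contractAddress"), token.get("tokenId"))
        balances[key] = entry["balance"]
    return balances


# With AWS credentials configured:
#   import boto3
#   client = boto3.client("managedblockchain-query")
#   print(first_page_balances(client, "<EOA address>", "ETHEREUM_MAINNET"))
```

Balances are kept as strings, as returned, so large token amounts are not silently rounded by floating-point conversion.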

Get all tokens minted by a smart contract

The ListTokenBalances API can also return a list of all tokens (ERC20, ERC721, ERC1155) minted by a smart contract when passed the smart contract address as input. For example, you can retrieve information related to NFTs minted by an ERC721 smart contract on Ethereum in a single API call.
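In request terms, the difference from the per-address query is that the filter names a contract instead of an owner. The sketch below shows one way to express that, assuming `tokenFilter` accepts a `contractAddress` alongside `network` and that `ownerFilter` can be omitted; the helper names and the sample contract address are hypothetical.

```python
from typing import List


def build_tokens_by_contract_request(
    contract_address: str, network: str, max_results: int = 100
) -> dict:
    """Assemble ListTokenBalances arguments scoped to one smart contract."""
    return {
        "tokenFilter": {"network": network, "contractAddress": contract_address},
        "maxResults": max_results,
    }


def token_ids_from_page(response: dict) -> List[str]:
    """Collect the token IDs present in one response page."""
    return [
        entry["tokenIdentifier"]["tokenId"]
        for entry in response.get("tokenBalances", [])
        if "tokenId" in entry.get("tokenIdentifier", {})
    ]


request = build_tokens_by_contract_request("0xc1", "ETHEREUM_MAINNET")

# With AWS credentials configured:
#   import boto3
#   client = boto3.client("managedblockchain-query")
#   print(token_ids_from_page(client.list_token_balances(**request)))
```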

List events emitted by a given transaction

The ListTransactionEvents API allows you to retrieve a list of smart contract standard events that are emitted as a result of a given transaction, identified by its hash (transaction identifier). For example, you can use ListTransactionEvents to inspect the resulting events of a transaction that calls a function of an ERC20 token smart contract on Ethereum, such as a Transfer event.
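For instance, an application that only cares about token transfers could filter the returned events by type. The sketch below assumes a boto3-style client whose `list_transaction_events` response contains an `events` list with an `eventType` field, and that ERC20 transfers are reported with the value `ERC20_TRANSFER`; treat both as assumptions to verify against the API reference. The helper name is hypothetical, and only the first page of results is read.

```python
from typing import Any, List


def events_of_type(
    client: Any, network: str, transaction_hash: str, event_type: str
) -> List[dict]:
    """Fetch the first page of events for a transaction and keep one event type."""
    response = client.list_transaction_events(
        network=network, transactionHash=transaction_hash
    )
    return [e for e in response.get("events", []) if e.get("eventType") == event_type]


# With AWS credentials configured:
#   import boto3
#   client = boto3.client("managedblockchain-query")
#   transfers = events_of_type(
#       client, "ETHEREUM_MAINNET", "<transaction hash>", "ERC20_TRANSFER"
#   )
```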

Using AMB Query in the AWS Management Console

The AWS Management Console experience for Amazon Managed Blockchain Query offers a graphical user interface query editor that allows you to test Query APIs straight from your browser via your AWS account. To illustrate how to use the query editor, this example details how to use the ListTransactions API to retrieve a full list of transactions an EOA has been party to.

  1. Navigate to the AWS Management Console and sign in to your AWS account. Then navigate to the Amazon Managed Blockchain service using the search bar and select the “Query public blockchains” radio button on the right-hand side of the screen. Finally, select “Launch AMB Query”.
  2. Once on the query editor page, you must select either the Bitcoin network or Ethereum mainnet as the target for your query. For this example, select “ETHEREUM_MAINNET” from the “Blockchain network” dropdown.
  3. Then, select the AMB Query API you wish to run from the “Query type” dropdown, which in this case will be “ListTransactions”.
  4. Then, fill in the required query parameters and any optional ones to finalize your request for blockchain data. In this example, provide an Ethereum externally owned account (EOA) address and any filters by date/time that are relevant.
  5. After adding all necessary details to your request, select “Run query” to receive results for your request. If no results are returned, verify that you are using an EOA that has transactions occurring within any date/time filters you have set for your request.

You can use the AMB Query editor in the AWS Management Console to test all supported AMB Query APIs and explore the Bitcoin and Ethereum data these APIs offer. AMB Query is a pay-as-you-go service billed based on your usage of APIs, so this hands-on example provisions no resources in your account and there is nothing to delete afterward.

Conclusion

In this post, we addressed the common challenges builders face when working with blockchain data, as well as how Managed Blockchain Query can address many of those challenges. If you would like to dive deeper into the technology behind Managed Blockchain Query and get hands-on with the service programmatically, check out our Managed Blockchain Query Developer Guide, which provides a series of code samples and an architecture example of how to use the service in the context of a digital wallet use case. Stay tuned for a follow-up blog post delving into technical concepts related to Managed Blockchain Query and how to integrate it with other AWS services.


About the authors

Forrest Colyer manages the Web3/Blockchain Specialist Solutions Architecture team that supports the Amazon Managed Blockchain (AMB) service. Forrest and his team support customers at every stage of their adoption journey, from proof of concept to production, providing deep technical expertise and strategic guidance to help bring blockchain workloads to life. Through his experience with private blockchain solutions led by consortia and public blockchain use cases like NFTs and DeFi, Forrest helps enable customers to identify and implement high-impact blockchain solutions.

John Liu is the Head of Product for Web3 / Blockchain at AWS. He has 13 years of experience as a product executive and 10 years of experience as a portfolio manager. Prior to AWS, John spent 4 years leading product and business development at public blockchain protocols with a heavy focus on cross-chain technology, DeFi, and NFTs. Prior to that, John gained financial expertise as Chief Product Officer for fintech companies and portfolio manager at various hedge funds.