AWS Database Blog

Export and analyze Amazon QLDB journal data using AWS Glue and Amazon Athena

Amazon Quantum Ledger Database (Amazon QLDB) is a fully managed ledger database that maintains a complete, immutable record of every change committed to the database. As transactions are committed to the database, they are appended to a transaction log called a journal and are cryptographically hash-chained to the previous transaction. Once committed, the record of the transaction is permanent and the integrity of the transaction history can be verified with cryptography. This gives you a tamper-evident audit of change history for all documents in the ledger.

Amazon QLDB is optimized for writes and for targeted reads using indexes. It is not optimized for scan operations over large datasets, making it an ideal choice for transaction processing but not for reporting and analytics. In this post, we discuss how you can use the export feature in Amazon QLDB with AWS Glue and Amazon Athena to provide reporting and analytical capabilities to your ledger-based architectures. Optionally, you can use Amazon QuickSight with Athena to build interactive dashboards, paginated reports, embedded analytics, and natural language queries.

Solution overview

The following diagram depicts the architecture of the solution.

The solution uses several AWS services:

  • Amazon Athena – Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. In this solution, we use Athena to run queries against our transactional data exported from Amazon QLDB.
  • AWS Glue – AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. In this solution, AWS Glue extracts, transforms, and loads from the data source (Amazon S3) into target tables. We also use the following features:
    • AWS Glue Data Catalog – The Data Catalog contains references to data that are used as sources and targets of your extract, transform, and load (ETL) jobs in AWS Glue. To create your data warehouse or data lake, you must catalog this data. The Data Catalog is an index to the location, schema, and runtime metrics of your data. You use the information in the Data Catalog to create and monitor your ETL jobs.
    • AWS Glue crawler – You can use a crawler to populate the Data Catalog with tables. A crawler can crawl multiple data stores in a single run. Upon completion, the crawler creates or updates one or more tables in your Data Catalog. The ETL job reads from and writes to the data stores that are specified in the source and target Data Catalog tables.
  • Amazon S3 – Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. In this solution, Amazon QLDB exports the transactional data to S3 bucket. The data in this S3 bucket is our data source for AWS Glue.
    • Bucket – A bucket is a container for objects stored in Amazon S3. You can store any number of objects in a bucket and can have up to 100 buckets in your account. Every object is contained in a bucket.
  • Amazon QLDB – Amazon QLDB is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable change history log owned by a central trusted authority. You can use Amazon QLDB to track all application data changes, and maintain a complete and verifiable history of changes over time. For this post, this is the heart of the financial transactional data and single source of truth.
  • Amazon QuickSight (optional) – QuickSight allows everyone in your organization to understand your data by asking questions in natural language, exploring through interactive dashboards, or automatically looking for patterns and outliers powered by ML. You can connect QuickSight to Athena for building paginated reports, embedded analytics and natural language queries.
  • Other data sources (optional) – AWS Glue can connect to other file-based and table-based data stores as well.

Create resources with AWS CloudFormation

To get started, we create our solution resources using an AWS CloudFormation template.

  1. Launch the CloudFormation stack by choosing Launch Stack:
  2. Choose Create stack.The stack creation starts with the status Create in Progress and takes approximately 5 minutes to complete.
  3. Open the Outputs tab and take note of the following resource names:
    • Amazon QLDB ledger walletledger
    • S3 bucket walletledger-s3-export-bucket-xXXXXX

Create a table for your sample data in the Amazon QLDB ledger

To create a table for sample data, complete the following steps:

  1. On the Amazon QLDB console, in the navigation pane, choose Getting Started and scroll down to the Sample application data section.
  2. Under Manual option, choose Open query editor.
  3. In the PartiQL editor, under Choose a ledger, choose walletledger.

  4. In the query editor, enter the following statement, then choose Run:
    CREATE TABLE wallettable

You can refresh the tables section in the left pane to see the table.

Load the sample data in the ledger

Now that the table is created, we’re ready to load sample data. Complete the following steps to load the initial sample data:

  1. In a query editor window, enter the following statement, then choose Run:
    INSERT INTO wallettable
    
    << {
    'email' : 'arnav.desai@dbmail.com',
    'username' : 'arnavd',
    'PreviousBalance' : '0',
    'CurrentBalance' : '34234'
    },
    {
    'email' : 'carlos.salazar@ghmail.com',
    'username' : 'carloss',
    'PreviousBalance' : '0',
    'CurrentBalance' : '873746'
    },
    {
    'email' : 'diego.ramirez@kbmail.com',
    'username' : 'diegor',
    'PreviousBalance' : '0',
    'CurrentBalance' : '354698'
    },
    {
    'email' : 'jane.doe@jhmail.com',
    'username' : 'janed',
    'PreviousBalance' : '0',
    'CurrentBalance' : '84905'
    },
    {
    'email' : 'john.doe@kbmail.com',
    'username' : 'johndh',
    'PreviousBalance' : '0',
    'CurrentBalance' : '994758'
    },
    {
    'email' : 'jorge.souza@mwmail.com',
    'username' : 'jorges',
    'PreviousBalance' : '0',
    'CurrentBalance' : '87458'
    },
    {
    'email' : 'akua.mansa@mhemail.com',
    'username' : 'akuam',
    'PreviousBalance' : '0',
    'CurrentBalance' : '45215'
    },
    {
    'email' : 'martha.rivera@sdemail.com',
    'username' : 'marthar',
    'PreviousBalance' : '0',
    'CurrentBalance' : '254589'
    },
    {
    'email' : 'mary.major@ksemail.com',
    'username' : 'marym',
    'PreviousBalance' : '0',
    'CurrentBalance' : '65847'
    },
    {
    'email' : 'nikki.wolf@jbemail.com',
    'username' : 'nikkiw',
    'PreviousBalance' : '0',
    'CurrentBalance' : '12368'
    } >>
  2. Verify the sample data by running the following statement in a new query window and choosing walletledger in the left pane under Choose a ledger:
    select * from wallettable

Add sample data for advanced analysis

For advanced querying in Athena in later sections, we generate a financial statement of a user (akua.mansa@mhemail.com) wallet account. For that wallet statement to be meaningful, we need to add few more sample transactions updates for the user.

In three separate query editor windows (make sure that walletledger is chosen in the left pane), enter the following statements in sequence, then choose Run. These statements simulate transactions done by the user akua.mansa@mhemail.com with different timestamps in the wallet account.

Use the following code for the first query:

UPDATE wallettable
SET PreviousBalance = 45215, CurrentBalance = 55215
WHERE email = 'akua.mansa@mhemail.com'

Use the following code for the second query:

UPDATE wallettable
SET PreviousBalance = 55215, CurrentBalance = 54234
WHERE email = 'akua.mansa@mhemail.com'

Use the following code for the third query:

UPDATE wallettable
SET PreviousBalance = 54234, CurrentBalance = 69234
WHERE email = 'akua.mansa@mhemail.com'

These query statements update the sample user’s wallet with transaction data that we can use to query the financial transaction record using Athena.

Create an Amazon QLDB export job

To create your export job, complete the following steps:

  1. On the Amazon QLDB console, choose Export in the navigation pane.
  2. Choose Create export job.
  3. In the Journal blocks section, choose the walletledger ledger.
  4. Enter 1900/01/01 and 00:00:00 in the Start date and time (UTC) fields.
  5. The End date and time (UTC) field contains the current date and time. Leave this value as it is.
  6. In the Write journal blocks to S3 section, choose Browse S3.
  7. Choose the S3 bucket you created with AWS CloudFormation (walletledger-s3-export-bucket), then choose Choose.
  8. Change the Export Format field to JSON.
  9. In the Grant service access to S3 section, choose Create and use a new service role.
  10. Ensure that S3 bucket name is populated with the correct S3 bucket. (Take note of the service role name that is created.)
  11. Choose Create export job.

The export job shows as In progress, and takes several minutes to complete.

You can verify the export files by navigating to the Amazon S3 console. In the list of buckets, choose the bucket walletledger-s3-export-bucket. Navigate through the directories in the export bucket and explore the files exported by Amazon QLDB.

Set up an AWS Glue crawler

In this step, we set up the AWS Glue components required to make our Amazon QLDB data in Amazon S3 available for querying via Athena. Complete the following steps to set up our AWS Glue workflow:

  1. On the AWS Glue console, choose Crawlers in the left navigation pane.
  2. Choose Create crawler.
  3. Give the crawler a name, such as qldb_s3_json_crawler.
  4. Choose Next.
  5. Under Data source configuration, for Is your data already mapped to Glue tables, select Not yet.
  6. Under Data sources, choose Add a data source.

  7. Under Data source, select S3.
  8. Under Location of S3 data, browse the S3 bucket you created (walletledger-s3-export-bucket-) and choose it and traverse further.
  9. Choose the current year (for this post, 2022).

  10. Select the current month (for this post, 10), then choose Choose.

  11. Choose Add an S3 data source.

  12. Select the Amazon S3 data source, then choose Next.

  13. Under IAM role, choose Create new IAM role.
  14. Enter a name of the new AWS Identity and Access Management (IAM) role with the existing prefix, for example AWSGlueServiceRole-QLDB-GLUECRAWL-IAM.
  15. Choose Create.
  16. Choose Next.
  17. Under Target Database, choose Add database (the Add database window launches in a new browser tab).
  18. Enter walletledger_glue_qldb_export_db, then choose Create database.
  19. Switch back to the AWS Glue crawler configuration browser tab and refresh the Target database, then select the database you created (walletledger_glue_qldb_export_db).
  20. Under Table name prefix optional, enter qldb_db_table_
  21. Under Advanced options, select Update all new and existing partitions with metadata from the table
  22. Choose Next.
  23. Review the crawler properties on the next page and choose Create crawler.
  24. Select the newly created crawler on the next page and choose Run on the Options tab.

The crawler may take a few minutes to complete. The state will change from Ready, Running, Stopping, and back to Ready when crawler is complete.

Query with Athena

We use Athena to query data in Amazon S3 that was exported from Amazon QLDB as JSON. Because we’re querying the data stored in Amazon S3, our queries don’t impose any load on our Amazon QLDB ledger or interfere with ongoing wallet transactions. We can therefore use Amazon QLDB financial transaction data for analytics-style queries that Amazon QLDB isn’t optimized for.

  1. On the Athena console, under Query Editor in the navigation pane, choose the database you created (walletledger_glue_qldb_export_db). The Tables list is populated, and you will see different tables with prefix qldb_db_table_.
  2. Before proceeding to the next step, verify that your Query Results Location is configured:
    1. In the query editor, choose the Settings tab.
    2. Choose Manage.
    3. Choose Browse S3 under Query result location and encryption and select the bucket walletledger-s3-export-bucket-.
  3. Go back to the editor by choosing Editor.
  4. Choose the options menu (three dots) next to the table name and choose Preview table.

The query runs and return results from the AWS Glue Data Catalog database.

Sample financial transaction queries to run with Athena

Now that you have successfully tested the preview of the table created by the AWS Glue workflow job, your Data Catalog is ready for running advanced queries for analysis as needed. In this section, we provide sample queries that you can run.

Query all transaction statements run on our Amazon QLDB ledger

In your Athena query editor, choose the plus sign to open a new query window. Enter the following statement in the query window and choose Run. Make sure to provide the database name and table name from the data source. The following statement returns all transactions run on Amazon QLDB through our AWS Glue Data Catalog table:

SELECT unnest_statements.starttime,

unnest_statements.statement

FROM "walletledger_glue_qldb_export_db"."table_name", UNNEST(transactioninfo.statements) AS t(unnest_statements)

The following screenshot shows the results from our query.

Query all INSERT statements on the Amazon QLDB ledger

We now run the following query in the query editor, which returns all INSERT statements run on Amazon QLDB:

SELECT unnest_statements.starttime,
unnest_statements.statement
FROM "walletledger_glue_qldb_export_db"."table_name" , UNNEST(transactioninfo.statements) AS t(unnest_statements)
WHERE UPPER(unnest_statements.statement) LIKE 'INSERT%'

The following screenshot shows the results from our query.

Query raw data by creating a table view

In this next query, we create a view in Athena that represents the Amazon QLDB ledger table. The query pulls the data from the database exported table and creates a view in Athena. This allows for analysis on a table structure similar to an Amazon QLDB table.

Run the following statement in a new query window:

CREATE VIEW walletview AS
SELECT DISTINCT listed_revisions.data.email,
listed_revisions.data.username,
listed_revisions.data.PreviousBalance,
listed_revisions.data.CurrentBalance,
listed_revisions.metadata.version,
listed_revisions.metadata.txTime timestamp
FROM "walletledger_glue_qldb_export_db"."table_name"
CROSS JOIN UNNEST(revisions) AS t(listed_revisions)

When the query is complete, you can see the new view created.

Inspect the newly created view with a timestamp of the transactions

This query inspects the newly created view with the following statement:

SELECT * FROM "walletledger_glue_qldb_export_db"."walletview" limit 10;

You can see the results from the walletview.

Generate a wallet statement of financial transactions for a specific user

Now that our view is created, we can generate a financial transaction record for a specific user. In a new query window, run the following query:

SELECT * FROM "walletledger_glue_qldb_export_db"."walletview" WHERE email='akua.mansa@mhemail.com' order by version desc;

The following screenshot shows our results.

The results returned are the financial transactions revisions that happened on the user’s wallet account with different timestamps. The revisions query helps generate a financial statement for all deductions and additions of the funds from the wallet.

Analyze count of financial transactions

Financial services companies use Amazon QLDB to process transactions, authorize payments, and manage account balances. These companies need robust reporting and analytics capabilities. As a result, financial services customers need efficient data lineage and audit capabilities, and must prove the trustworthiness of their data and operations to regulators.

One of the requirements in analysis of financial transactions is to generate the number of transactional queries that have been run on the financial data. The reason this is important is to verify if the transaction completed. For example, payments, deductions, credits made through external applications. Senders or receivers often complain that the money never credited or debited in or from their accounts.

This query returns counts for all the record query types. In a new query window, run the following query:

SELECT UPPER(SPLIT_PART(unnest_statements.statement,' ',1)) AS "Statement type", count(*) AS "Number of Statements"
FROM "walletledger_glue_qldb_export_db"."table_name" , UNNEST(transactioninfo.statements) AS t(unnest_statements)
GROUP BY UPPER(SPLIT_PART(unnest_statements.statement,' ',1))

The following screenshot shows our results.

Clean up

When you’re done testing, delete the resources you created so that you’re no longer billed for them.

  1. Launch AWS CloudShell.
  2. Replace the bucket name in the following command with the bucket created by the CloudFormation stack (walletledger-s3-export-bucket-xXXXXX) and press Enter:
    $ aws s3 rb s3://walletledger-s3-export-bucket-xXXXXX --force

  3. On the AWS CloudFormation console, select the stack qldb-blog-resources and choose Delete.
  4. On the IAM console, choose Roles in the navigation pane.
  5. Select the roles you created as part of the export job (if you followed the instructions, you can enter qldb in the search box to filter on the roles that were created) and choose Delete.
  6. On the AWS Glue console, choose Crawlers in the navigation pane.
  7. Select the crawler you created (qldb_s3_json_crawler) and on the Action menu, choose Delete crawler.
  8. Choose Databases in the left navigation pane.
  9. Select the database that you created (walletledger_glue_qldb_export_db) and on the Action menu, choose Delete database.

Conclusion

In this post, we showed you how to export critical transaction data from banking or wallet accounts stored in Amazon QLDB to Amazon S3, trigger an ETL workflow with AWS Glue, and query the data in Amazon S3 using Athena. This process makes your Amazon QLDB ledger data available for query and analysis using AWS data lake technologies.

This combination of technologies enables you to keep a complete and verifiable history of all of your transactions within Amazon QLDB, while making that data available to the rest of the business for query and analysis in a serverless, cost-effective manner using AWS Glue and Athena.

Leave your feedback, comments, and suggestions for improvement in the comments section.


About the Author

Anurag Jain is a Global Solutions Architect at AWS based out of Palo Alto, CA. He has 2 decades of wide technology experience in Innovation & Prototyping solutions with core expertise in AWS Cloud platform & Architecting Microservices. He primarily drives Application Modernization journey on AWS Cloud, build Cloud Center of Excellence practice and serve as Advisory Consultant to Office of CTO for World Wide High Tech customers & System Integrator’s.