AWS Glue | AWS Database Blog

Efficiently compare items across two Amazon DynamoDB tables

In this post, we show an algorithm to efficiently compare two Amazon DynamoDB tables and find the differences between their items. We provide an example where two tables, each containing approximately half a billion items, are compared in less than 7 minutes, for less than $10.

Rate-limiting calls to Amazon DynamoDB using Python Boto3, Part 2: Distributed Coordination

Part 1 of this series showed how to rate-limit calls to Amazon DynamoDB by using Python Boto3 event hooks. In this post, I expand on the concept and show how to rate-limit calls in a distributed environment, where you want a maximum allowed rate across the full set of clients but can’t use direct client-to-client communication.

Accelerate SQL Server to Amazon Aurora migrations with a customizable solution

Migrating from SQL Server to Amazon Aurora can significantly reduce database licensing costs and modernize your data infrastructure. To accelerate your migration journey, we have developed a migration solution that offers ease and flexibility. You can use this migration accelerator to achieve fast data migration with minimal downtime while customizing it to meet your specific business requirements. In this post, we showcase the core features of the migration accelerator, demonstrated through a complex use case of consolidating 32 SQL Server databases into a single Amazon Aurora instance with near-zero downtime, while addressing technical debt through refactoring.

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse – Part 2

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse allows you to run analytics workloads on your DynamoDB data without having to set up and manage extract, transform, and load (ETL) pipelines. In this post, we cover setting up Amazon SageMaker Unified Studio, followed by running data analysis to showcase its capabilities. We illustrate our solution walkthrough with an example of a credit card company that wants to analyze its customer behavior and spending trends.

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse – Part 1

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse allows you to run analytics workloads on your DynamoDB data without having to set up and manage extract, transform, and load (ETL) pipelines. In this two-part series, we first walk through the prerequisites and initial setup for the zero-ETL integration. In Part 2, we cover setting up Amazon SageMaker Unified Studio, followed by running data analysis to showcase its capabilities. We illustrate our solution walkthrough with an example of a credit card company that wants to analyze its customer behavior and spending trends.

Gather organization-wide Amazon RDS orphan snapshot insights using AWS Step Functions and Amazon QuickSight

In this post, we walk you through a solution to aggregate RDS orphan snapshots across accounts and AWS Regions, enabling automation and organization-wide visibility to optimize cloud spend based on data-driven insights. Cross-Region copied snapshots, Aurora cluster copied snapshots, and shared snapshots are out of scope for this solution. The solution uses AWS Step Functions orchestration together with AWS Lambda functions to generate orphan snapshot metadata across your organization. The generated metadata is stored in Amazon Simple Storage Service (Amazon S3) and transformed into an Amazon Athena table by AWS Glue. Amazon QuickSight uses the Athena table to generate orphan snapshot insights.

Query RDF graphs using SPARQL and property graphs using Gremlin with the Amazon Athena Neptune connector

To query a Neptune database in Athena, you can use the Amazon Athena Neptune connector, an AWS Lambda function that connects to the Neptune cluster and queries the graph on behalf of Athena. In this post, we provide a step-by-step implementation guide to integrate the new version of the Athena Neptune connector and query a Neptune cluster using Gremlin and SPARQL queries.

Create an AWS Glue Data Catalog with AWS DMS

Businesses need near real-time access to the latest data and metadata available from many silos to perform analytics. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. AWS Glue Data Catalog is a centralized […]

Visualize Ethereum ERC20 token data using Amazon Managed Blockchain Query and Amazon QuickSight

Businesses such as Paxos that issue stablecoin USD tokens want to find a way to identify common token metrics such as top holders, daily active users, daily volume, total number of holders, latest transfers, top Decentralized Finance (DeFi) protocols the tokens have been used on, and more. With Amazon Managed Blockchain (AMB) Query and Amazon […]

Archival solutions for Oracle database workloads in AWS: Part 1

This is a two-part series. In this post, we explain three archival solutions that allow you to archive Oracle data into Amazon Simple Storage Service (Amazon S3). In Part 2 of this series, we explain three archival solutions using native Oracle products and utilities. All of these options allow you to join current Oracle data with archived data.
