AWS Database Blog

Category: AWS Glue

Filter, transform, and load your DynamoDB table exports using AWS Glue

In this post, we show how you can load (import) an Amazon DynamoDB full or incremental table export into a second DynamoDB table with precise control over what gets loaded, at what write rate, and with the ability to observe the progress. This technique helps drive large-scale data migrations and synchronizations where you want maximum control.

Rate-limiting calls to Amazon DynamoDB using Python Boto3, Part 2: Distributed Coordination

Part 1 of this series showed how to rate-limit calls to Amazon DynamoDB by using Python Boto3 event hooks. In this post, I expand on the concept and show how to rate-limit calls in a distributed environment, where you want a maximum allowed rate across the full set of clients but can’t use direct client-to-client communication.

Accelerate SQL Server to Amazon Aurora migrations with a customizable solution

Migrating from SQL Server to Amazon Aurora can significantly reduce database licensing costs and modernize your data infrastructure. To accelerate your migration journey, we have developed a migration solution that offers ease and flexibility. You can use this migration accelerator to achieve fast data migration and minimum downtime while customizing it to meet your specific business requirements. In this post, we showcase the core features of the migration accelerator, demonstrated through a complex use case of consolidating 32 SQL Server databases into a single Amazon Aurora instance with near-zero downtime, while addressing technical debt through refactoring.

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse – Part 2

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse allows you to run analytics workloads on your DynamoDB data without having to set up and manage extract, transform, and load (ETL) pipelines. In this post we cover setting up Amazon SageMaker Unified Studio, followed by running data analysis to showcase its capabilities. We illustrate our solution walkthrough with an example of a credit card company that wants to analyze its customer behavior and spending trends.

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse – Part 1

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse allows you to run analytics workloads on your DynamoDB data without having to set up and manage extract, transform, and load (ETL) pipelines. In this two-part series, we first walk through the prerequisites and initial setup for the zero-ETL integration. In Part 2, we cover setting up Amazon SageMaker Unified Studio, followed by running data analysis to showcase its capabilities. We illustrate our solution walkthrough with an example of a credit card company that wants to analyze its customer behavior and spending trends.

Gather organization-wide Amazon RDS orphan snapshot insights using AWS Step Functions and Amazon QuickSight

In this post, we walk you through a solution to aggregate RDS orphan snapshots across accounts and AWS Regions, enabling automation and organization-wide visibility to optimize cloud spend based on data-driven insights. Cross-region copied snapshots, Aurora cluster copied snapshots and shared snapshots are out of scope for this solution. The solution uses AWS Step Functions orchestration together with AWS Lambda functions to generate orphan snapshot metadata across your organization. Generated metadata information is stored in Amazon Simple Storage Service (Amazon S3) and transformed into an Amazon Athena table by AWS Glue. Amazon QuickSight uses the Athena table to generate orphan snapshot insights.

Query RDF graphs using SPARQL and property graphs using Gremlin with the Amazon Athena Neptune connector

To query a Neptune database in Athena, you can use the Amazon Athena Neptune connector, an AWS Lambda function that connects to the Neptune cluster and queries the graph on behalf of Athena. In this post, we provide a step-by-step implementation guide to integrate the new version of the Athena Neptune connector and query a Neptune cluster using Gremlin and SPARQL queries.

Create an AWS Glue Data Catalog with AWS DMS

Businesses need near realtime access to the latest data and metadata available from many silos to perform analytics. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML) and application development. AWS Glue Data Catalog is a centralized […]