AWS Database Blog
Category: Analytics
Build a streaming ETL pipeline on Amazon RDS using Amazon MSK
Customers who host their transactional database on Amazon Relational Database Service (Amazon RDS) often seek architecture guidance on building streaming extract, transform, load (ETL) pipelines to targets such as Amazon Redshift. This post outlines the architecture pattern for creating a streaming data pipeline using Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK is a fully managed Apache Kafka service that enables you to ingest and process streaming data in real time.
Modernize your legacy databases with AWS data lakes, Part 1: Migrate SQL Server using AWS DMS
This is the first post in a three-part series in which we discuss the end-to-end process of building a data lake from a legacy SQL Server database. In this post, we show you how to build data pipelines to replicate data from Microsoft SQL Server to a data lake in Amazon S3 using AWS DMS. You can extend the solution presented in this post to other database engines like PostgreSQL, MySQL, and Oracle.
Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift is generally available
In this post, we discuss the challenges with traditional data analytics mechanisms, our approach to solve them, and how you can use Amazon Aurora PostgreSQL-Compatible Edition zero-ETL integration with Amazon Redshift, which is generally available as of October 15th, 2024.
Vector search for Amazon DynamoDB with zero ETL for Amazon OpenSearch Service
As organizations increasingly rely on Amazon DynamoDB for their operational database needs, the demand for advanced data insights and enhanced search capabilities continues to grow. Leveraging the power of Amazon OpenSearch Service and Amazon Bedrock, you can now unlock generative artificial intelligence (AI) capabilities for your DynamoDB data. In this post, we show how you […]
How Prisma Cloud built Infinity Graph using Amazon Neptune and Amazon OpenSearch Service
Palo Alto Networks’ Prisma Cloud is a leading cloud security platform protecting enterprise cloud adoption from code to cloud workflows. Palo Alto Networks chose Amazon Neptune Database and Amazon OpenSearch Service as the core services to power its Infinity Graph. In this post, we discuss the scale Palo Alto Networks requires from these core services and how we were able to design a solution to meet these needs. We focus on the Neptune design decisions and benefits, and explain how OpenSearch Service fits into the design without diving into implementation details.
Stream change data in a multicloud environment using AWS DMS, Amazon MSK, and Amazon Managed Service for Apache Flink
When workloads and their corresponding transactional databases are distributed across multiple cloud providers, it can create challenges in using the data in near real time for advanced analytics. In this post, we discuss architecture, approaches, and considerations for streaming data changes from the transactional databases deployed in other cloud providers to a streaming data solution deployed on AWS.
Analyze blockchain data with natural language using Amazon Bedrock
Data within public blockchain networks such as Bitcoin and Ethereum can be accessed by anyone. However, accessing and making sense of this information has traditionally been a complex and technical undertaking. Much of the data is encoded and stored as bytes, rather than in a human-readable format. In this post, we introduce a solution that demonstrates how you can chat with blockchain data using Amazon Bedrock and the AWS Public Blockchain datasets. We discuss Amazon Bedrock, review the solution architecture, provide example prompts, share interesting findings, and go over how you can extend the solution to integrate with different data sources.
Query RDF graphs using SPARQL and property graphs using Gremlin with the Amazon Athena Neptune connector
To query a Neptune database in Athena, you can use the Amazon Athena Neptune connector, an AWS Lambda function that connects to the Neptune cluster and queries the graph on behalf of Athena. In this post, we provide a step-by-step implementation guide to integrate the new version of the Athena Neptune connector and query a Neptune cluster using Gremlin and SPARQL queries.
How Infosys used Amazon Aurora zero-ETL integration with Amazon Redshift for near real-time analytics and insights
In this post, we talk about how Infosys redefined the ETL landscape for their product sales and freight management application using Aurora zero-ETL integration with Amazon Redshift. We also describe our experience with the old process, how the new zero-ETL integration helped us move data into a Redshift cluster for analytics, and the metrics for monitoring the health of the integration.
Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service
In this post, we discuss a design for a highly searchable movie content graph database built on Amazon Neptune, a managed graph database service. We demonstrate how to build a list of relevant movies matching a user’s search criteria by combining lexical, semantic, and graphical similarity methods using Neptune, Amazon OpenSearch Service, and Neptune Machine Learning. To find matches, we compare movies by similar text as well as by similar vector embeddings. We use both sentence and graph neural network (GNN) models to build these embeddings.