AWS Database Blog

Use a DAO to govern LLM training data, Part 1: Retrieval Augmented Generation

Blockchain and generative AI are two technical fields that have received a lot of attention in recent years. There is an emerging set of use cases that can benefit from these two technologies. In this four-part series, we build a solution that governs the training data ingestion process of an AI model, using a smart contract and serverless components. We guide you through the different steps to build the solution. In this post, we review the overall architecture of the solution and set up a large language model (LLM) knowledge base.

Load vector embeddings up to 67x faster with pgvector and Amazon Aurora

pgvector is the open source PostgreSQL extension for vector similarity search that powers generative artificial intelligence (AI) applications using techniques such as semantic search and retrieval-augmented generation (RAG). Amazon Aurora PostgreSQL-Compatible Edition has supported pgvector 0.5.1 since 2023. Amazon Aurora now supports pgvector version 0.7.0, which adds parallelism to improve the performance of building Hierarchical Navigable Small Worlds […]

How Dafiti migrated its most critical database to Amazon Aurora MySQL with minimal downtime and improved operational efficiency

In the dynamic world of digital retail, performance, resilience, and availability are not only desirable qualities, they are essential. Recently, Dafiti, a leading fashion and lifestyle ecommerce conglomerate operating in Brazil, Argentina, Chile, and Colombia, undertook a significant transformation of its critical database infrastructure by migrating from self-managed MySQL Server 5.7 on Amazon EC2 to Amazon Aurora MySQL. This strategic move improved the resiliency and efficiency of its database operations. In this post, we show you why we chose Aurora MySQL-Compatible and how we migrated our critical database infrastructure.

Build a streaming ETL pipeline on Amazon RDS using Amazon MSK

Customers who host their transactional database on Amazon Relational Database Service (Amazon RDS) often seek architecture guidance on building streaming extract, transform, load (ETL) pipelines to destination targets such as Amazon Redshift. This post outlines the architecture pattern for creating a streaming data pipeline using Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK offers a fully managed Apache Kafka service, enabling you to ingest and process streaming data in real time.

Embed textual data in Amazon RDS for SQL Server using Amazon Bedrock

In Part 1 of this post, we covered how Retrieval Augmented Generation (RAG) can be used to enhance responses in generative AI applications by combining domain-specific information with a foundation model (FM). However, we stayed focused on the semantic search aspect of the solution, assuming that our vector store was already built and fully populated. In this post, we explore how to generate vector embeddings on Wikipedia data stored in a SQL Server database hosted on Amazon RDS. We also use Amazon Bedrock to invoke the appropriate FM APIs and an Amazon SageMaker Jupyter Notebook to help us orchestrate the overall process.

Modernize your legacy databases with AWS data lakes, Part 1: Migrate SQL Server using AWS DMS

This is a three-part series in which we discuss the end-to-end process of building a data lake from a legacy SQL Server database. In this post, we show you how to build data pipelines to replicate data from Microsoft SQL Server to a data lake in Amazon S3 using AWS DMS. You can extend the solution presented in this post to other database engines like PostgreSQL, MySQL, and Oracle.

Performance testing MySQL migration environments using query playback and traffic mirroring – Part 3

This is the third post in a series where we dive deep into performance testing of MySQL environments being migrated from on premises. In Part 1, we compared the query playback and traffic mirroring approaches at a high level. In Part 2, we showed how to set up and configure query playback. In this post, we show you how to set up and configure traffic mirroring.

Performance testing MySQL migration environments using query playback and traffic mirroring – Part 2

This is the second post in a series where we dive deep into performance testing of MySQL environments being migrated from on premises. In Part 1, we compared the query playback and traffic mirroring approaches at a high level. In this post, we dive into the setup and configuration of query playback.

Performance testing MySQL migration environments using query playback and traffic mirroring – Part 1

In this series of posts, we dive deep into performance testing of MySQL environments being migrated from on premises to AWS. In this post, we review two different approaches to testing migrated environments with traffic that is representative of real production traffic: capturing and replaying traffic using a playback application, and mirroring traffic as it comes in using a proxy. Both approaches let you validate your environment using realistic data access patterns.

Use HammerDB to run performance tests on Amazon RDS for Db2

To ensure that you properly size your Amazon RDS for Db2 instances and achieve comparable or better performance than your on-premises systems, you can use HammerDB. By using this tool, you can generate OLTP-type workloads using TPC-C tests, enabling you to compare performance between your on-premises Db2 and Amazon RDS for Db2 systems. This post guides you through running HammerDB tests on RDS for Db2. We provide a step-by-step process for creating an RDS for Db2 instance using an AWS CloudFormation template, setting up a Db2 client, and configuring HammerDB. You learn how to execute tests and interpret results to properly size your RDS for Db2 instances.