AWS Big Data Blog
Category: Database
Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift
Our zero-ETL integration with Amazon Redshift facilitates point-to-point data movement to get it ready for analytics, artificial intelligence (AI) and machine learning (ML) using Amazon Redshift on petabytes of data. In this post, we provide step-by-step guidance on how to get started with near real time operational analytics using the Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift.
Unlock insights on Amazon RDS for MySQL data with zero-ETL integration to Amazon Redshift
Amazon Relational Database Service (Amazon RDS) for MySQL zero-ETL integration with Amazon Redshift was announced in preview at AWS re:Invent 2023 for Amazon RDS for MySQL version 8.0.28 or higher. In this post, we provide step-by-step guidance on how to get started with near real-time operational analytics using this feature. This post is a continuation […]
Announcing data filtering for Amazon Aurora MySQL zero-ETL integration with Amazon Redshift
AWS is now announcing data filtering on zero-ETL integrations, enabling you to bring in selective data from the database instance on zero-ETL integrations between Amazon Aurora MySQL and Amazon Redshift. This feature allows you to select individual databases and tables to be replicated to your Redshift data warehouse for analytics use cases. In this post, we provide an overview of use cases where you can use this feature, and provide step-by-step guidance on how to get started with near real time operational analytics using this feature.
Build a RAG data ingestion pipeline for large-scale ML workloads
For building any generative AI application, enriching the large language models (LLMs) with new data is imperative. This is where the Retrieval Augmented Generation (RAG) technique comes in. RAG is a machine learning (ML) architecture that uses external documents (like Wikipedia) to augment its knowledge and achieve state-of-the-art results on knowledge-intensive tasks. For ingesting these […]
Enable advanced search capabilities for Amazon Keyspaces data by integrating with Amazon OpenSearch Service
In this post, we explore the process of integrating Amazon Keyspaces and Amazon OpenSearch Service using AWS Lambda and Amazon OpenSearch Ingestion to enable advanced search capabilities. The content includes a reference architecture, a step-by-step guide on infrastructure setup, sample code for implementing the solution within a use case, and an AWS Cloud Development Kit (AWS CDK) application for deployment.
Simplify data streaming ingestion for analytics using Amazon MSK and Amazon Redshift
Towards the end of 2022, AWS announced the general availability of real-time streaming ingestion to Amazon Redshift for Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), eliminating the need to stage streaming data in Amazon Simple Storage Service (Amazon S3) before ingesting it into Amazon Redshift. Streaming ingestion from Amazon […]
Reference guide to analyze transactional data in near-real time on AWS
Business leaders and data analysts use near-real-time transaction data to understand buyer behavior to help evolve products. The primary challenge businesses face with near-real-time analytics is getting the data prepared for analytics in a timely manner, which can often take days. Companies commonly maintain entire teams to facilitate the flow of data from ingestion to […]
Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider
Many organizations use identity providers (IdPs) to authenticate users, manage their attributes, and group memberships for secure, efficient, and centralized identity management. You might be modernizing your data architecture using Amazon Redshift to enable access to your data lake and data in your data warehouse, and are looking for a centralized and scalable way to […]
How Gupshup built their multi-tenant messaging analytics platform on Amazon Redshift
Gupshup is a leading conversational messaging platform, powering over 10 billion messages per month. Across verticals, thousands of large and small businesses in emerging markets use Gupshup to build conversational experiences across marketing, sales, and support. Gupshup’s carrier-grade platform provides a single messaging API for 30+ channels, a rich conversational experience-building tool kit for any […]
Use multiple bookmark keys in AWS Glue JDBC jobs
AWS Glue is a serverless data integrating service that you can use to catalog data and prepare for analytics. With AWS Glue, you can discover your data, develop scripts to transform sources into targets, and schedule and run extract, transform, and load (ETL) jobs in a serverless environment. AWS Glue jobs are responsible for running […]