AWS Big Data Blog

Efficient log management with Amazon OpenSearch Service data streams

In this post, we show you how to implement data streams with Index State Management (ISM) in Amazon OpenSearch Service. This approach automatically manages your time series data lifecycle and optimizes both performance and costs. Data streams distribute incoming data across multiple backing indices, helping to reduce single-index bottlenecks, while ISM policies automate rollover, retention, and storage tiering to help manage costs.

How Alight Solutions achieved 55% cost savings with Amazon OpenSearch Service

In this post, we share how Alight Solutions migrated from self-managed Elasticsearch to Amazon OpenSearch Service. The migration achieved a 55% cost reduction, eliminated approximately 2,000 hours per year of operational overhead, and gave Alight access to advanced observability features that it could not prioritize before.

Govern Amazon Redshift Data Warehouses Data Across Accounts using Amazon SageMaker Unified Studio

In this post, we show you how to use Amazon SageMaker Unified Studio to implement cross-account data sharing in Amazon Redshift using data mesh principles. We demonstrate how to build a scalable data mesh architecture that supports secure, auditable data sharing across AWS accounts while reducing operational burden.

Migrate from Apache Solr to Amazon OpenSearch Serverless

In this post, you will learn why now is the time to take advantage of the ease of operations and native AI capabilities of OpenSearch Serverless, and migrate from Solr.

High-performance Remote Shuffle Service on Amazon EMR with Apache Celeborn

In this post, we show how Apache Celeborn resolves this trade-off for Amazon EMR on EKS and Amazon EMR on EC2, improving job reliability while unlocking additional cost savings.

Zero Copy access to Apache Iceberg tables in Amazon S3 from Salesforce Data 360 using the Iceberg REST endpoint from AWS Glue Data Catalog

In this post, we demonstrate how AWS and Salesforce customers can access their enterprise data lakes on AWS from Salesforce Data 360 using zero-copy file federation.

Patch perfect: Automating Amazon Redshift patch testing

In this post, we demonstrate an automated test suite that validates your Amazon Redshift cluster after any patch, reboot, or modification. It uses standard drivers against real workload patterns to provide a verified gate between a patch landing and that patch reaching production.

Multi-cloud lakehouse architecture on AWS for Agentic AI, Part 1: Architecture and best practices

This post explains an architectural approach to building an open lakehouse on AWS that unifies the metadata catalog across providers so that AI agents can access it. It also highlights the architectural trade-offs and best practices.

How Razorpay Built Real-Time Anomaly Detection with Amazon MSK

In this post, we explore Razorpay’s anomaly detection and alerting platform (ADA) architecture using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and other AWS services. According to Razorpay, the system detects transaction anomalies in under 30 seconds, supports thousands of merchant-level alerts, and has reduced monitoring costs by approximately 80 percent. The platform maintains 99.99 percent uptime for over 500 million transactions per month.

Cut costs and simplify operations with writable warm storage in Amazon OpenSearch Service

In this post, I show you how writable warm storage removes the costly migration cycle. You can reduce your infrastructure costs by up to 48 percent and update historical data in seconds instead of hours. I walk through a real-world cost comparison and performance benchmarks, and help you decide when to use writable warm versus UltraWarm.