Analytics | AWS Big Data Blog

Multi-cloud lakehouse architecture on AWS for Agentic AI, Part 1: Architecture and best practices

This post explains an architectural approach to building an open lakehouse on AWS that unifies the metadata catalog across providers so that AI agents can access it. It also highlights the architecture trade-offs and best practices.

How Razorpay Built Real-Time Anomaly Detection with Amazon MSK

In this post, we explore the architecture of Razorpay’s anomaly detection and alerting (ADA) platform, which is built on Amazon Managed Streaming for Apache Kafka (Amazon MSK) and other AWS services. According to Razorpay, the system detects transaction anomalies in under 30 seconds, supports thousands of merchant-level alerts, and has reduced monitoring costs by approximately 80 percent. The platform maintains 99.99 percent uptime for over 500 million transactions per month.

Cut costs and simplify operations with writable warm storage in Amazon OpenSearch Service

In this post, I show you how writable warm storage removes the costly migration cycle. You can reduce your infrastructure costs by up to 48 percent and update historical data in seconds instead of hours. I walk through a real-world cost comparison and performance benchmarks, and help you decide when to use writable warm versus UltraWarm.

Introducing Apache Spark Connect support in AWS Glue interactive sessions

Apache Spark Connect bridges the gap between these two worlds: you develop in local Python, but execute on AWS Glue against actual data. Today, AWS Glue interactive sessions support Spark Connect natively. You can connect from any environment that supports the PySpark remote() API, including VS Code, PyCharm, Amazon SageMaker Unified Studio notebooks, and standalone Python applications. You don’t need to install specialized kernels or manage cluster infrastructure.
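
As a rough illustration of the remote() connection style mentioned above, the following Python sketch builds a Spark Connect connection string and passes it to PySpark. It is a minimal sketch under stated assumptions, not the procedure from the post: the host, port, and token values are hypothetical placeholders, and how AWS Glue interactive sessions actually expose and authenticate the endpoint is not described here. The helper names spark_connect_url and connect are illustrative only.

```python
from typing import Optional
from urllib.parse import quote


def spark_connect_url(host: str, port: int = 443,
                      token: Optional[str] = None,
                      use_ssl: bool = True) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key=value;..."""
    params = []
    if use_ssl:
        params.append("use_ssl=true")
    if token:
        # Percent-encode the token so reserved characters do not break the URL.
        params.append("token=" + quote(token, safe=""))
    url = f"sc://{host}:{port}/"
    if params:
        url += ";" + ";".join(params)
    return url


def connect(url: str):
    """Open a remote Spark session against a Spark Connect endpoint."""
    # Imported lazily so the URL helper can be used without PySpark installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.remote(url).getOrCreate()


if __name__ == "__main__":
    # Hypothetical endpoint and token; replace with values for your session.
    print(spark_connect_url("example.invalid", 443, token="my-token"))
```

Calling connect() with a valid endpoint returns a SparkSession whose DataFrame operations execute remotely, while the Python process itself stays local.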

How BigBasket uses the Iceberg-based lakehouse architecture on AWS to power lightning-fast grocery delivery across India

In this post, we demonstrate how BigBasket implemented the lakehouse architecture on AWS, including their architecture decisions, implementation approach, and the measurable business results you can expect from a similar modernization. Whether you’re facing scalability challenges or planning your own lakehouse implementation, this blueprint provides actionable insights you can adapt for your organization.

Accelerating log analytics at scale with AWS Glue and Apache Iceberg materialized views

In this post, you learn how to build an application log pipeline for production use with Amazon CloudWatch Logs, AWS Lambda, Amazon Data Firehose, AWS Glue, and Apache Iceberg materialized tables. You then use materialized views to accelerate query performance. This solution helps you achieve faster query response times on large-scale log data without requiring you to manage continuous data lake refresh.

Serverless analytics pipelines using the Apache Spark engine in Amazon Athena

This post shows how developers, data engineers, and analysts can connect to a secure Spark Connect endpoint in Athena with Apache Spark. You can use your preferred tools, such as Jupyter notebooks, VS Code, or dbt with Apache Airflow, without managing cluster lifecycle or scaling.

Deploy modern data platforms in minutes with MDAA

In this post, we explore how MDAA transforms data architecture development from months of manual coding to production-ready deployment through configuration-driven infrastructure and embedded governance, examine a real customer transformation, and provide a clear implementation pathway for your own data modernization journey.

Amazon Redshift RG: Faster and lower cost, Graviton-powered

In this post, we describe the innovations that make RG instances so much faster. We also share benchmark results showing that RG delivers up to 4.2x better price-performance than other leading data warehouses.

Run log analytics for a fraction of the cost with the new engine for Amazon OpenSearch Service

We’re introducing a purpose-built log analytics engine for Amazon OpenSearch Service. This new engine delivers up to 4x price performance, 2x faster data ingestion, up to 2x faster analytical queries, and up to 70 percent lower storage costs. You get all of this without sacrificing search capabilities on the same data. In this post, you learn how to take advantage of these benefits, see how to get started, and review benchmark results at billion-document scale.
