AWS Big Data Blog

Accelerating log analytics at scale with AWS Glue and Apache Iceberg materialized views

In this post, you learn how to build an application log pipeline for production use with Amazon CloudWatch Logs, AWS Lambda, Amazon Data Firehose, AWS Glue, and Apache Iceberg materialized tables. You then use materialized views to accelerate query performance. This solution helps you achieve faster query response times on large-scale log data without requiring you to manage continuous data lake refresh.

Serverless analytics pipelines using the Apache Spark engine in Amazon Athena

This post shows how developers, data engineers, and analysts can connect to a secure Spark Connect endpoint in Athena with Apache Spark. You can use your preferred tools, such as Jupyter notebooks, VS Code, or dbt with Apache Airflow, without managing cluster lifecycle or scaling.

Deploy modern data platforms in minutes with MDAA

In this post, we explore how the Modern Data Architecture Accelerator (MDAA) uses configuration-driven infrastructure and embedded governance to shorten data architecture development from months of manual coding to a production-ready deployment. We also examine a real customer transformation and provide a clear implementation pathway for your own data modernization journey.

Run log analytics for a fraction of the cost with the new engine for Amazon OpenSearch Service

We’re introducing a purpose-built log analytics engine for Amazon OpenSearch Service. This new engine delivers up to 4x better price performance, up to 2x faster data ingestion, up to 2x faster analytical queries, and up to 70 percent lower storage costs. You get all of this without sacrificing search capabilities on the same data. In this post, you learn how to take advantage of these benefits, see how to get started, and review benchmark results at billion-document scale.

Scale analytics with Amazon Redshift multi-warehouse enhancements

In this post, we introduce new Amazon Redshift features that extend its multi-warehouse and scaling capabilities: remote materialized view (MV) operations, remote table DDL support, and concurrency scaling enhancements for zero-ETL and Amazon S3 event integrations. These features help you build more scalable and performant decentralized analytics architectures on Amazon Redshift.

Amazon Redshift delivers faster performance for BI dashboards and real-time analytics

Today, we’re excited to announce a new performance optimization in Amazon Redshift that improves the response times of low-latency SQL queries, such as those used in real-time analytics applications or generated by BI dashboards. With this enhancement, query latency improves because Amazon Redshift spends less time preparing SQL queries for execution. Because queries start executing sooner, they return results faster.

Optimize your Tableau integration with Amazon Redshift Serverless

In this post, we provide a guide to help you use Tableau’s Relationships and the Amazon Redshift Serverless architecture to deliver sub-second insights while getting the most out of every Redshift Processing Unit (RPU). The guidance covers five key areas: data model architecture for optimal query performance, security configuration and access control, performance optimization through smart configuration, cost management strategies, and query and join optimization techniques.