Announcements | AWS Big Data Blog

Deliver Apache Kafka data to streaming tables for Apache Iceberg with Amazon MSK Express brokers

We're announcing delivery to streaming tables for Apache Iceberg with Amazon MSK Express brokers, a fully managed capability that continuously materializes your Kafka streaming data as queryable Iceberg tables in Amazon S3 Tables. There are no connectors, Flink jobs, or custom consumers to manage, and no code to write.

Amazon EMR Serverless now supports 32 vCPU workers for the most demanding Spark jobs

Accelerate Spark on EMR Serverless with larger workers and shuffle-optimized disks

Amazon EMR Serverless now supports a 32 vCPU / 244 GB worker configuration for the most demanding Spark jobs. Across 126 TPC-DS and TPC-H queries, larger workers delivered on average 29% faster query execution and 29% lower cost, with the biggest gains on shuffle-heavy, multi-table join queries.

Zero-copy access to Apache Iceberg tables in Amazon S3 from Salesforce Data 360 using the Iceberg REST endpoint from AWS Glue Data Catalog

In this post, we demonstrate how AWS and Salesforce customers can access their enterprise data lakes on AWS from Salesforce Data 360 using zero-copy file federation.

Introducing Apache Spark Connect support in AWS Glue interactive sessions

Apache Spark Connect bridges the gap between local development and remote execution: you develop in local Python, but your code executes on AWS Glue against your actual data. Today, AWS Glue interactive sessions support Spark Connect natively. You can connect from any environment that supports the PySpark SparkSession.builder.remote() API, including VS Code, PyCharm, Amazon SageMaker Unified Studio notebooks, and standalone Python applications. You don't need to install specialized kernels or manage cluster infrastructure.
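As a rough illustration of what "connecting with remote()" involves, the sketch below builds a Spark Connect connection string in the open source sc://host:port/;key=value format (15002 is the default Spark Connect port) and shows where it would be passed to PySpark. The host, port, and parameters are hypothetical placeholders; how AWS Glue interactive sessions expose their endpoint and handle authentication is not covered here, and the helper name spark_connect_url is our own.

```python
def spark_connect_url(host: str, port: int = 15002, **params: str) -> str:
    """Return a Spark Connect connection string: sc://host:port/;key=value;...

    Hypothetical helper. Parameters such as use_ssl or token are appended
    as ';'-separated key=value pairs after the trailing slash.
    """
    for key, value in params.items():
        # Reject characters that would break the ';' / '=' delimited format.
        if any(ch in f"{key}{value}" for ch in ";="):
            raise ValueError(f"invalid character in parameter {key!r}")
    url = f"sc://{host}:{port}/"
    if params:
        url += ";" + ";".join(f"{k}={v}" for k, v in params.items())
    return url


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the value for your own session.
    url = spark_connect_url("spark-connect.example.com", 443, use_ssl="true")
    print(url)  # sc://spark-connect.example.com:443/;use_ssl=true

    # With PySpark 3.4+ installed and a reachable Spark Connect server:
    # from pyspark.sql import SparkSession
    # spark = SparkSession.builder.remote(url).getOrCreate()
    # spark.range(5).show()
```

Because the DataFrame API runs on the client and only the query plan is sent to the server, the same local script can target a different backend by changing just this connection string.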

Amazon Redshift RG: Faster and lower cost, Graviton-powered

In this post, we describe the innovations that make RG instances so much faster. We also share benchmark results showing that RG delivers up to 4.2x better price-performance than other leading data warehouses.

Run log analytics for a fraction of the cost with the new engine for Amazon OpenSearch Service

We're introducing a purpose-built log analytics engine for Amazon OpenSearch Service. This new engine delivers up to 4x better price-performance, 2x faster data ingestion, up to 2x faster analytical queries, and up to 70 percent lower storage costs. You get all of this without sacrificing search capabilities on the same data. In this post, you learn how to take advantage of these benefits, see how to get started, and review benchmark results at billion-document scale.

Scale analytics with Amazon Redshift multi-warehouse enhancements

In this post, we introduce new Amazon Redshift features that enhance multi-warehouse architectures and scaling: remote materialized view (MV) operations, remote table DDL support, and concurrency scaling enhancements for zero-ETL and S3 event integration. These features help you build more scalable, performant decentralized analytics architectures on Amazon Redshift.

Detecting fraud patterns across Snowflake and AWS using SageMaker Data Agent

Amazon SageMaker Data Agent launches three new capabilities in Amazon SageMaker Unified Studio notebooks: SQL analytics on Snowflake data sources, materialized view management, and interactive charting. Practitioners can use them together to query Snowflake alongside AWS data, pre-compute and schedule repeated aggregations, and create interactive visualizations from natural language prompts in a single notebook, without writing boilerplate code or switching tools. In this post, we describe the challenges these capabilities address, introduce each one, and walk through a fraud analytics scenario that demonstrates them working together in an end-to-end investigation workflow.

Introducing Private Networking for Amazon MQ for RabbitMQ

In this post, we explain how Private Networking for Amazon MQ for RabbitMQ works and walk through the setup process. Whether you’re securing a private identity provider, federating messages between brokers, or connecting to self-hosted RabbitMQ, your broker can now reach private destinations without exposing them publicly.
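For the "federating messages between brokers" case, a federation upstream is defined on the downstream broker. The sketch below builds, but does not send, the standard RabbitMQ management HTTP API request for that (PUT /api/parameters/federation-upstream/{vhost}/{name}). It assumes the federation plugin and management API are available on your broker; the broker URL, credentials, and private upstream hostname are hypothetical, and the Private Networking configuration that makes such a hostname reachable is not shown.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request


def federation_upstream_request(
    api_base: str, user: str, password: str, vhost: str, name: str, upstream_uri: str
) -> Request:
    """Build a RabbitMQ management API request that defines a federation upstream."""
    # vhost and name are path segments, so '/' (the default vhost) becomes %2F.
    url = (
        f"{api_base.rstrip('/')}/api/parameters/federation-upstream/"
        f"{quote(vhost, safe='')}/{quote(name, safe='')}"
    )
    body = json.dumps({"value": {"uri": upstream_uri}}).encode("utf-8")
    # The management API accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = federation_upstream_request(
        "https://broker.example.com",  # hypothetical broker endpoint
        "admin", "secret", "/", "private-upstream",
        "amqps://rabbit.internal.example:5671",  # hypothetical private host
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req)
```

An upstream on its own does nothing; RabbitMQ applies it only once a policy with a federation-upstream or federation-upstream-set definition matches the exchanges or queues you want to federate.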

AI-assisted data development with Kiro and SageMaker Unified Studio

With the AWS Toolkit for Visual Studio Code, you can connect Kiro, VS Code, or Cursor directly to Amazon SageMaker Unified Studio. This post demonstrates the integration using Kiro. The same Remote Access connection works with VS Code and Cursor. The post starts by showing what you can do with this integration: using natural language to explore and analyze data in a governed environment. We then walk through the setup so you can try it yourself.
