AWS Big Data Blog

Category: Announcements

Enhanced data discovery in Amazon SageMaker Catalog with custom metadata forms and rich text documentation

Amazon SageMaker Catalog now supports custom metadata forms and rich text descriptions at the column level, extending existing curation capabilities for business names, descriptions, and glossary term classifications. Column-level context is essential for understanding and trusting data. This release helps organizations improve data discoverability, collaboration, and governance by letting metadata stewards document columns using structured and formatted information that aligns with internal standards. In this post, we show how to enhance data discovery in SageMaker Catalog with custom metadata forms and rich text documentation at the schema level.

Introducing Amazon MWAA Serverless

Today, AWS announced Amazon Managed Workflows for Apache Airflow (MWAA) Serverless. This is a new deployment option for MWAA that eliminates the operational overhead of managing Apache Airflow environments while optimizing costs through serverless scaling. In this post, we demonstrate how to use MWAA Serverless to build and deploy scalable workflow automation solutions.

Amazon MSK Express brokers now support Intelligent Rebalancing for 180 times faster operation performance

Effective today, all new Amazon Managed Streaming for Apache Kafka (Amazon MSK) Provisioned clusters with Express brokers will support Intelligent Rebalancing at no additional cost. In this post we’ll introduce the Intelligent Rebalancing feature and show an example of how it works to improve operation performance.

Enhanced search with match highlights and explanations in Amazon SageMaker

Amazon SageMaker now enhances search results in Amazon SageMaker Unified Studio with additional context that improves transparency and interpretability. The capability introduces inline highlighting for matched terms and an explanation panel that details where and how each match occurred across metadata fields such as name, description, glossary, and schema. In this post, we demonstrate how to use enhanced search in Amazon SageMaker.

Amazon Kinesis Data Streams launches On-demand Advantage for instant throughput increases and streaming at scale

Today, AWS announced the new Amazon Kinesis Data Streams On-demand Advantage mode, which includes warm throughput capability and an updated pricing structure. With this feature you can enable instant scaling for traffic surges while optimizing costs for consistent streaming workloads. In this post, we explore this new feature, including key use cases, configuration options, pricing considerations, and best practices for optimal performance.

Amazon Kinesis Data Streams now supports 10x larger record sizes: Simplifying real-time data processing

Today, AWS announced that Amazon Kinesis Data Streams now supports record sizes up to 10MiB – a tenfold increase from the previous limit. In this post, we explore Amazon Kinesis Data Streams large record support, including key use cases, configuration of maximum record sizes, throttling considerations, and best practices for optimal performance.

Introducing Apache Airflow 3 on Amazon MWAA: New features and capabilities

AWS announced the general availability of Apache Airflow 3 on Amazon Managed Workflows for Apache Airflow (Amazon MWAA). This release transforms how organizations use Apache Airflow to orchestrate data pipelines and business processes in the cloud, bringing enhanced security, improved performance, and modern workflow orchestration capabilities to Amazon MWAA customers. This post explores the features of Airflow 3 on Amazon MWAA and outlines enhancements that improve your workflow orchestration capabilities

Announcing cross-account ingestion for Amazon OpenSearch Service

Amazon OpenSearch Ingestion is a powerful data ingestion pipeline that AWS customers use for many different purposes, such as observability, analytics, and zero-ETL search. Many customers today push logs, traces, and metrics from their applications to OpenSearch Ingestion to store and analyze this data. Today, we are happy to announce that OpenSearch Ingestion pipelines now […]

Tailor Amazon SageMaker Unified Studio project environments to your needs using custom blueprints

Amazon SageMaker Unified Studio is a single data and AI development environment that brings together data preparation, analytics, machine learning (ML), and generative AI development in one place. By unifying these workflows, it saves teams from managing multiple tools and makes it straightforward for data scientists, analysts, and developers to build, train, and deploy ML […]