AWS Big Data Blog
Category: Announcements
Introducing Cluster insights: Unified monitoring dashboard for Amazon OpenSearch Service clusters
This post guides you through setting up and using Cluster insights, including its key features and metrics. By the end, you’ll understand how to use Cluster insights to recognize and address performance and resiliency issues within your OpenSearch Service clusters.
Enforce business glossary classification rules in Amazon SageMaker Catalog
Amazon SageMaker Catalog now supports metadata enforcement rules for glossary terms classification (tagging) at the asset level. With this capability, administrators can require that assets include specific business terms or classifications. Data producers must apply required glossary terms or classifications before an asset can be published. In this post, we show how to enforce business glossary classification rules in SageMaker Catalog.
Enhanced data discovery in Amazon SageMaker Catalog with custom metadata forms and rich text documentation
Amazon SageMaker Catalog now supports custom metadata forms and rich text descriptions at the column level, extending existing curation capabilities for business names, descriptions, and glossary term classifications. Column-level context is essential for understanding and trusting data. This release helps organizations improve data discoverability, collaboration, and governance by letting metadata stewards document columns using structured and formatted information that aligns with internal standards. In this post, we show how to enhance data discovery in SageMaker Catalog with custom metadata forms and rich text documentation at the schema level.
Introducing Amazon MWAA Serverless
Today, AWS announced Amazon Managed Workflows for Apache Airflow (MWAA) Serverless. This is a new deployment option for MWAA that eliminates the operational overhead of managing Apache Airflow environments while optimizing costs through serverless scaling. In this post, we demonstrate how to use MWAA Serverless to build and deploy scalable workflow automation solutions.
Introducing the Amazon OpenSearch Lens for the AWS Well-Architected Framework
In this post, we show you how to use the Amazon OpenSearch Service Lens to evaluate your OpenSearch Service workloads against architectural best practices.
Amazon MSK Express brokers now support Intelligent Rebalancing for 180 times faster operation performance
Effective today, all new Amazon Managed Streaming for Apache Kafka (Amazon MSK) Provisioned clusters with Express brokers will support Intelligent Rebalancing at no additional cost. In this post, we introduce the Intelligent Rebalancing feature and show an example of how it improves operation performance.
Enhanced search with match highlights and explanations in Amazon SageMaker
Amazon SageMaker now enhances search results in Amazon SageMaker Unified Studio with additional context that improves transparency and interpretability. The capability introduces inline highlighting for matched terms and an explanation panel that details where and how each match occurred across metadata fields such as name, description, glossary, and schema. In this post, we demonstrate how to use enhanced search in Amazon SageMaker.
Amazon Kinesis Data Streams launches On-demand Advantage for instant throughput increases and streaming at scale
Today, AWS announced the new Amazon Kinesis Data Streams On-demand Advantage mode, which includes warm throughput capability and an updated pricing structure. With this feature, you can enable instant scaling for traffic surges while optimizing costs for consistent streaming workloads. In this post, we explore this new feature, including key use cases, configuration options, pricing considerations, and best practices for optimal performance.
Amazon Kinesis Data Streams now supports 10x larger record sizes: Simplifying real-time data processing
Today, AWS announced that Amazon Kinesis Data Streams now supports record sizes up to 10 MiB, a tenfold increase from the previous limit. In this post, we explore Amazon Kinesis Data Streams large record support, including key use cases, configuration of maximum record sizes, throttling considerations, and best practices for optimal performance.
Introducing Apache Airflow 3 on Amazon MWAA: New features and capabilities
AWS announced the general availability of Apache Airflow 3 on Amazon Managed Workflows for Apache Airflow (Amazon MWAA). This release transforms how organizations use Apache Airflow to orchestrate data pipelines and business processes in the cloud, bringing enhanced security, improved performance, and modern workflow orchestration capabilities to Amazon MWAA customers. This post explores the features of Airflow 3 on Amazon MWAA and outlines enhancements that improve your workflow orchestration capabilities.