AWS Big Data Blog

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

We are excited to announce the general availability of SageMaker Unified Studio. In this post, we explore the benefits of SageMaker Unified Studio and how to get started.

Announcing end-of-support for Amazon Kinesis Client Library 1.x and Amazon Kinesis Producer Library 0.x effective January 30, 2026

Amazon Kinesis Client Library (KCL) 1.x and Amazon Kinesis Producer Library (KPL) 0.x will reach end-of-support on January 30, 2026. Accordingly, these versions will enter maintenance mode on April 17, 2025. During maintenance mode, AWS will provide updates only for critical bug fixes and security issues. Major versions in maintenance mode will not receive updates for new features or feature enhancements.

Deploy real-time analytics with StarTree for managed Apache Pinot on AWS

In this post, we introduce StarTree as a managed solution on AWS for teams seeking the advantages of Pinot. We highlight the key distinctions between open-source Pinot and StarTree, and provide valuable insights for organizations considering a more streamlined approach to their real-time analytics infrastructure.

Develop and test AWS Glue 5.0 jobs locally using a Docker container

In this post, we show how to develop and test AWS Glue 5.0 jobs locally using a Docker container. This post is an updated version of the post Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container, and uses AWS Glue 5.0.

Enhancing Adobe Marketo Engage Data Analysis with AWS Glue Integration

In this post, we show you how to use AWS Glue to extract data from Marketo Engage for data processing and enrichment on AWS for use in marketing analytics workflows.

Unlock the power of optimization in Amazon Redshift Serverless

In this post, we demonstrate how Amazon Redshift Serverless AI-driven scaling and optimization impacts performance and cost across different optimization profiles.

Express brokers for Amazon MSK: Turbo-charged Kafka scaling with up to 20 times faster performance

In this post, we walk you through the implementation of MSK Express brokers, highlighting their core features, benefits, and best practices for rapid Kafka scaling.

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

In this post, we dive into the newly released feature of Amazon Redshift Data API support for SSO, Amazon Redshift RBAC for row-level security (RLS) and column-level security (CLS), and trusted identity propagation with AWS IAM Identity Center to let corporate identities connect to AWS services securely. We demonstrate how to integrate these services to create a data visualization application using Streamlit, providing secure, role-based access that simplifies user management while making sure that your organization can make data-driven decisions with enhanced security and ease.

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

In this post, we will cover how you can use Amazon DataZone to facilitate data collaboration between AWS accounts.

Amazon OpenSearch Service vector database capabilities revisited

As we enter 2025, OpenSearch Service support for OpenSearch 2.17 brings these improvements to the service. In this post, we walk through 2024’s innovations with an eye to how you can adopt new features to lower your cost, reduce your latency, and improve the accuracy of your search results and generated text.