AWS Big Data Blog

Implement Amazon EMR HBase Graceful Scaling

Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. We can use Amazon EMR with HBase on top of Amazon Simple Storage Service (Amazon S3) for random, strictly consistent, real-time access to tables with Apache Kylin. This post demonstrates how to gracefully decommission target region servers programmatically.

Architect fault-tolerant applications with instance fleets on Amazon EMR on EC2

In this post, we show how to optimize capacity by analyzing EMR workloads and implementing strategies tailored to your workload patterns. We walk through assessing the historical compute usage of a workload and use a combination of strategies to reduce the likelihood of InsufficientCapacityExceptions (ICE) when Amazon EMR launches specific EC2 instance types. We implement flexible instance fleet strategies to reduce dependency on specific instance types and use Amazon EC2 On-Demand Capacity Reservations (ODCRs) for predictable, steady-state workloads. Following this approach can help prevent job failures due to capacity limits while optimizing your cluster for cost and performance.

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

We are excited to announce the general availability of SageMaker Unified Studio. In this post, we explore the benefits of SageMaker Unified Studio and how to get started.

Announcing end-of-support for Amazon Kinesis Client Library 1.x and Amazon Kinesis Producer Library 0.x effective January 30, 2026

Amazon Kinesis Client Library (KCL) 1.x and Amazon Kinesis Producer Library (KPL) 0.x will reach end-of-support on January 30, 2026. Accordingly, these versions will enter maintenance mode on April 17, 2025. During maintenance mode, AWS will provide updates only for critical bug fixes and security issues. Major versions in maintenance mode will not receive updates for new features or feature enhancements.

Deploy real-time analytics with StarTree for managed Apache Pinot on AWS

In this post, we introduce StarTree as a managed solution on AWS for teams seeking the advantages of Pinot. We highlight the key distinctions between open-source Pinot and StarTree, and provide valuable insights for organizations considering a more streamlined approach to their real-time analytics infrastructure.

Develop and test AWS Glue 5.0 jobs locally using a Docker container

In this post, we show how to develop and test AWS Glue 5.0 jobs locally using a Docker container. This post updates the earlier post "Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container" for AWS Glue 5.0.

Enhancing Adobe Marketo Engage Data Analysis with AWS Glue Integration

In this post, we show you how to use AWS Glue to extract data from Marketo Engage for data processing and enrichment on AWS for use in marketing analytics workflows.

Unlock the power of optimization in Amazon Redshift Serverless

In this post, we demonstrate how Amazon Redshift Serverless AI-driven scaling and optimization affects performance and cost across different optimization profiles.

Express brokers for Amazon MSK: Turbo-charged Kafka scaling with up to 20 times faster performance

In this post, we walk you through the implementation of MSK Express brokers, highlighting their core features, benefits, and best practices for rapid Kafka scaling.

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

In this post, we dive into the newly released Amazon Redshift Data API support for SSO, Amazon Redshift RBAC for row-level security (RLS) and column-level security (CLS), and trusted identity propagation with AWS IAM Identity Center to let corporate identities connect to AWS services securely. We demonstrate how to integrate these services to create a data visualization application using Streamlit. The application provides secure, role-based access that simplifies user management while helping your organization make data-driven decisions with enhanced security and ease.