AWS Big Data Blog
Category: Best Practices
Best practices for upgrading Amazon MWAA V1.x to V2.x
In this post, we explore best practices for upgrading your Amazon MWAA environment and provide a step-by-step guide to seamlessly transition to the latest version.
Enhancing data durability in Amazon EMR HBase on Amazon S3 with the Amazon EMR WAL feature
In this post, we dive deep into the new Amazon EMR WAL feature to help you understand how it works, how it enhances durability, and why it’s needed. We explore several scenarios that are well-suited for this feature.
Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with Amazon EMR Serverless
In this post, we share how AppsFlyer successfully migrated their massive data infrastructure from self-managed Hadoop clusters to Amazon EMR Serverless, detailing their best practices, challenges to overcome, and lessons learned that can help guide other organizations in similar transformations.
Introducing Amazon Q Developer in Amazon OpenSearch Service
today we introduced Amazon Q Developer support in OpenSearch Service. With this AI-assisted analysis, both new and experienced users can navigate complex operational data without training, analyze issues, and gain insights in a fraction of the time. In this post, we share how to get started using Amazon Q Developer in OpenSearch Service and explore some of its key capabilities.
Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg
Natural Intelligence (NI) is a world leader in multi-category marketplaces. In this blog post, NI shares their journey, the innovative solutions developed, and the key takeaways that can guide other organizations considering a similar path. This article details NI’s practical approach to this complex migration, focusing less on Apache Iceberg’s technical specifications, but rather on the real-world challenges and solutions encountered during the transition to Apache Iceberg, a challenge that many organizations are grappling with.
Enhance Agentforce data security with Private Connect for Salesforce Data Cloud and Amazon Redshift – Part 3
In this post, we discuss how to create AWS endpoint services to improve data security with Private Connect for Salesforce Data Cloud.
Architect fault-tolerant applications with instance fleets on Amazon EMR on EC2
In this post, we show how to optimize capacity by analyzing EMR workloads and implementing strategies tailored to your workload patterns. We walk through assessing the historical compute usage of a workload and use a combination of strategies to reduce the likelihood of InsufficientCapacityExceptions (ICE) when Amazon EMR launches specific EC2 instance types. We implement flexible instance fleet strategies to reduce dependency on specific instance types and use Amazon EC2 On-Demand Capacity Reservation (ODCRs) for predictable, steady-state workloads. Following this approach can help prevent job failures due to capacity limits while optimizing your cluster for cost and performance.
Design patterns for implementing Hive Metastore for Amazon EMR on EKS
In this post, we explore the design patterns for implementing the Hive Metastore (HMS) with EMR on EKS with Spark Operator, each offering distinct advantages depending on your requirements. Whether you choose to deploy HMS as a sidecar container within the Apache Spark Driver pod, or as a Kubernetes deployment in the data processing EKS cluster, or as an external HMS service in a separate EKS cluster, the key considerations revolve around communication efficiency, scalability, resource isolation, high availability, and security.
Governing streaming data in Amazon DataZone with the Data Solutions Framework on AWS
In this post, we explore how AWS customers can extend Amazon DataZone to support streaming data such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) topics. Developers and DevOps managers can use Amazon MSK, a popular streaming data service, to run Kafka applications and Kafka Connect connectors on AWS without becoming experts in operating it.
Migrate from Standard brokers to Express brokers in Amazon MSK using Amazon MSK Replicator
Creating a new cluster with Express brokers is straightforward, as described in Amazon MSK Express brokers. However, if you have an existing MSK cluster, you need to migrate to a new Express based cluster. In this post, we discuss how you should plan and perform the migration to Express brokers for your existing MSK workloads on Standard brokers. Express brokers offer a different user experience and a different shared responsibility boundary, so using them on an existing cluster is not possible. However, you can use Amazon MSK Replicator to copy all data and metadata from your existing MSK cluster to a new cluster comprising of Express brokers.









