AWS News Blog

Category: Amazon EMR

Amazon EMR Serverless Now Generally Available – Run Big Data Applications without Managing Servers

At AWS re:Invent 2021, we introduced three new serverless options for our data analytics services – Amazon EMR Serverless, Amazon Redshift Serverless, and Amazon MSK Serverless – that make it easier to analyze data at any scale without having to configure, scale, or manage the underlying infrastructure. Today we announce the general availability of Amazon […]

New – Create and Manage EMR Clusters and Spark Jobs with Amazon SageMaker Studio

Today, we’re very excited to offer three new enhancements to our Amazon SageMaker Studio service. As of now, users of SageMaker Studio can create, terminate, manage, discover, and connect to Amazon EMR clusters running within a single AWS account and in shared accounts across an organization—all directly from SageMaker Studio. Furthermore, SageMaker Studio Notebook users […]

Customize and Package Dependencies With Your Apache Spark Applications on Amazon EMR on Amazon EKS

Last AWS re:Invent, we announced the general availability of Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS), a new deployment option for Amazon EMR that allows customers to automate the provisioning and management of Apache Spark on Amazon EKS. With Amazon EMR on EKS, customers can deploy EMR applications on the same Amazon EKS […]

New – Amazon EMR on Amazon Elastic Kubernetes Service (EKS)

Tens of thousands of customers use Amazon EMR to run big data analytics applications on frameworks such as Apache Spark, Hive, HBase, Flink, Hudi, and Presto at scale. EMR automates the provisioning and scaling of these frameworks and optimizes performance with a wide range of EC2 instance types to meet price and performance requirements. Customer […]

New – Using Step Functions to Orchestrate Amazon EMR Workloads

AWS Step Functions allows you to add serverless workflow automation to your applications. The steps of your workflow can run anywhere, including in AWS Lambda functions, on Amazon Elastic Compute Cloud (Amazon EC2), or on-premises. To simplify building workflows, Step Functions is directly integrated with multiple AWS Services: Amazon Elastic Container Service (Amazon ECS), AWS […]

New – Insert, Update, Delete Data on S3 with Amazon EMR and Apache Hudi

Storing your data in Amazon S3 provides lots of benefits in terms of scale, reliability, and cost effectiveness. On top of that, you can leverage Amazon EMR to process and analyze your data using open source tools like Apache Spark, Hive, and Presto. As powerful as these tools are, it can still be challenging to deal with use cases where […]

New – Amazon EMR Instance Fleets

Today we’re excited to introduce a new feature for Amazon EMR clusters called instance fleets. Instance fleets gives you a wider variety of options and intelligence around instance provisioning. You can now provide a list of up to 5 instance types with corresponding weighted capacities and spot bid prices (including spot blocks)! EMR will automatically provision […]

Human Longevity, Inc. – Changing Medicine Through Genomics Research

Human Longevity, Inc. (HLI) is at the forefront of genomics research and wants to build the world’s largest database of human genomes along with related phenotype and clinical data, all in support of preventive healthcare. In today’s guest post, Yaron Turpaz, Bryan Coon, and Ashley Van Zeeland talk about how they are using AWS to […]