Posted On: Nov 30, 2021
We are happy to announce the preview of Amazon EMR Serverless, a new serverless option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud. Amazon EMR is a cloud big data platform used by customers to run large-scale distributed data processing jobs, interactive SQL queries, and machine learning applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto. With EMR Serverless, customers can run applications built using these frameworks with a few clicks, without having to configure, optimize, or secure clusters. EMR Serverless automatically provisions and scales the compute and memory resources required by the application, and customers only pay for the resources they use.
With EMR Serverless, you specify the open-source framework and version that you want to use for your application, and submit jobs using APIs, EMR Studio, or JDBC/ODBC clients. EMR Serverless automatically determines and provisions the compute and memory resources required to process requests, and scales the resources up and down at different stages of processing based on changing requirements. For example, a Spark job may need two executors for the first 5 minutes, ten executors for the next 10 minutes, and five executors for the last 20 minutes to process your data. EMR Serverless automatically provisions and adjusts resources as required, so you do not have to worry when data volumes change over time. Because you pay only for the resources that are used, EMR Serverless is cost-effective for running petabyte-scale analytics. You can check the status of running jobs, review job history, and use familiar open-source tools to debug jobs using EMR Studio.
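To make the executor example above concrete, the following minimal Python sketch compares the executor-minutes the job actually uses with what a cluster sized for the peak and kept running for the whole job would consume. It assumes, purely for illustration, that cost is proportional to executor-minutes; the actual billing dimensions are not described in this announcement, and the function names are hypothetical.

```python
# Executor schedule from the example: (executors, minutes) per stage.
stages = [(2, 5), (10, 10), (5, 20)]


def executor_minutes(stages):
    """Executor-minutes consumed when capacity follows demand stage by stage."""
    return sum(executors * minutes for executors, minutes in stages)


def peak_provisioned_minutes(stages):
    """Executor-minutes consumed if peak capacity is held for the whole job."""
    peak = max(executors for executors, _ in stages)
    total_minutes = sum(minutes for _, minutes in stages)
    return peak * total_minutes


used = executor_minutes(stages)            # 2*5 + 10*10 + 5*20 = 210
fixed = peak_provisioned_minutes(stages)   # 10 executors * 35 minutes = 350
savings = 1 - used / fixed                 # fraction of capacity not paid for

print(f"scaled: {used} executor-minutes, fixed at peak: {fixed}, saved: {savings:.0%}")
```

Under this simplified model, scaling with demand uses 210 executor-minutes instead of 350, about 40% less than holding peak capacity for all 35 minutes.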
Amazon EMR Serverless is available in preview in the US East (N. Virginia) Region. To learn more, sign up for the preview, read the blog, and refer to the documentation.