Posted On: Oct 27, 2022

Amazon EMR supports PrestoDB and Trino for running interactive SQL analytics over large datasets across multiple data sources. Today, we’re excited to announce the latest PrestoDB and Trino updates included in EMR release 6.8.

With PrestoDB and Trino on EMR 6.8, users benefit from a configuration setting, called the strict mode that prevents cost overruns due to long running queries.Customers have told us that poorly written SQL queries can sometimes run for long times, and consume resources from other business critical workloads. To help administrators take action on such queries, we are introducing strict mode setting that allows warning or rejecting certain types of queries. Examples include queries without predicates on partitioned columns that result in large table scans, or queries that involve cross join between large tables, and/or queries that sort large number of rows without limit. You can set up strict mode configuration during cluster creation and also override the setting using session properties. You can apply strict mode checks for select, insert, create table as select and explain analyze query types.

We are also excited to announce that Amazon EMR PrestoDB and Trino has added a new features to handle spot interruptions that helps run your queries cost effectively and reliably. Spot Instances in Amazon EMR allows you to run big data workloads on spare Amazon EC2 capacity at a reduced cost compared to On-Demand instances. However, Amazon EC2 can interrupt spot instances with a two-minute notification. PrestoDB/Trino queries fail when spot nodes are terminated. This has meant that customers were unable to run such workloads on spot instances and take advanatage of lower costs. In EMR 6.7, we added a new capability to PrestoDB/Trino engine to detect spot interruptions and determine if the existing queries can complete within two minutes on those nodes. If the queries cannot finish, we fail quickly and retry the queries on different nodes. Amazon EMR PrestoDB/Trino engine also does not schedule new queries on spot nodes that are about to be reclaimed. With these two new features, you will get best of both worlds - improved resiliency with PrestoDB/Trino engine on Amazon EMR, and running queries economically on spot nodes.

You can use this capabilities in all Regions where Amazon EMR PrestoDB and Trino are available. To learn more, please see the Presto and Trino section in the Amazon EMR Release Guide.