Presto on Amazon EMR
Customer success
Netflix customer success
Netflix has chosen Presto as their interactive, ANSI-SQL compliant query engine for big data. Presto scales well, is open source, and integrates with the Hive Metastore and Amazon S3 - the backbone of Netflix’s big data warehouse environment. Netflix runs Presto on persistent Amazon EMR clusters to quickly and flexibly query across their ~25 PB Amazon S3 data store. Netflix is an active contributor to Presto, and Amazon EMR provides Netflix with the flexibility to run their own build of Presto on Amazon EMR clusters. On average, Netflix runs ~3,500 queries per day on their Presto clusters.
Jampp customer success
Jampp is a mobile application marketing platform that uses advanced advertising retargeting techniques to drive engaged users to applications. Jampp achieves this by buying mobile media inventory via its own conversion-driven real-time bidding (RTB) engine, which dynamically bids on inventory across 18 RTB exchanges and over 150 mobile ad networks. Jampp leverages Presto running on Amazon EMR for advanced ad hoc log analysis, combining data from multiple sources and complex retargeting segment calculations. As Jampp's user base grew by 600%, so did the demand for complex analytical queries. Jampp moved from running a complex multi-core Python application on MySQL to running Presto, resulting in a 12x performance improvement. Jampp currently uses Presto on Amazon EMR to process 40 TB of data per day.
Cogo Labs customer success
As a start-up incubator, Cogo Labs operates a platform for marketing analytics and business intelligence used by their portfolio companies and internal teams. To support an OLAP environment with a high rate of innovation, they standardized on SQL to interact with data. Cogo Labs chose Presto for its real-time query performance, support for ANSI-SQL, and ability to process data directly from Amazon S3. Presto running on Amazon EMR allows their 100+ developers and analysts to run SQL queries on over 500 TB of data stored in Amazon S3 for data exploration, ad hoc analysis, and reporting. Cogo Labs uses a combination of short-lived and permanent clusters and relies on Amazon EMR's integration with Spot Instances to lower costs.
OpenSpan customer success
OpenSpan provides automation and intelligence solutions that help bridge people, processes, and technology to gain insight into employee productivity, simplify transactions, and engage employees and customers. OpenSpan migrated from HBase to Presto on Amazon EMR with data in Amazon S3. OpenSpan chose Presto because of its SQL interface and ability to query data in real time directly from Amazon S3; it allowed them to quickly explore vast amounts of data and rapidly iterate on upcoming data products. OpenSpan uses the Parquet file format, and also uses PrestogreSQL to connect to Presto. OpenSpan chose Amazon EMR and Amazon S3 to cost-efficiently process the gigabytes of data they receive daily from their customers.
Kanmu customer success
Kanmu is a Japanese startup in the financial services industry that provides card-linked offers based on consumers' credit card usage. Kanmu migrated from Hive to Presto on Amazon EMR because of Presto’s ability to run exploratory and iterative analytics at interactive speed, its good performance with Amazon S3, and its scalability to query large data sets. Kanmu uses Fluentd-plugin-s3 to push data to Amazon S3, the Optimized Row Columnar (ORC) format to store data, and shib, a Node.js-based web client, to run SQL queries.