Digital Turbine Streams 180M Events Per Day with Delta Lake Running on Databricks with AWS
Databricks is an AWS Advanced Technology Partner
Digital Turbine’s On-Device Media Platform enables end users to discover new apps, OEMs to promote their content, and advertisers to reach more consumers. The company runs on constant streams of real-time data to make sure the right app reaches the right user at exactly the right time. But after a major growth spurt—where Digital Turbine doubled its staff and data—it needed a new platform that could scale and still deliver accurate, precise, reliable, and fast data query responses. Using Delta Lake running on Databricks, Digital Turbine now makes petabytes of data available in minutes, delivering 180 million events per day.
Streaming Platform Reaches Limit at 180M Events Per Day
Digital Turbine relies on several streaming data sources to determine how best to serve apps to end users. The 180 million events that flow through the data system each day include unified mobile events derived from installed app activities on mobile devices, legacy mobile events, and campaign events from app installs and opens. All were brought together into a data sink, where unified mobile events were used for BI analysis, legacy events were sent to Amazon Redshift, and campaign events were stored in a data lake, then aggregated by Apache Spark and written to Redshift for revenue reports.
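To put the daily volume in perspective, the short calculation below converts 180 million events per day into an average per-second rate. It is only a back-of-the-envelope sketch: it assumes traffic is spread evenly across the day, which real mobile traffic rarely is, so actual peaks are likely to be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events arrive evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


daily_events = 180_000_000  # figure quoted in the case study
avg_rate = average_events_per_second(daily_events)
print(f"{avg_rate:,.0f} events/second on average")
```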
However, the Digital Turbine team was struggling to maintain an architecture built mostly on self-managed clusters. It wanted to standardize its ETL approach but was only able to run batch ETLs for legacy events. The setup resulted in higher costs and an inability to meet service level agreements (SLAs).
Databricks on AWS Delivers Scale for Faster Queries
Using Spark and Delta Lake running on Databricks, Digital Turbine created a new architecture in just one month that improved data streaming and achieved a 99.99 percent SLA. The platform sends logging events to a microservice that publishes list events to Amazon Managed Streaming for Apache Kafka (Amazon MSK). From there, Databricks consumes the stream and writes the raw data to Delta Lake as a regionalized table in Amazon Simple Storage Service (Amazon S3). The new architecture enables real-time dashboards that track key performance indicators, ad hoc queries via Notebooks, and fast transformations using Databricks' fully managed clusters. “We needed to be able to deliver fast query responses to enable internal business analysts,” said Daniel Ferrante, Director of Platform Engineering. “Now they are using Notebooks to query Delta Lake directly.”
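The flow described above (Kafka topic in, raw Delta Lake table on Amazon S3 out) might look roughly like the following Spark Structured Streaming sketch. It is not Digital Turbine's actual code: the broker address, topic name, S3 paths, and column aliases are all hypothetical placeholders, and the regional partitioning mentioned in the case study is left out because the event schema is not described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    # Every value here is a hypothetical placeholder, not a real endpoint.
    bootstrap_servers: str = "b-1.example-msk.amazonaws.com:9092"
    topic: str = "logging-events"
    table_path: str = "s3://example-bucket/delta/raw_events"
    checkpoint_path: str = "s3://example-bucket/checkpoints/raw_events"


def kafka_source_options(cfg: StreamConfig) -> dict:
    """Options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": cfg.bootstrap_servers,
        "subscribe": cfg.topic,
        "startingOffsets": "latest",
    }


def start_raw_ingest(spark, cfg: StreamConfig):
    """Start a streaming query that appends raw Kafka records to a Delta table.

    `spark` is an existing SparkSession with the Kafka and Delta connectors
    available, as is typically the case on a Databricks cluster.
    """
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(cfg))
        .load()
    )
    # Kafka delivers key and value as binary; keep them as strings for now.
    events = raw.selectExpr(
        "CAST(key AS STRING) AS event_key",
        "CAST(value AS STRING) AS payload",
        "timestamp AS kafka_timestamp",
    )
    return (
        events.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", cfg.checkpoint_path)
        .start(cfg.table_path)
    )
```

In a Databricks notebook, where a `spark` session is already defined, the query could be started with `query = start_raw_ingest(spark, StreamConfig())`. The checkpoint location lets the stream resume from its last committed Kafka offsets after a restart.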
Accelerated Data Query Responses
With auto-optimize turned on, Digital Turbine developers can continuously optimize the size of the Parquet files stored in Amazon S3, reducing the time it takes to pull data. The highest-load partner generates between 300 and 500 events per second. In addition, the team can use Databricks to easily join Delta Lake tables with data in Amazon S3 and dimension tables in Redshift. Through Notebook templates, developers can quickly access commonly used queries instead of starting from scratch.
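As an illustration of what turning on auto-optimize can involve, the sketch below builds a SQL statement that sets the two Databricks table properties commonly associated with the feature, optimized writes and auto compaction. The case study does not say which settings Digital Turbine used, and the table name `raw_events` is a hypothetical placeholder.

```python
# Databricks Delta table properties for auto-optimize (assumed to be the
# relevant ones; the case study does not list the exact settings).
AUTO_OPTIMIZE_PROPERTIES = {
    "delta.autoOptimize.optimizeWrite": "true",
    "delta.autoOptimize.autoCompact": "true",
}


def auto_optimize_sql(table_name: str) -> str:
    """Build an ALTER TABLE statement that enables auto-optimize on a table."""
    props = ", ".join(
        f"'{key}' = '{value}'" for key, value in AUTO_OPTIMIZE_PROPERTIES.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({props})"


statement = auto_optimize_sql("raw_events")  # hypothetical table name
print(statement)
```

On a Databricks cluster the resulting statement could be run with `spark.sql(statement)`. Optimized writes aim to produce fewer, larger files per write, and auto compaction merges small files afterward, which is what keeps reads over S3 fast.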
Improved Data Access and Visibility
By integrating Delta Lake running on Databricks into its architecture, Digital Turbine made its unified mobile events data much more accessible to nontechnical users. Now, business analysts can create reports in Tableau that are connected to Delta Lake, the operations team can more easily isolate production issues, and technical account managers can detect business issues earlier on.
About Digital Turbine
- Built new event streaming platform in one month
- Created solution that makes data available in minutes
- Delivered 180 million events per day
- Achieved 99.99% SLA
Databricks is a data and AI company. Thousands of organizations worldwide, including Comcast, Condé Nast, Nationwide, and H&M, rely on Databricks’ open and unified platform for data engineering, machine learning, and analytics. Founded by the original creators of Apache Spark™, Delta Lake, and MLflow, Databricks was first launched on AWS.
Published December 2020