Hearst Data Analytics Case Study

2018

Hearst Corporation, headquartered in New York City, is one of the largest media and information companies in the world. The company owns 15 daily and 36 weekly newspapers and more than 300 popular magazines worldwide, including Cosmopolitan, Esquire, and O, The Oprah Magazine. Hearst also has ownership interests in 31 television stations and leading cable television networks such as ESPN and A&E Networks. The organization’s diverse portfolio of interests also includes digital distribution and real estate ventures.

"I don't know how we could have made our clickstream data pipeline work without Amazon Kinesis services. It would have involved many weeks of engineering. Kinesis Data Streams and Firehose make the entire process extremely simple and reliable."

Peter Jaffe
Data Scientist, Hearst Corporation

The Challenge

  • Needed to develop a platform that analyzed real-time clickstream events such as readership statistics, impressions, and page views for more than 300 global websites and apps.
  • Wanted to give editors a better way to monitor and analyze trending content in order to promote cross-platform sharing and increase consumer engagement.
  • Sought to use clickstream data to perform data science, develop algorithms, and create visualizations and dashboards to support Hearst business stakeholders.
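As an illustration of the kind of analysis described above, the following is a minimal sketch of ranking trending content from clickstream events. The event fields (`type`, `url`, `timestamp`), the 15-minute window, and the function name are assumptions made for this example; the case study does not describe the event schema or algorithms Hearst actually uses.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def trending_pages(events, now, window=timedelta(minutes=15), top_n=3):
    """Rank URLs by page views that fall inside the most recent window.

    `events` is an iterable of dicts with hypothetical keys:
    "type" (e.g. "page_view" or "impression"), "url", and "timestamp".
    """
    cutoff = now - window
    counts = Counter(
        event["url"]
        for event in events
        if event["type"] == "page_view" and cutoff <= event["timestamp"] <= now
    )
    # most_common returns (url, count) pairs, highest count first.
    return counts.most_common(top_n)


# Hypothetical sample data, for illustration only.
NOW = datetime(2018, 1, 1, 12, 0, tzinfo=timezone.utc)
SAMPLE_EVENTS = [
    {"type": "page_view", "url": "/a", "timestamp": NOW - timedelta(minutes=1)},
    {"type": "page_view", "url": "/a", "timestamp": NOW - timedelta(minutes=5)},
    {"type": "page_view", "url": "/b", "timestamp": NOW - timedelta(minutes=2)},
    {"type": "impression", "url": "/b", "timestamp": NOW - timedelta(minutes=2)},
    {"type": "page_view", "url": "/c", "timestamp": NOW - timedelta(minutes=60)},
]

print(trending_pages(SAMPLE_EVENTS, NOW))
```

In this sample, the impression and the hour-old page view are excluded, so only `/a` and `/b` are ranked. A production system would compute such aggregates continuously over the stream rather than over an in-memory list.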

Why Amazon Web Services

  • Built a clickstream analytics platform that transmits and processes more than 30 terabytes of clickstream data a day, streamed from more than 300 Hearst websites worldwide.
  • Amazon Kinesis Data Firehose automatically moves buffered data from Amazon Kinesis Data Streams into persistent storage on Amazon Simple Storage Service (Amazon S3). This replaces an Amazon Elastic Compute Cloud (Amazon EC2) instance the team previously had to manage.
  • The transformed clickstream data is pulled from a Hearst data lake and sent to Amazon Redshift for analytical queries and complex data science work.
  • From Amazon Redshift, the data is pushed to end users through an API to the company’s content management system.
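The first stage of the pipeline above, getting click events into Amazon Kinesis Data Streams, might look roughly like the sketch below. It is not Hearst's implementation: the stream name, the event fields, and the choice of the site as partition key are assumptions for illustration. The client is passed in as a parameter; with boto3 it would be `boto3.client("kinesis")`, whose `put_records` call accepts `StreamName` and a list of `Records` with `Data` and `PartitionKey`, and allows up to 500 records per request. Delivery from the stream to Amazon S3 is configured in Kinesis Data Firehose itself and is not shown.

```python
import json

# PutRecords accepts at most 500 records in a single request.
MAX_RECORDS_PER_CALL = 500


def to_kinesis_records(events):
    """Convert event dicts into the record shape PutRecords expects."""
    return [
        {
            "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
            # Assumption: partition by site so one site's events share a shard.
            "PartitionKey": event["site"],
        }
        for event in events
    ]


def send_events(client, stream_name, events):
    """Send events in batches and return the number of failed records.

    `client` is expected to behave like boto3.client("kinesis").
    """
    records = to_kinesis_records(events)
    failed = 0
    for start in range(0, len(records), MAX_RECORDS_PER_CALL):
        batch = records[start:start + MAX_RECORDS_PER_CALL]
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords can partially fail; the response reports how many did.
        failed += response.get("FailedRecordCount", 0)
    return failed
```

A real producer would also retry the records reported as failed, which this sketch only counts.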

The Benefits

  • Fast insights. With the clickstream analytics platform, Hearst can make the entire data stream—from website click to aggregated data—available to editors in minutes.
  • Simplified data analysis. The Hearst corporate development team spends less time managing the data pipeline and more time focusing on analytics.
  • Increased content recirculation. With the ability to get content metrics quickly, Hearst editors have increased the recirculation of trending content by more than 25 percent.
  • Reduced complexity. With Amazon Kinesis Data Firehose, Hearst no longer needs to manage and monitor the movement of buffered data, so the team no longer has to maintain, or potentially replace, an EC2 instance for that job.


AWS Services Used

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. 

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores and analytics tools.

Amazon Kinesis

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams is a massively scalable, highly durable data ingestion and processing service optimized for streaming data. 

