"I don't know how we could have made our clickstream data pipeline work without Amazon Kinesis. It would have involved many weeks of engineering. Kinesis Streams and Firehose make the entire process extremely simple and reliable."
Peter Jaffe, Data Scientist

Hearst Corporation, headquartered in New York City, is one of the largest media and information companies in the world. The company owns 15 daily and 36 weekly newspapers and more than 300 popular magazines worldwide, including Cosmopolitan, Esquire, and O, The Oprah Magazine. Hearst also has ownership interests in 31 television stations and leading cable television networks such as ESPN and A&E Networks. The organization’s diverse portfolio of interests also includes digital distribution and real estate ventures.

  • Needed to develop a platform that analyzed real-time clickstream events such as readership statistics, impressions, and page views for more than 300 global websites and apps.
  • Wanted to give editors a better way to monitor and analyze trending content in order to promote cross-platform sharing and increase consumer engagement.
  • Sought to use clickstream data to perform data science, develop algorithms, and create visualizations and dashboards to support Hearst business stakeholders.
     
  • Built a clickstream analytics platform that transmits and processes more than 30 terabytes of clickstream data a day, streamed from more than 300 Hearst websites worldwide.
  • Amazon Kinesis Firehose automatically moves buffered data from Amazon Kinesis Streams into persistent storage on Amazon Simple Storage Service (Amazon S3). This replaces an Amazon Elastic Compute Cloud (Amazon EC2) instance the team previously had to manage.
  • The transformed clickstream data is pulled from a Hearst data lake and sent to Amazon Redshift for analytical queries and complex data science work.
  • From Amazon Redshift, the data is delivered to end users through an API connected to the company’s content management system.
  • Fast insights. With the clickstream analytics platform, Hearst can make the entire data stream—from website click to aggregated data—available to editors in minutes.
  • Simplified data analysis. The Hearst corporate development team spends less time managing the data pipeline and more time focusing on analytics.
  • Increased content recirculation. With the ability to get content metrics quickly, Hearst editors have increased the recirculation of trending content by more than 25 percent.
  • Reduced complexity. Using Amazon Kinesis Firehose, Hearst no longer needs to manage and monitor the movement of buffered data. As a result, the team no longer has to worry about having to replace an EC2 instance.
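
To make the ingestion step of the pipeline described above more concrete, here is a minimal sketch of how a producer might batch clickstream events for Amazon Kinesis Streams. It is an illustration under stated assumptions, not Hearst's implementation: the event fields (`site`, `event`, `url`), the stream name, and the helper names are all hypothetical. The `client` argument is assumed to behave like a boto3 Kinesis client, whose `put_records` call accepts at most 500 records per request. The sketch does not handle the per-request size limit or retry failed records.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def to_entry(event: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one clickstream event into a PutRecords entry.

    Assumption: each event carries a "site" field, used here as the
    partition key so that events from one site land on the same shard.
    """
    return {
        # Newline-delimited JSON keeps records separable once they are
        # concatenated into objects in Amazon S3.
        "Data": (json.dumps(event, sort_keys=True) + "\n").encode("utf-8"),
        "PartitionKey": str(event["site"]),
    }


def batches(
    entries: Iterable[Dict[str, Any]], size: int = MAX_RECORDS_PER_CALL
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` entries, preserving order."""
    batch: List[Dict[str, Any]] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_events(client: Any, stream_name: str, events: Iterable[Dict[str, Any]]) -> int:
    """Send events in batches and return the number of failed records."""
    failed = 0
    for batch in batches(to_entry(e) for e in events):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 installed and credentials configured, `client` could be created with `boto3.client("kinesis")`. A production producer would also resend any records reported in `FailedRecordCount`, since a batch can partially succeed.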

To learn more about how AWS can help you manage your streaming data pipeline, visit our Big Data details page and our streaming data page.