“Using Amazon Kinesis, our solution delivers new events to user dashboards in less than 10 seconds. Our product teams can see and respond to usage patterns immediately, and our operations professionals can monitor performance to detect and mitigate anomalies before they impact the customer’s experience.”

Anders Fritz, Senior Manager, Product Innovation
  • About Thomson Reuters

    Thomson Reuters is a leading source of information, including one of the world’s most trusted news organizations, for the world’s businesses and professionals. It provides companies with the intelligence, technology, and human expertise they need to find trusted answers, enabling them to make better decisions more quickly. It serves customers in the financial, risk, legal, tax, accounting, and media markets.

  • Benefits of AWS

    • Ability to process up to 4,000 events per second, anticipated to scale to 10,000 within a year
    • Elastic scaling accommodates twofold to threefold traffic increases during breaking news
    • No data loss or downtime since launch thanks to robust failover architecture
    • Near-real-time availability of analytics data
    • Simultaneous streaming and batching of data in one solution

Thomson Reuters offers hundreds of digital products and services for customers ranging from law firms to banks to consumers. In 2016, Thomson Reuters decided to build a solution that would enable it to capture, analyze, and visualize analytics data generated by its offerings, providing insights to help product teams continuously improve the user experience.

There are many commercially available usage-analytics services. However, Thomson Reuters decided to build its own to manage costs, take ownership of analytics data, and enrich that data with other information such as document metadata.

The company knew it wanted to build the solution in the cloud and identified several requirements for the underlying platform. First, the platform had to protect information with encryption in transit and at rest. Second, it needed to handle thousands of events per second and scale elastically to accommodate the doubling or tripling of traffic during breaking news. Third, because the group building the solution was relatively small, the company needed to minimize administration and management tasks so the team could focus on building new features and supporting product teams. Finally, Thomson Reuters wanted the solution to go live within five months.

Thomson Reuters quickly realized that Amazon Web Services (AWS) was the only platform that could meet all its needs. The company was already using AWS in several capacities and was familiar with its capabilities and scale. The analytics solution built by Thomson Reuters—called Product Insight—relies on a number of AWS services.

The initial event ingestion layer is composed of Elastic Load Balancing and customized NGINX web servers in an Auto Scaling group. After SSL/TLS termination, the ingestion layer augments events with metadata and encrypts them using AWS Key Management Service (KMS).
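The augment-then-encrypt step described above can be sketched as follows. This is a minimal illustration, not the Product Insight implementation: the field names (`received_at`, `source_ip`, `event_id`) and both function names are hypothetical, and encryption is passed in as a callable so that the sketch runs without AWS credentials. In a real deployment that callable would presumably wrap a KMS-backed encryption routine.

```python
import json
import time
import uuid
from typing import Callable, Dict


def augment_event(event: Dict, source_ip: str,
                  now: Callable[[], float] = time.time) -> Dict:
    """Return a copy of the event with server-side metadata attached.

    The metadata field names are assumptions made for illustration.
    """
    enriched = dict(event)                    # leave the caller's dict untouched
    enriched["received_at"] = now()           # server receive time (epoch seconds)
    enriched["source_ip"] = source_ip
    enriched["event_id"] = str(uuid.uuid4())  # unique id, useful for de-duplication
    return enriched


def secure_event(event: Dict, encrypt: Callable[[bytes], bytes]) -> bytes:
    """Serialize the event to JSON bytes and pass it through `encrypt`.

    `encrypt` is injected; this sketch does not call AWS KMS itself.
    """
    plaintext = json.dumps(event, sort_keys=True).encode("utf-8")
    return encrypt(plaintext)
```

Injecting the encryption function keeps the enrichment logic testable on its own and leaves the choice of key-management strategy to the caller.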

The ingestion layer hands off secured data to a streaming data pipeline composed of Amazon Kinesis Streams, Amazon Kinesis Firehose, and AWS Lambda serverless compute. Thomson Reuters evaluated other streaming data tools, including Apache Kafka, but found them difficult to manage and scale. The company did not want to worry about managing the software stack and a fleet of servers, so instead chose Amazon Kinesis because it is fully managed.

The Amazon Kinesis streaming-data pipeline automatically batches data and delivers it cost effectively into a master data set for permanent storage in an Amazon Simple Storage Service (Amazon S3) bucket, replicated across regions. The master data set enables Thomson Reuters to apply additional transformation steps, recover data in the event of a system loss of state, and support new business cases. If events cannot immediately be dispatched from the ingestion layer to the data pipelines, a failover mechanism delivers them to Amazon S3 to be replayed when the system returns to normal operations.
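The failover behavior described above (park events that cannot be dispatched, then replay them once the pipeline recovers) might look roughly like the sketch below. The class name and its methods are hypothetical, and an in-memory list stands in for the Amazon S3 fallback location so that the example is self-contained.

```python
from typing import Callable, List


class FailoverDispatcher:
    """Send records to a primary sink; park failures for later replay."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        # Stand-in for durable fallback storage (Amazon S3 in the article).
        self._backlog: List[bytes] = []

    @property
    def backlog_size(self) -> int:
        return len(self._backlog)

    def dispatch(self, record: bytes) -> bool:
        """Try the primary sink; on any error, keep the record for replay."""
        try:
            self._send(record)
            return True
        except Exception:
            self._backlog.append(record)
            return False

    def replay(self) -> int:
        """Re-dispatch parked records and return how many were delivered.

        Records that fail again are parked again by `dispatch`.
        """
        pending, self._backlog = self._backlog, []
        return sum(1 for record in pending if self.dispatch(record))
```

Because a failed record is never dropped, only deferred, this pattern is consistent with the no-data-loss goal, at the cost of replayed events arriving out of order.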

AWS Lambda allows Thomson Reuters to load and process the streaming data cost-effectively without provisioning or managing any servers. Lambda collects data from the Kinesis pipeline and loads it into the master data set in Amazon S3. Lambda is also triggered by Amazon S3 event notifications whenever new data is stored, and it then performs the additional transformations on the master data set. Lambda runs code only when triggered by data through its integrations with Kinesis and Amazon S3, and Thomson Reuters is charged for compute only while the code is running.
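As a rough illustration of the Kinesis-to-Lambda step, the sketch below decodes a batch of stream records inside a Lambda-style handler. Lambda delivers Kinesis records with a base64-encoded payload under `Records[i].kinesis.data`; everything else here is an assumption: the payloads are treated as plain JSON (decryption is omitted), the newline-delimited output layout is one common choice rather than the article's, and the write to Amazon S3 is left out so the code runs locally.

```python
import base64
import json
from typing import Any, Dict, List


def decode_kinesis_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode the base64 payload of each Kinesis record into a dict."""
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    return decoded


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda-style entry point: batch the decoded events into one body."""
    events = decode_kinesis_records(event)
    # Newline-delimited JSON: one event per line (an assumed storage layout).
    body = "\n".join(json.dumps(e, sort_keys=True) for e in events)
    # A real function would write `body` to the master data set in S3 here.
    return {"count": len(events), "body": body}
```

The handler can be exercised locally by passing a hand-built event dictionary and `None` for the context.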

A parallel real-time pipeline attached to the Amazon Kinesis stream delivers the events to a secure, multi-tenant Elasticsearch cluster through a custom extract, transform, and load (ETL) server connected to the Thomson Reuters Services platform, all hosted on AWS. The real-time data is made available to authorized Thomson Reuters product teams through Kibana, an open-source data analytics and visualization tool.
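One way the ETL step could prepare events for a multi-tenant Elasticsearch cluster is to route each tenant to its own index and send events in the newline-delimited format used by the Elasticsearch bulk API (an action line followed by the document line). The sketch below only builds that request body. The `tenant` field, the `usage-<tenant>` index naming, and the function name are assumptions, and the exact action metadata required depends on the Elasticsearch version in use.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(events: Iterable[Dict], index_prefix: str = "usage") -> str:
    """Build a newline-delimited bulk body, one index per tenant.

    Assumes each event carries a `tenant` key (hypothetical field name).
    """
    lines = []
    for event in events:
        index_name = f"{index_prefix}-{event['tenant']}"
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(event, sort_keys=True))              # document line
    # The bulk format requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)
```

Separate indices per tenant make it straightforward to restrict each product team's Kibana view to its own data, although other isolation schemes are possible.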

The Thomson Reuters Services platform also provides the authentication and authorization layer using AWS Identity and Access Management (IAM) and Amazon S3 cross-account access features. To monitor the solution, the company uses Amazon CloudWatch.  

Product Insight launched two months ahead of schedule and has exceeded technical expectations. “Our initial goal was to accommodate 2,000 events per second,” says Anders Fritz, senior manager of product innovation at Thomson Reuters. “Our tests show that Product Insight on AWS can process up to 4,000 events per second, and within a year we expect to increase that to more than 10,000 events per second.” At the anticipated rate of 10,000 events per second, that amounts to more than 25 billion events per month.
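The monthly figure follows from the 10,000 events-per-second target sustained over a full month. The short calculation below makes the arithmetic explicit; the 30-day month and the function name are assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_month(events_per_second: int, days: int = 30) -> int:
    """Events accumulated over `days` at a constant per-second rate."""
    return events_per_second * SECONDS_PER_DAY * days


for rate in (2_000, 4_000, 10_000):
    print(f"{rate:>6,} events/s -> {events_per_month(rate):,} events per 30-day month")
```

At 10,000 events per second this gives 25,920,000,000 events, which matches the "more than 25 billion" figure. The tested rate of 4,000 events per second corresponds to roughly 10.4 billion events per month.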

Even with this high throughput, the system has not lost any data since its inception. “Because of the robust failover architecture and the technical capabilities of AWS, we have not lost a single event since we started collecting data,” says Fritz.

That record held through usage spikes during news events such as the U.S. presidential election and the U.K. referendum on exiting the European Union. “Even when daily event volumes doubled, the ingest pipeline scaled up and down without any issues,” says Marco Pierleoni, lead software engineer at Thomson Reuters.

Internal product teams have quickly adopted Product Insight, and adding them to the system is fast and easy. “We can get teams set up quickly, ranging from an hour to a couple of days,” says Fritz. “Most of that time is spent planning what data the team wants to analyze. On the back end, we can set up the system to receive product data in a matter of minutes.” The onboarding process is accelerated using SDKs with standardized data schemas.

Since Product Insight is built on a streaming-data architecture using Amazon Kinesis, product teams have access to data almost instantly. “Using Amazon Kinesis, our solution delivers new events to user dashboards in less than 10 seconds,” says Fritz. “Our product teams can see and respond to usage patterns immediately, and our operations professionals can monitor performance to detect and mitigate anomalies before they impact the customer’s experience.”

Because Product Insight requires minimal administration, engineers can spend their time working with product teams to add business value rather than managing infrastructure. In addition, the security enabled by AWS Key Management Service helps ensure that the solution meets internal and external compliance requirements.

Learn more about real-time data-ingestion platforms and Amazon Kinesis.