"AWS eliminates the need for hardware procurement or detailed sizing, so we had Atlas developed and running in less than two months. Because we were able to begin operation so quickly, we could get right to work on analyzing the data and providing valuable information for the development of services and marketing initiatives."
Hajime Sano, Data Team, B2C Unit, Digital Business, Nikkei Inc.

Nikkei Inc. is a multimedia company with newspaper publishing at its core. Its other offerings include magazines, digital media, and database services. In the words of its corporate philosophy, Nikkei aims to "contribute to the peaceful and democratic development of the Japanese economy, the basis of people's livelihoods, by providing fair and impartial news."

As a multimedia business, Nikkei expends considerable effort on digital services like the Nikkei Online Edition and its mobile apps. Unlike many other media sites, Nikkei derives the majority of its revenue from subscription fees rather than advertising.

For this reason, the company places great importance on audience engagement as a measure of its relationship with individual users as it seeks to expand its offerings as a trustworthy and informative media source. Nikkei's plan for improving its services based on engagement required a more detailed grasp of usage patterns than it had at the time. Instead of merely ranking articles based on page views (the number of times customers opened pages), Nikkei needed to know how customers were reading articles: which were read all the way to the end, where users interacted with articles by tapping and zooming, and so on.

Before adopting this engagement-based approach, Nikkei had been using third-party analysis tools. "Our existing third-party tools did make it possible, with effort, to obtain detailed interaction data and access logs, but we ran into limitations," says Hajime Sano of the Nikkei Digital Business B2C Unit Data Team. These limitations included:

  1. Time lag between data measurement and aggregation
  2. Difficulty in simultaneously handling attributes tied to readers and articles, due to the product's design
  3. Limitations on the amount of data that could be handled
  4. Volume-based fee structure, meaning that as measurement targets expanded, costs rose too
  5. Specialist knowledge required by the tool, and conversely difficulty applying general analytics knowledge and experience

"We began searching for a way to resolve these problems and handle more data more quickly, with a smaller investment and no limitations," Sano says.

Nikkei was already familiar with Amazon Web Services (AWS), which it used to support the Nikkei Online Edition. The company was also impressed by Amazon Redshift and other big data services from AWS, as well as the developmental possibilities of data-oriented technologies like Amazon Kinesis. "Constructing everything from zero ourselves had concrete benefits," says Sano. "It shortened the project timetable and let us realize our ideal environment while keeping costs down. Given the ease of development on AWS, as well as the variety of services it offered to rationalize operation, we decided that AWS was the logical choice for infrastructure."

Adopting AWS also allowed Nikkei to draw on its extensive in-house experience using the service. The ease of finding engineers who were highly familiar with AWS, both inside and outside the company, was another point in its favor. By using AWS's Tokyo region, Nikkei could create a service that was responsive enough for its needs even when handling large amounts of data.

Nikkei architecture diagram

In September 2016, Nikkei decided on the architecture for its new AWS-based access-analytics tool, known as Atlas. By November, a prototype was in operation. "With an on-premises solution, it can take six months to a year just to obtain the hardware," says Sano. "AWS eliminates the need for hardware procurement or detailed sizing, so we had Atlas developed and running in less than two months. Because we were able to begin operation so quickly, we could get right to work on analyzing the data and providing valuable information for the development of services and marketing initiatives. There are now significantly more opportunities to make data-driven decisions within the organization."

Atlas processes log data in the following way:

1. A data collection endpoint writes data to Amazon Simple Queue Service (Amazon SQS).

2. A data expansion worker takes data from Amazon SQS, attaches a range of records including session data, attributes, and article links managed via Amazon DynamoDB, and writes the result to Amazon Kinesis.

3. Running on AWS Lambda, a Kinesis consumer takes records from Kinesis and writes each one both to Elasticsearch on Amazon Elastic Compute Cloud (Amazon EC2) and to an in-memory database.

4. Running on AWS Elastic Beanstalk, a Kinesis consumer stages data in Amazon Simple Storage Service (Amazon S3) and then loads it into Amazon Redshift for batch processing.

5. Data users analyze the data through business intelligence and data science tools running on Amazon EC2 and Elastic Beanstalk.
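
The flow in steps 1 to 4 can be sketched in a few lines of Python. This is only an in-memory illustration, not Nikkei's implementation: plain Python containers stand in for Amazon SQS, Amazon Kinesis, Elasticsearch, and the S3/Redshift batch store, and the `Pipeline` class, the lookup tables, and all field names (`session_id`, `article_id`, `action`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical lookup tables standing in for attributes managed via DynamoDB.
SESSIONS = {"s-1": {"user_id": "u-42", "plan": "paid"}}
ARTICLES = {"a-100": {"section": "markets", "links": ["a-101"]}}


@dataclass
class Pipeline:
    queue: deque = field(default_factory=deque)        # stands in for Amazon SQS
    stream: list = field(default_factory=list)         # stands in for Amazon Kinesis
    search_index: list = field(default_factory=list)   # stands in for Elasticsearch
    warehouse: list = field(default_factory=list)      # stands in for S3 + Redshift
    realtime_cursor: int = 0                           # read position of consumer 3
    batch_cursor: int = 0                              # read position of consumer 4

    def collect(self, event: dict) -> None:
        """Step 1: the collection endpoint enqueues the raw event."""
        self.queue.append(event)

    def expand(self) -> None:
        """Step 2: the expansion worker enriches queued events and forwards them."""
        while self.queue:
            event = self.queue.popleft()
            enriched = {
                **event,
                "session": SESSIONS.get(event["session_id"], {}),
                "article": ARTICLES.get(event["article_id"], {}),
            }
            self.stream.append(enriched)

    def consume_realtime(self) -> None:
        """Step 3: one consumer copies new records to the search index."""
        self.search_index.extend(self.stream[self.realtime_cursor:])
        self.realtime_cursor = len(self.stream)

    def consume_batch(self) -> None:
        """Step 4: a second consumer copies new records to the batch store."""
        self.warehouse.extend(self.stream[self.batch_cursor:])
        self.batch_cursor = len(self.stream)


pipeline = Pipeline()
pipeline.collect({"session_id": "s-1", "article_id": "a-100", "action": "scroll_end"})
pipeline.expand()
pipeline.consume_realtime()
pipeline.consume_batch()
print(pipeline.search_index[0]["article"]["section"])
```

Each consumer keeps its own read position on the shared stream, so the near-real-time path and the batch path can progress independently of each other, which is the property the two Kinesis consumers in steps 3 and 4 rely on.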

Nikkei also uses AWS Snowball to rapidly migrate on-premises data to the cloud without putting pressure on communications bandwidth, and Amazon CloudWatch for monitoring.

Using Amazon SQS and Kinesis makes it possible for Atlas to smoothly absorb changing access loads on news articles for which traffic patterns are difficult to predict, processing thousands or even tens of thousands of requests per second with no difficulty.
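
A rough way to see why a queue in front of the workers helps is to simulate a traffic spike against a consumer with fixed capacity. The sketch below is purely illustrative: the function name `simulate_buffer`, the request counts, and the capacity figure are invented for the example and are not measurements from Atlas.

```python
def simulate_buffer(arrivals_per_tick, capacity_per_tick):
    """Return the backlog left in the queue after each tick.

    Requests that exceed the consumer's capacity wait in the queue
    instead of being dropped, and are drained once traffic subsides.
    """
    backlog = 0
    depths = []
    for arrivals in arrivals_per_tick:
        backlog += arrivals
        processed = min(backlog, capacity_per_tick)
        backlog -= processed
        depths.append(backlog)
    return depths


# Hypothetical load: a quiet second, a two-second spike, then quiet again.
load = [1000, 20000, 20000, 1000, 1000, 1000]
print(simulate_buffer(load, capacity_per_tick=8000))
# prints [0, 12000, 24000, 17000, 10000, 3000]
```

In this toy run the backlog peaks during the spike and then shrinks, so no request is lost even though the consumer never processes more than its fixed capacity per tick.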

Clickstream data currently arrives at Kinesis within 200 milliseconds on average. The whole journey, from the transfer of data at the front end to the point where an analyst's query can return it, takes about one second. This speed means data can be processed in near real time. Nikkei expects this responsiveness to make a significant contribution to real-time recommendation functionality. "Atlas's processing speeds depend on the availability of a highly reliable messaging system like Amazon SQS. Also, because of the performance and reliability offered by AWS, we can use it in confidence without devoting excessive resources to monitoring and maintenance," says Toshiyuki Isobe, of the Corporate IT Consulting Center at NS Solutions, which provides architectural and operational support.

AWS's managed services not only shortened development time, but also made the solution more cost-effective. "Using AWS's managed services not only let us reduce development time, but also reduced costs to roughly a fifth of their former levels. On the other hand, ten or twenty times as much data can be processed, delivering an ROI of more than 5,000 percent. On top of all this, the system is also more usable than ever," says Sano.
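
The article does not say how the ROI figure was calculated. One plausible reading, offered here only as an assumption, is that it compares the data processed per unit of cost before and after: a fivefold cost reduction combined with a tenfold to twentyfold increase in data volume. The helper name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(data_multiplier: int, cost_divisor: int) -> int:
    """Data processed per unit cost, as a percentage of the old system.

    Assumes costs fell to 1/cost_divisor of their former level while the
    volume of data handled grew by data_multiplier.
    """
    return data_multiplier * cost_divisor * 100


low = relative_efficiency(data_multiplier=10, cost_divisor=5)
high = relative_efficiency(data_multiplier=20, cost_divisor=5)
print(f"{low}% to {high}%")  # prints 5000% to 10000%
```

Under that reading, the lower bound of ten times the data at a fifth of the cost gives 5,000 percent, which is consistent with the "more than 5,000 percent" in the quote.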

- Hajime Sano, Data Team, B2C Unit, Digital Business, Nikkei Inc.

- Toshiyuki Isobe, Corporate IT Consulting Center, NS Solutions Corp.

In the future, Nikkei intends to introduce technologies like Amazon EMR, Amazon Redshift Spectrum, and Amazon Athena to Atlas to bring unstructured data into the analysis environment as well. The company is also exploring the possibilities of deploying Amazon Kinesis Analytics and Amazon SageMaker for real-time stream-data analysis and prediction and using Amazon Simple Notification Service (Amazon SNS) as a hub around which to construct a real-time marketing engine.

"We look forward to continuing using AWS's managed services to the utmost to create new marketing mechanisms around the idea of engagement, including dashboards that can visualize aggregated data in real time and recommendation engines that can immediately identify the most appropriate articles based on reader behavior," says Sano.
