How Gunosy built a comment feature in News Pass using Amazon Neptune

This guest post is a translation and adaptation of How to implement and operate News Pass comment feature in GraphDB using Amazon Neptune, published in Japanese by Gunosy.

Gunosy’s motto is to “Optimally deliver information to people around the world.” In their own words, “Gunosy has developed and operated multiple media businesses, including the information curation service Gunosy; the news distribution application News Pass, which is provided by Japanese telecommunication provider KDDI and Gunosy jointly; and the women’s trend information application LUCRA. In addition to the media business, Gunosy also has business in ad-tech, Gunosy Ads, and the Gunosy Ad Network. The information curation service collects and distributes information sourced from the vast amount of information on the internet, filtered by specific criteria. Gunosy uses algorithms to collect and organize information to deliver the right information to the right people.”

News Pass is a free application that allows customers to check trending news easily. The application can deliver selected information from affiliated media automatically by using a unique information analysis and distribution technology. News Pass allows users to add their comments to a news article. This post discusses how Gunosy implements and operates the News Pass comment feature using Amazon Neptune.

Why did Gunosy choose Amazon Neptune?

Before we implemented the comment feature in News Pass, we considered an implementation using Amazon Aurora and Amazon DynamoDB. However, we decided to adopt Neptune because comments are primarily a graph structure.

We liked that a graph model is easy to extend. For example, we could later let users comment on objects other than the news article itself, such as another comment. Another advantage is that it is easy to implement simple recommendations.

What does the graph in Amazon Neptune look like?

Neptune stores the comments that users add to articles in News Pass. The following diagram shows the data structure of the comment data.

The data structure includes the following elements:

The about edge connects a comment and an article (a comment about an article is linked to that article).
The post edge connects a comment and a user (a user posts the comment).
The like edge connects a comment and a user (a user likes the comment).
The delete edge connects a comment and a user (a user who posted a comment can delete it).
The comment vertex holds id and body as properties. A vertex can have many properties, and an individual property can hold multiple values.
Each like, post, and delete edge has a timestamp property that records when the edge was created. Unlike a vertex property, an edge property can hold only a single value.
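
The structure described above can be sketched as a small in-memory property graph. This is an illustrative toy model in Python, not Neptune or the Gunosy implementation: the `Vertex` and `Edge` classes, the IDs, the comment text, the timestamps, and the user-to-comment direction of the post and like edges are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: str  # ID of the vertex the edge leaves
    in_v: str   # ID of the vertex the edge points to
    properties: dict = field(default_factory=dict)


# Hypothetical sample data; none of these values come from News Pass.
vertices = {
    v.id: v
    for v in [
        Vertex("article-1", "article"),
        Vertex("user-1", "user"),
        Vertex("user-2", "user"),
        Vertex("comment-1", "comment",
               {"id": "comment-1", "body": "Great article!"}),
    ]
}

edges = [
    Edge("about", "comment-1", "article-1"),
    Edge("post", "user-1", "comment-1", {"timestamp": 1577836800}),
    Edge("like", "user-2", "comment-1", {"timestamp": 1577836860}),
]


def comments_about(article_id):
    """Return the comment vertices linked to an article by an about edge."""
    return [vertices[e.out_v] for e in edges
            if e.label == "about" and e.in_v == article_id]


def like_count(comment_id):
    """Count the like edges that point at a comment."""
    return sum(1 for e in edges
               if e.label == "like" and e.in_v == comment_id)
```

Calling `comments_about("article-1")` walks the about edges to find the article's comments, and `like_count("comment-1")` counts the like edges on one comment. Both questions are answered by following labeled edges rather than by joining tables.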

In this way, the graph database uses vertices and edges, as well as their labels and properties, to easily represent many kinds of objects and relationships between individual objects.

Unlike with relational databases, we don’t have to define a strict schema beforehand in Neptune, so we can also add or remove properties on individual objects. For example, to implement a function to comment on a specific comment, you can draw an about edge from the new comment vertex to the target comment vertex.
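
As a minimal sketch of that extension, the following Python toy model reuses the about edge for replies and walks the resulting thread. The edge tuples, the IDs, and the helper names `replies_to` and `thread` are hypothetical and exist only for this illustration.

```python
# Edges as (label, from_vertex_id, to_vertex_id); all IDs are hypothetical.
edges = [
    ("about", "comment-1", "article-1"),  # comment on the article
    ("about", "comment-2", "comment-1"),  # reply: a comment about a comment
    ("about", "comment-3", "comment-2"),  # reply to the reply
]


def replies_to(target_id):
    """IDs of comments attached directly to target_id by an about edge."""
    return [src for label, src, dst in edges
            if label == "about" and dst == target_id]


def thread(target_id):
    """All comments under target_id, depth first, as (depth, id) pairs."""
    result = []

    def walk(node_id, depth):
        for reply_id in replies_to(node_id):
            result.append((depth, reply_id))
            walk(reply_id, depth + 1)

    walk(target_id, 0)
    return result
```

No new vertex label or edge label is needed here: a reply is just another comment vertex whose about edge ends at a comment instead of an article.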

Performance optimization

With Neptune, we achieved a response time of 40 milliseconds at the 99th percentile for the API that gets comments. However, to maintain a fast response time, we needed to design data structures and queries appropriately. The following sections explain the data structures and queries we optimized.

Data structure

For operational reasons, our operators regularly needed to retrieve comments that their authors had already deleted.

Initially, the comment vertex held an optional deleted_at property to indicate that the comment had been deleted. The following diagram shows a sample of the deleted_at data.

The following code is the Gremlin query:

g.V().hasLabel("comment").has("deleted_at")

However, with this data structure, the query slowed down as the amount of data increased. We therefore tried to express the same query using vertices and edges instead of relying on properties. We added the delete edge to connect the comment and user vertices. The following diagram shows a sample of data after the changes.

As a result, the query no longer slowed down, even with millions of comments and tens of GB of comment data. The query now looks like the following code:

g.E().hasLabel("delete").inV().hasLabel("comment")
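
The change from a property to an edge can be sketched with a toy in-memory model in Python. This is only an illustration of the two representations, not the migration Gunosy ran and not a model of Neptune's performance. The sample data, the helper names, and the assumption that the delete edge runs from the posting user to the comment are all hypothetical.

```python
# Hypothetical sample data: comment vertices keyed by ID, with the old
# optional deleted_at property, and (label, from_id, to_id, properties) edges.
comments = {
    "comment-1": {"body": "First!"},
    "comment-2": {"body": "Oops", "deleted_at": 1577836900},
    "comment-3": {"body": "Typo", "deleted_at": 1577837000},
}
edges = [
    ("post", "user-1", "comment-1", {"timestamp": 1577836800}),
    ("post", "user-2", "comment-2", {"timestamp": 1577836810}),
    ("post", "user-1", "comment-3", {"timestamp": 1577836820}),
]


def deleted_by_property(comments):
    """Old model: inspect every comment vertex for a deleted_at property."""
    return sorted(cid for cid, props in comments.items()
                  if "deleted_at" in props)


def add_delete_edges(comments, edges):
    """Return the edges plus one delete edge per deleted comment.

    Assumes the user who posted a comment is the one who deleted it.
    """
    poster = {dst: src for label, src, dst, _ in edges if label == "post"}
    new_edges = list(edges)
    for cid, props in comments.items():
        if "deleted_at" in props:
            new_edges.append(("delete", poster[cid], cid,
                              {"timestamp": props["deleted_at"]}))
    return new_edges


def deleted_by_edge(edges):
    """New model: follow delete edges to the comments they point at."""
    return sorted(dst for label, _, dst, _ in edges if label == "delete")


migrated = add_delete_edges(comments, edges)
```

Both `deleted_by_property(comments)` and `deleted_by_edge(migrated)` return the same comments. The difference is that the edge-based version starts from the delete edges instead of examining every comment vertex.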

Gremlin queries

Queries can have a significant impact on performance, depending on how you write them.
For example, in News Pass, when looking up a comment by its CommentID, we originally used the following query:

g.V().hasLabel("comment").hasId("CommentID you search for")

However, the following modified query returns the result much faster:

g.V("CommentID to search for").hasLabel("comment")

The difference is whether you filter first by the vertex label or by the vertex ID. In a relational database, it is typically faster to filter by ID first, and the graph database behaved the same way.
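
One way to picture the difference is a toy in-memory store in Python. This sketch assumes that a lookup by ID goes straight to one vertex while a label filter yields many candidates. It does not show how Neptune actually plans or executes either query, and the store, the IDs, and the function names are hypothetical.

```python
# Toy store: 10,000 hypothetical vertices keyed by ID, all labeled "comment".
vertices = {f"comment-{i}": {"label": "comment"} for i in range(10_000)}


def label_then_id(label, vertex_id):
    """Filter by label first, then by ID.

    Returns (matching ID or None, number of vertices inspected).
    """
    inspected = 0
    match = None
    for vid, vertex in vertices.items():
        inspected += 1
        if vertex["label"] == label and vid == vertex_id:
            match = vid
    return match, inspected


def id_then_label(vertex_id, label):
    """Look up by ID first, then check the label.

    Returns (matching ID or None, number of vertices inspected).
    """
    vertex = vertices.get(vertex_id)
    if vertex is None:
        return None, 0
    return (vertex_id if vertex["label"] == label else None), 1
```

In this toy model, both functions find the same comment, but the label-first version inspects every vertex in the store while the ID-first version inspects one.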

Summary

In graph databases, you can use vertices and edges, as well as their labels and properties, to easily represent many kinds of objects and relationships between individual objects. The same functionality is difficult to achieve with relational databases.

In our experience, Neptune can handle tens of thousands of comments a day while maintaining a fast response time. In addition, Neptune is easy to scale in and out, and straightforward to operate.

At Gunosy, we access our database using a Go library developed in-house, but there are also a variety of official client libraries for Neptune that support multiple GraphDB versions. This makes it simple for developers to build applications using Neptune.

AWS Database Blog