AWS Database Blog
How Gunosy built a comment feature in News Pass using Amazon Neptune
This guest post is a translation and adaptation of How to implement and operate News Pass comment feature in GraphDB using Amazon Neptune, published in Japanese by Gunosy.
Gunosy’s motto is to “Optimally deliver information to people around the world.” In their own words, “Gunosy has developed and operated multiple media businesses, including the information curation service Gunosy; the news distribution application News Pass, which Gunosy provides jointly with the Japanese telecommunications provider KDDI; and the women’s trend information application LUCRA. In addition to its media businesses, Gunosy also operates an ad-tech business: Gunosy Ads and the Gunosy Ad Network. The information curation service collects information from the vast amount available on the internet, filters it by specific criteria, and distributes it. Gunosy uses algorithms to collect and organize information to deliver the right information to the right people.”
News Pass is a free application that lets users check trending news easily. It automatically delivers selected information from affiliated media using Gunosy’s own information analysis and distribution technology. News Pass also lets users comment on news articles. This post discusses how Gunosy implemented and operates the News Pass comment feature using Amazon Neptune.
Why did Gunosy choose Amazon Neptune?
Before we implemented the comment feature in News Pass, we considered implementations using Amazon Aurora and Amazon DynamoDB. However, we decided to adopt Neptune because comment data is essentially a graph.
We also liked that Neptune makes it easy to add features, such as letting users comment on objects other than the news article itself (for example, on another comment). Another advantage is that simple recommendations are easy to implement.
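To make the idea of simple recommendations more concrete, the following sketch ranks comments by how many co-likers they share with a given user, using like edges between users and comments (the like edge is described in the next section). It is a hypothetical in-memory Python model rather than Neptune code: the sample data and the recommend_comments function are illustrative assumptions, not part of News Pass. In Neptune, the same two-hop walk (user, liked comment, other users who liked it, their other liked comments) would be written as a Gremlin traversal.

```python
from collections import Counter

# Hypothetical sample data: one (user, comment) pair per "like" edge.
LIKES = [
    ("alice", "c1"), ("alice", "c2"),
    ("bob", "c1"), ("bob", "c3"),
    ("carol", "c2"), ("carol", "c3"), ("carol", "c4"),
]

def recommend_comments(user, likes, limit=3):
    """Rank comments liked by users who share at least one like with `user`.

    Walks user -> liked comment -> other likers -> their liked comments,
    skipping comments that `user` already likes.
    """
    liked_by = {}  # comment -> set of users who like it
    likes_of = {}  # user -> set of comments the user likes
    for u, c in likes:
        liked_by.setdefault(c, set()).add(u)
        likes_of.setdefault(u, set()).add(c)

    mine = likes_of.get(user, set())
    scores = Counter()
    for comment in mine:
        for other in liked_by[comment] - {user}:
            for candidate in likes_of[other] - mine:
                scores[candidate] += 1
    # Highest score first; ties broken by comment id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:limit]]

print(recommend_comments("alice", LIKES))
```

With this sample data, recommend_comments("alice", LIKES) returns ["c3", "c4"]: c3 is liked by both of Alice's co-likers, c4 by only one.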
What does the graph in Amazon Neptune look like?
Neptune stores the comments on News Pass articles. The following diagram shows the structure of the comment data.
The data structure includes the following elements:
- The about edge connects the article and comment vertices. (A comment about an article is linked to that article.)
- The post edge connects the comment and user vertices. (A user posts the comment.)
- The like edge connects the comment and user vertices. (A user likes the comment.)
- The delete edge connects the comment and user vertices. (A user who posted a comment can delete it.)
- The comment vertex holds id and body as properties. A vertex can have many properties, and a single vertex property can hold multiple values.
- Each like, post, and delete edge holds the timestamp of when it was created as a property. Unlike a vertex property, an edge property can hold only one value.
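The elements listed above can be sketched as a small in-memory model: labeled vertices and labeled, directed edges, each carrying a property map. This is an illustrative Python stand-in, not a Neptune client. The edge directions (comment to article for about, user to comment for post, like, and delete), the vertex IDs, and the timestamp values are assumptions, since the post does not state them.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: str
    label: str  # "article", "comment", or "user"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    label: str  # "about", "post", "like", or "delete"
    out_v: str  # id of the vertex the edge starts from
    in_v: str   # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)

class CommentGraph:
    """Minimal in-memory stand-in for the comment graph (not a Neptune client)."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)

    def add_edge(self, label, out_v, in_v, **props):
        self.edges.append(Edge(label, out_v, in_v, props))

    def neighbors_in(self, vid, label):
        """Return the vertices that have a `label` edge pointing at vid."""
        return [self.vertices[e.out_v] for e in self.edges
                if e.label == label and e.in_v == vid]

g = CommentGraph()
g.add_vertex("a1", "article")
g.add_vertex("u1", "user")
g.add_vertex("u2", "user")
g.add_vertex("c1", "comment", body="Interesting read.")
g.add_edge("about", "c1", "a1")                       # c1 is about article a1
g.add_edge("post", "u1", "c1", timestamp=1700000000)  # u1 posted c1
g.add_edge("like", "u2", "c1", timestamp=1700000300)  # u2 liked c1

print([(c.id, c.properties["body"]) for c in g.neighbors_in("a1", "about")])
print([u.id for u in g.neighbors_in("c1", "like")])
```

Running it prints [('c1', 'Interesting read.')] and then ['u2']. Storing every relationship as a labeled edge means each question ("which comments are about this article?", "who liked this comment?") is the same kind of lookup.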
In this way, a graph database uses vertices and edges, along with their labels and properties, to easily represent many kinds of objects and the relationships between them.
Because Neptune, unlike a relational database, doesn't require a strict schema to be defined in advance, we can also add or remove properties on individual objects. For example, to let users comment on a specific comment, you can draw an about edge from the new comment vertex to the target comment vertex.
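The following sketch shows why replies need no schema change: a reply is just another about edge whose target happens to be a comment, so the same lookup returns both top-level comments and replies. It uses plain Python tuples rather than Neptune, and the vertex IDs and the comment-to-target direction of the about edge are assumptions.

```python
# Hypothetical edges as (label, from_vertex, to_vertex) tuples.
edges = [
    ("about", "c1", "a1"),  # comment c1 is about article a1
    ("about", "c2", "a1"),  # comment c2 is about article a1
    ("about", "c3", "c1"),  # reply: comment c3 is about comment c1
]

def about(target, edges):
    """Return ids of vertices that have an about edge pointing at target."""
    return [src for label, src, dst in edges if label == "about" and dst == target]

def thread(target, edges, depth=0):
    """Follow about edges recursively, yielding (depth, comment_id) pairs."""
    for cid in about(target, edges):
        yield depth, cid
        yield from thread(cid, edges, depth + 1)

for depth, cid in thread("a1", edges):
    print("  " * depth + cid)
```

Running it prints c1, then c3 indented beneath it, then c2. The about function is the same whether the target is an article or a comment.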
Performance optimization
With Neptune, the API that retrieves comments achieved a 99th-percentile response time of 40 milliseconds. However, maintaining that response time required designing our data structures and queries carefully. The following sections explain the data structures and queries we optimized to keep responses fast in Neptune.
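A 99th-percentile (p99) response time of 40 milliseconds means that 99% of requests completed within 40 ms. The sketch below computes a percentile with the nearest-rank method; the post doesn't say how Gunosy computed its figure, so both the method and the sample latencies are assumptions.

```python
import math

def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) of samples, nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Smallest 1-based rank such that at least p% of samples are <= the value.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
print(percentile_nearest_rank(latencies_ms, 99))  # 99
```

Unlike an average, a p99 figure is dominated by the slowest requests, which is why query shapes that degrade as data grows show up there first.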
Data structure
For operational reasons, operators regularly need to retrieve comments that their authors have already deleted.
Initially, a comment vertex could optionally hold a deleted_at property indicating that the comment had been deleted. The following diagram shows sample data that uses deleted_at.
The following code is the Gremlin query:
However, this query slowed down as the amount of data grew. We therefore tried expressing the same condition with vertices and edges instead of properties: we added a delete edge connecting the comment and user vertices. The following diagram shows sample data after the change.
As a result, the query no longer slowed down, even with millions of comments totaling tens of GB of data. The query now looks like the following code:
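The following toy model illustrates one plausible reason the edge-based design scaled better: finding deleted comments through an optional deleted_at property means examining every comment vertex, whereas following delete edges touches only the comments that were actually deleted, provided edges can be looked up by label. This is a simplified in-memory Python sketch, not a description of how Neptune's indexes work, and the data sizes, the timestamp value, and the user-to-comment direction of the delete edge are assumptions.

```python
from collections import defaultdict

N = 10_000  # hypothetical number of comments
deleted_ids = {f"c{i}" for i in range(0, N, 100)}  # every 100th comment deleted

# Design 1: an optional deleted_at property on each comment vertex.
comments = {f"c{i}": {"body": "sample"} for i in range(N)}
for cid in deleted_ids:
    comments[cid]["deleted_at"] = 1700000000  # hypothetical timestamp

# Design 2: a delete edge (user -> comment), kept in a per-label edge index.
edges_by_label = defaultdict(list)
for cid in sorted(deleted_ids):
    edges_by_label["delete"].append(("u1", cid))  # hypothetical user u1

def deleted_by_property(comments):
    """Check every comment for deleted_at; return (ids, vertices examined)."""
    found = [cid for cid, props in comments.items() if "deleted_at" in props]
    return found, len(comments)

def deleted_by_edge(edges_by_label):
    """Read only the delete edges; return (ids, edges examined)."""
    edges = edges_by_label["delete"]
    return [cid for _, cid in edges], len(edges)

ids_prop, cost_prop = deleted_by_property(comments)
ids_edge, cost_edge = deleted_by_edge(edges_by_label)
assert sorted(ids_prop) == sorted(ids_edge)  # both designs find the same comments
print(cost_prop, cost_edge)  # 10000 100
```

Both designs return the same 100 comments, but in this model the property check grows with the total number of comments, while the edge lookup grows only with the number of deletions.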
Gremlin queries
Queries can have a significant impact on performance, depending on how you write them.
For example, to retrieve comment data by CommentID in News Pass, we originally used the following query:
However, the following modified query returns the result much faster:
The difference is whether the query filters vertices by label or by ID first. In a relational database, filtering by ID first is typically faster because an ID lookup narrows the result to a single row immediately, and the graph database behaved the same way.
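The effect of filter order can be illustrated with a toy vertex store that supports lookup by ID and by label. Filtering by label first has to walk every vertex carrying that label before the ID check discards all but one, while filtering by ID first reaches a single vertex directly and only has to confirm its label. This is a simplified Python model under those assumptions, not Neptune's query engine, and the vertex counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical vertex store with two lookups: by id, and by label.
by_id = {}
by_label = defaultdict(list)

def add_vertex(vid, label):
    by_id[vid] = label
    by_label[label].append(vid)

for i in range(50_000):
    add_vertex(f"c{i}", "comment")
for i in range(5_000):
    add_vertex(f"u{i}", "user")

def label_first(label, vid):
    """Collect every vertex with `label`, then filter by id; return (matches, examined)."""
    candidates = by_label[label]
    matches = [v for v in candidates if v == vid]
    return matches, len(candidates)

def id_first(label, vid):
    """Look the vertex up by id, then check its label; return (matches, examined)."""
    found = by_id.get(vid)
    matches = [vid] if found == label else []
    return matches, (1 if found is not None else 0)

print(label_first("comment", "c42"))  # (['c42'], 50000)
print(id_first("comment", "c42"))     # (['c42'], 1)
```

Both functions return the same match, but the label-first version examines every comment vertex, so its cost grows with the number of comments.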
Summary
In graph databases, you can use vertices and edges, as well as their labels and properties, to easily represent many kinds of objects and relationships between individual objects. The same functionality is difficult to achieve with relational databases.
In our experience, Neptune handles tens of thousands of comments a day while maintaining fast responses. Neptune is also easy to scale in and out, and straightforward to operate.
At Gunosy, we access our database using an in-house Go library, but a variety of official client libraries that support multiple graph database versions are available for Neptune, so developers can build applications on Neptune easily.