AWS Database Blog

How to create a fast and globally available user profiling system by using Amazon DynamoDB global tables

A user profiling system stores users’ names, IDs, contact information, past behaviors, interests, and other information, and provides methods to query that information. In this post, I explain the importance of a globalized user profiling system, how to create this system using Amazon DynamoDB global tables, and how to use the system with machine learning.

Overview

Websites, mobile applications, and games can have global user bases. For example, consider the recent top-trending game, PlayerUnknown’s Battlegrounds, whose user base includes the following geographic breakdown:

  • 24 percent from the United States
  • 19 percent from China
  • 6 percent from Germany

For this game and others like it, a globally available user profiling system with low latency can help provide good user experiences worldwide.

DynamoDB global tables provide fast global access to data by replicating DynamoDB tables in multiple AWS Regions. With DynamoDB global tables, you can build a global user profiling system that is fast and consistent. The user profiling system that is demonstrated in this post uses local replica tables of global tables for global low latency access, and its data is eventually consistent.

This globalized user profiling system can also be used for machine learning. Because DynamoDB is schemaless, you can easily store and query data of any form. Thus, you can tailor the data format used in a DynamoDB-based user profiling system to best fit the machine learning model that you choose. In this post, I demonstrate how to use a user profiling system with machine learning.

Building a global user profiling system

Let’s say that Example Corp. is a global online book retailer and has customers all over the world. Example Corp. wants to remember information such as customers’ names, IDs, addresses, and payment methods, and the company wants to use this information to fulfill customers’ orders. In this case, Example Corp. needs a user profiling system to store users’ information.

This user profiling system needs to meet two requirements. First, because Example Corp. is a global online book retailer, the system must support users all over the world. Second, the system should serve requests quickly so that customers don’t experience delays on the website. (The longer an order takes, the more likely the customer is to abandon the order.) To meet these two requirements, I demonstrate how to create an example user profiling system for Example Corp. by using Amazon DynamoDB global tables. For the sake of simplicity, this example uses only the user_id and user_name attributes.

To get started, you create a DynamoDB table called UserProfiles in the US West (Oregon) Region. It should have a primary key named user_id, and DynamoDB streams should be enabled. Then you create a table in the EU (Ireland) Region with the same settings. Each table serves the requests nearest to it. For example, the table in the EU (Ireland) Region serves requests from Europe, and the table in the US West (Oregon) Region serves requests from the US. I discuss the benefit of this extra table later in this section.

The following example Python code creates a UserProfiles table in the US West (Oregon) Region. To create the same table in the EU (Ireland) Region, change the region_name argument from us-west-2 to eu-west-1. To learn more about creating a table, see Create a Table in the DynamoDB Developer Guide.

import boto3

# Create a DynamoDB client for the US West (Oregon) Region.
ddb_client = boto3.client('dynamodb', region_name='us-west-2')

# Create the UserProfiles table with user_id as its key.
ddb_client.create_table(
    AttributeDefinitions=[
        {
            'AttributeName': 'user_id',
            'AttributeType': 'S'
        },
    ],
    TableName='UserProfiles',
    KeySchema=[
        {
            'AttributeName': 'user_id',
            'KeyType': 'HASH'
        },
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 10,
        'WriteCapacityUnits': 10
    },
    # Streams must be enabled on every table that joins a global table.
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES'
    }
)
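Because both replica tables must share the same name and key settings, one way to avoid drift between them is to build the table definition once and apply it in each Region. The following is a minimal sketch under that idea, not part of the original walkthrough: the helper names table_definition, default_client_factory, and create_replica_tables, and the client_factory parameter, are hypothetical. It relies only on the create_table call shown above and the boto3 table_exists waiter.

```python
def table_definition(table_name="UserProfiles"):
    """Return the create_table arguments shared by every replica table."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 10,
            "WriteCapacityUnits": 10,
        },
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


def default_client_factory(region_name):
    """Create a real DynamoDB client; boto3 is imported only when needed."""
    import boto3
    return boto3.client("dynamodb", region_name=region_name)


def create_replica_tables(regions, client_factory=default_client_factory):
    """Create the same empty table in each Region and wait until it exists."""
    params = table_definition()
    for region in regions:
        client = client_factory(region)
        client.create_table(**params)
        # Block until the table is ACTIVE before moving to the next Region.
        client.get_waiter("table_exists").wait(TableName=params["TableName"])
    return list(regions)
```

With AWS credentials configured, calling create_replica_tables(["us-west-2", "eu-west-1"]) would create both tables used in this post.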

Then create a global table based on the two tables that you just created. When you create a global table from existing tables, each table must be empty and have the same name and key settings. We also strongly suggest setting write capacity consistently across replica tables in your global table. For more information about creating global tables, see create_global_table in the Boto 3 documentation.

ddb_client.create_global_table(
    GlobalTableName='UserProfiles',
    ReplicationGroup=[
        {
            'RegionName': 'us-west-2'
        },
        {
            'RegionName': 'eu-west-1'
        }
    ]
)
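After the call returns, it can be useful to confirm that the global table is active and lists both Regions. The following is a small sketch that uses the boto3 describe_global_table call. The helper name global_table_summary is hypothetical, and the client argument is assumed to be a DynamoDB client such as ddb_client.

```python
def global_table_summary(client, name="UserProfiles"):
    """Return (status, sorted replica Regions) for a global table."""
    response = client.describe_global_table(GlobalTableName=name)
    desc = response["GlobalTableDescription"]
    # Each entry in ReplicationGroup names one Region that holds a replica.
    regions = sorted(r["RegionName"] for r in desc.get("ReplicationGroup", []))
    return desc["GlobalTableStatus"], regions
```

For the table in this post, a status of ACTIVE with the Regions eu-west-1 and us-west-2 would indicate that the global table is ready.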

Next, define a set of two operations for the user profiling system so that you can store and query user information. The put_user method enables you to store user information, and the get_user method enables you to query information based on user_id.

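import boto3

# Assumption: ddb_table is a boto3 Table resource for the replica in the
# US West (Oregon) Region. The resource already knows its table name.
ddb_table = boto3.resource('dynamodb', region_name='us-west-2').Table('UserProfiles')


def put_user(user_id, user_name):
    # Store (or overwrite) the profile for user_id.
    return ddb_table.put_item(
        Item={
            'user_id': user_id,
            'user_name': user_name
        })


def get_user(user_id):
    # Return the profile for user_id, or None if it does not exist.
    res = ddb_table.get_item(
        Key={
            'user_id': user_id
        })
    return res.get('Item')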

At this point, the user profiling system based on global tables is ready to use!

Now, you can create two user profiles and put them in the user profiling system via US West (Oregon) Region endpoints by using the put_user method that you just wrote. These user profiles are replicated automatically to the table in the EU (Ireland) Region by global tables. For more information about replication and global tables, see Global Tables: How It Works.

{'user_id': '4962', 'user_name': 'user one'}
{'user_id': '8291', 'user_name': 'user two'}
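Because replication between Regions is asynchronous, a read in the EU (Ireland) Region immediately after a write in the US West (Oregon) Region might not find the new profile yet. The following is a minimal sketch of one way to check that a profile has arrived. The helper name wait_for_replica and its parameters are hypothetical, and read_item can be any function that returns the item or None, such as a get_user variant that reads from the EU replica.

```python
import time


def wait_for_replica(read_item, user_id, attempts=10, interval=0.5, sleep=time.sleep):
    """Call read_item(user_id) until it returns an item or attempts run out."""
    for attempt in range(attempts):
        item = read_item(user_id)
        if item is not None:
            return item
        # Pause between reads, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(interval)
    return None
```

A check like this is mainly useful in tests and demos. In production, reads from the local replica simply return the most recently replicated version of the profile.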

Let’s assume Example Corp. has one web server in the United States and one in Europe. When customers connect to the Example Corp. website in the United States, the web server queries user profiles from the user profiling system via DynamoDB US West (Oregon) Region endpoints. However, if customers visit the Example Corp. website in Europe, instead of making cross-continent requests to the United States, the website retrieves users’ profiles from the user profiling system via EU (Ireland) Region endpoints (see the following diagram). Compared to querying information from another continent, querying a local replica saves more than 200 milliseconds per request, based on my experience. Also, global tables keep the replicas in sync automatically, and each replica is eventually consistent, so you don’t need to build your own synchronization of user profiles across Regions.

[Diagram: Overview of the global user profiling system]
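The routing described above can be sketched as a lookup from a web server’s location to the Region of its nearest replica. This is an illustrative sketch only: the location codes, the REPLICA_REGIONS mapping, and the helper names replica_region and profile_table are assumptions rather than part of the original system.

```python
# Hypothetical mapping from a web server's location to its nearest replica.
REPLICA_REGIONS = {
    "US": "us-west-2",  # US West (Oregon)
    "EU": "eu-west-1",  # EU (Ireland)
}


def replica_region(server_location, default="us-west-2"):
    """Return the Region whose replica table should serve this web server."""
    return REPLICA_REGIONS.get(server_location.upper(), default)


def profile_table(server_location, table_name="UserProfiles"):
    """Return a boto3 Table resource bound to the nearest replica."""
    import boto3  # imported here so replica_region works without boto3
    region = replica_region(server_location)
    return boto3.resource("dynamodb", region_name=region).Table(table_name)
```

Each web server would call profile_table once at startup with its own location and then use the returned table for all reads and writes.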

How to apply your recommendation system with DynamoDB

With its user profiling system up and running, Example Corp. wants to use the system to show targeted book recommendations to its customers. For simplicity, let’s assume Example Corp. already has a machine learning recommendation system that generates recommendations based on customers’ features. Features can be constructed from information such as customers’ purchase history, browsing history, and product ratings.

For Example Corp. to put its recommendation system into production, its user profiling system must also be able to store sets of customers’ features. Because DynamoDB is schemaless, you can easily store and query data of any form. Thus, you can add a new attribute to this DynamoDB-based user profiling system to store customers’ features to generate recommendations.

The following code example shows the change that is needed to add this new attribute. Note that in addition to the name and ID of the user, I added a new attribute called user_features. You can store this attribute as a JSON string, so any data format that can be serialized to JSON can be stored in this user profiling system.

This example uses a NumPy array object, numpy.array, as the data format for customer features. The updated put_user method takes a user_features argument and writes it to the DynamoDB table. The to_json method serializes a numpy.array to JSON, and the from_json method deserializes JSON back to a numpy.array.

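import json

import numpy as np


def put_user(user_id, user_name, user_features):
    # user_features is a JSON string, for example the output of to_json.
    # ddb_table is the same UserProfiles table object that get_user reads from.
    return ddb_table.put_item(
        Item={
            'user_id': user_id,
            'user_name': user_name,
            'user_features': user_features
        })


def to_json(ma):
    # Serialize a numpy.array to a JSON string.
    return json.dumps(ma.tolist())


def from_json(js):
    # Deserialize a JSON string back to a numpy.array.
    return np.array(json.loads(js))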

Now, when a customer browses Example Corp.’s online store, the user profiling service can quickly provide the customer’s feature set, run the feature set through the recommendation system, and generate recommendations for the customer.
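To make this flow concrete, the following sketch ranks books for one customer from a stored profile. It is only a stand-in for Example Corp.’s recommendation system, which this post does not specify: the dot-product scoring, the catalog of book feature vectors, and the helper names score and recommend are all assumptions, and plain Python lists are used in place of numpy.array to keep the example free of dependencies.

```python
import json


def score(user_features, book_features):
    """Dot product between a user's feature vector and a book's vector."""
    return sum(u * b for u, b in zip(user_features, book_features))


def recommend(profile, catalog, top_n=2):
    """Return the top_n book titles for a profile item read from DynamoDB."""
    # user_features is stored as a JSON string, so decode it first.
    features = json.loads(profile["user_features"])
    ranked = sorted(
        catalog,
        key=lambda title: score(features, catalog[title]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical profile, shaped like an item returned by get_user.
profile = {
    "user_id": "4962",
    "user_name": "user one",
    "user_features": "[1.0, 0.0, 0.5]",
}

# Hypothetical catalog mapping book titles to feature vectors.
catalog = {
    "Book A": [1.0, 0.0, 0.0],
    "Book B": [0.0, 1.0, 0.0],
    "Book C": [1.0, 0.0, 1.0],
}

print(recommend(profile, catalog))  # ['Book C', 'Book A']
```

Here Book C scores 1.5 and Book A scores 1.0, so they rank ahead of Book B, which scores 0.0.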

Conclusion

In this post, I demonstrated how to create a user profiling system based on DynamoDB global tables and discussed the fast and consistent global access provided by global tables. I also shared an example that combines this user profiling system with machine learning. In addition to serving as an input data source for machine learning models in production, a user profiling system can be used as a data store for training and testing datasets for building machine-learning models.


About the Author

Jia Zhang is a software development engineer at Amazon Web Services.