AWS Machine Learning Blog
Amazon Personalize can now use 10X more item attributes to improve relevance of recommendations
January 2023: This blog post was reviewed and updated by Brian Soper and Rob Percival, with new steps and code along with the option to use AWS CloudShell to run the procedure.
Amazon Personalize is a machine learning service that lets you personalize your website, app, ads, emails, and more with custom machine learning models, which you can create with no prior machine learning experience. AWS is pleased to announce that Amazon Personalize now supports ten times more item attributes for modeling. Previously, you could use up to five item attributes while building an ML model in Amazon Personalize; the limit is now 50. You can now use more information about your items (for example, category, brand, price, duration, size, author, and year of release) to increase the relevance of recommendations.
In this post, you learn how to add item metadata with custom attributes to Amazon Personalize and create a model using this data and user interactions. This post uses the Amazon customer reviews data for beauty products. For more information and to download this data, see Amazon Customer Reviews Dataset. The post uses the history of which items users have reviewed, along with user and item metadata, to generate product recommendations for those users.
Pre-processing the data
To model the data in Amazon Personalize, you need to break it into the following datasets:
- Users – Contains metadata about the users
- Items – Contains metadata about the items
- Interactions – Contains interactions (for this post, reviews) and metadata about the interactions
For each respective dataset, this post uses the following attributes:
- Users – customer_id, helpful_votes, and total_votes
- Items – product_id, product_category, and product_parent
- Interactions – product_id, customer_id, review_date, and star_rating
This post does not use the other attributes available, which include marketplace, review_id, product_title, vine, verified_purchase, review_headline, and review_body.
Additionally, to conform with the keywords in Amazon Personalize, this post renames customer_id to USER_ID, product_id to ITEM_ID, and review_date to TIMESTAMP.
To make getting started easier, you can use AWS CloudShell to experiment with this procedure. To do so, use the AWS Regional Services List to choose a Region that supports both AWS CloudShell and Amazon Personalize. If you are not using CloudShell, make sure your environment includes the AWS CLI.
To download and process the data for input to Amazon Personalize, use the following example code blocks. The Python code blocks assume Python 3.
For the Users dataset, enter the following code:
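As a minimal sketch of this step, the following uses only the Python standard library on a small in-memory sample instead of the full download, so it does not create the df DataFrame mentioned below. The column names come from the attribute list above; summing the votes per customer and the sample values are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for the tab-separated reviews file (values are made up).
SAMPLE_TSV = (
    "customer_id\tproduct_id\thelpful_votes\ttotal_votes\n"
    "111\tB001\t2\t3\n"
    "222\tB002\t0\t1\n"
    "111\tB003\t1\t1\n"
)


def build_users(tsv_file):
    """Sum helpful_votes and total_votes per customer (assumed aggregation)."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        totals[row["customer_id"]][0] += int(row["helpful_votes"])
        totals[row["customer_id"]][1] += int(row["total_votes"])
    # Rename customer_id to the USER_ID keyword that Amazon Personalize expects.
    return [
        {"USER_ID": uid, "helpful_votes": helpful, "total_votes": total}
        for uid, (helpful, total) in sorted(totals.items())
    ]


def write_csv(rows, out_file):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(out_file, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


users = build_users(io.StringIO(SAMPLE_TSV))
buffer = io.StringIO()
write_csv(users, buffer)  # or pass open("users.csv", "w", newline="")
print(buffer.getvalue())
```

To process the real file, pass an open file handle for the downloaded TSV to build_users instead of the in-memory sample.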
The following screenshot shows the Users dataset. This output can be generated by
Delete the Users dataset dataframe to free up memory by running del [df].
For the Items dataset, enter the following code:
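One way to sketch this step is to keep a single row per product, because each product appears once for every review it received. The example below uses the standard library on a made-up in-memory sample, so it does not create the df1 DataFrame mentioned below. Keeping the first row seen for each product is an assumption.

```python
import csv
import io

# Tiny stand-in for the tab-separated reviews file (values are made up).
SAMPLE_TSV = (
    "product_id\tproduct_parent\tproduct_category\n"
    "B001\t1001\tBeauty\n"
    "B002\t1002\tBeauty\n"
    "B001\t1001\tBeauty\n"
)


def build_items(tsv_file):
    """Return one metadata row per product, keeping the first occurrence."""
    items = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if row["product_id"] not in items:
            items[row["product_id"]] = {
                # product_id becomes the ITEM_ID keyword used by Personalize.
                "ITEM_ID": row["product_id"],
                "product_category": row["product_category"],
                "product_parent": row["product_parent"],
            }
    return list(items.values())


items = build_items(io.StringIO(SAMPLE_TSV))
for item in items:
    print(item)
```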
The following screenshot shows the Items dataset. This output can be generated by
Delete the Items dataset dataframe to free up memory by running del [df1].
For the Interactions dataset, enter the following code:
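Amazon Personalize expects the TIMESTAMP field of an Interactions dataset in Unix epoch seconds, so the review date has to be converted. The following is a minimal sketch on a made-up sample. It assumes review_date is formatted as YYYY-MM-DD and treats each date as midnight UTC.

```python
import csv
import io
from datetime import datetime, timezone

# Tiny stand-in for the tab-separated reviews file (values are made up).
SAMPLE_TSV = (
    "customer_id\tproduct_id\tstar_rating\treview_date\n"
    "111\tB001\t5\t2015-08-31\n"
    "222\tB002\t3\t2015-08-30\n"
)


def to_epoch_seconds(review_date):
    """Convert a YYYY-MM-DD date (assumed format) to Unix epoch seconds, UTC."""
    parsed = datetime.strptime(review_date, "%Y-%m-%d")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def build_interactions(tsv_file):
    """Rename the key columns and convert review_date to TIMESTAMP."""
    return [
        {
            "USER_ID": row["customer_id"],
            "ITEM_ID": row["product_id"],
            "TIMESTAMP": to_epoch_seconds(row["review_date"]),
            "star_rating": int(row["star_rating"]),
        }
        for row in csv.DictReader(tsv_file, delimiter="\t")
    ]


interactions = build_interactions(io.StringIO(SAMPLE_TSV))
for interaction in interactions:
    print(interaction)
```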
The following screenshot shows the Interactions dataset. This output can be generated by
If using interactive mode, quit python3 and return to the bash shell by running quit().
Uploading the data
Note that if your CloudShell session is lost at any point in the procedure, you can resume work by reloading the previously set variables from the persistent file with the Bash command "source ~/local_variables.txt".
Also note that CloudShell is Regional, so make sure you log back in to CloudShell in the same Region where you started.
After preprocessing is complete, upload the data to your Amazon S3 bucket. Be sure to replace <your_bucket_name_here> with a globally unique S3 bucket name that follows the S3 bucket naming rules.
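Before creating the bucket, it can help to check a candidate name locally. The helper below is a hypothetical sketch that covers only some of the general-purpose bucket naming rules (length, allowed characters, no adjacent periods, not shaped like an IP address, and two reserved affixes). It cannot tell you whether the name is globally unique.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dots, and hyphens,
# starting and ending with a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_plausible_bucket_name(name):
    """Return True if the name passes this partial set of naming checks."""
    if not _NAME_RE.fullmatch(name):
        return False
    if ".." in name:
        return False
    if _IPV4_RE.fullmatch(name):
        return False
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False
    return True


for candidate in ("my-personalize-data-123", "My_Bucket", "192.168.1.1"):
    print(candidate, is_plausible_bucket_name(candidate))
```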
Ingesting the data
After you process the preceding data, you can ingest it in Amazon Personalize.
Creating a dataset group
To create a dataset group to store events (user interactions) sent by your application and the metadata for users and items, complete the following commands:
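If you prefer Python to the AWS CLI, the same step can be sketched with the boto3 Personalize client. The group name and the stand-in client below are hypothetical; the stand-in exists only so the sketch runs without AWS credentials.

```python
def create_dataset_group(personalize, name):
    """Create a dataset group and return its ARN."""
    response = personalize.create_dataset_group(name=name)
    return response["datasetGroupArn"]


class StubPersonalize:
    """Offline stand-in for boto3.client("personalize"); returns a fake ARN."""

    def create_dataset_group(self, name):
        arn = f"arn:aws:personalize:us-east-1:123456789012:dataset-group/{name}"
        return {"datasetGroupArn": arn}


# With real credentials, use: personalize = boto3.client("personalize")
dataset_group_arn = create_dataset_group(StubPersonalize(), "beauty-reviews")
print(dataset_group_arn)
```

The dataset group is created asynchronously, so wait until it reports an ACTIVE status before you add datasets to it.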
Creating a dataset and defining schema
After you create the dataset group, define a schema and then create a dataset for each of your three data types. The following commands cover all three:
Create schemas for Items, Users, and Interactions:
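Amazon Personalize schemas are Avro-format JSON documents. The following sketch builds one for each dataset. The required fields (ITEM_ID, USER_ID, and TIMESTAMP as a long) follow the service's conventions, while the metadata field types and the categorical flags are assumptions chosen for this dataset.

```python
import json

NAMESPACE = "com.amazonaws.personalize.schema"


def make_schema(name, fields):
    """Wrap a list of fields in the record envelope Personalize expects."""
    return {
        "type": "record",
        "name": name,
        "namespace": NAMESPACE,
        "fields": fields,
        "version": "1.0",
    }


items_schema = make_schema("Items", [
    {"name": "ITEM_ID", "type": "string"},
    {"name": "product_category", "type": "string", "categorical": True},
    {"name": "product_parent", "type": "string", "categorical": True},
])
users_schema = make_schema("Users", [
    {"name": "USER_ID", "type": "string"},
    {"name": "helpful_votes", "type": "float"},  # assumed type
    {"name": "total_votes", "type": "float"},    # assumed type
])
interactions_schema = make_schema("Interactions", [
    {"name": "USER_ID", "type": "string"},
    {"name": "ITEM_ID", "type": "string"},
    {"name": "TIMESTAMP", "type": "long"},
    {"name": "star_rating", "type": "int"},      # assumed type
])

# create_schema takes the schema as a JSON string, for example:
# personalize.create_schema(name="beauty-items", schema=schema_json["Items"])
schema_json = {
    schema["name"]: json.dumps(schema)
    for schema in (items_schema, users_schema, interactions_schema)
}
print(schema_json["Interactions"])
```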
Create the datasets for Items, Users, and Interactions:
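Each dataset ties a schema to the dataset group and declares its type. As a sketch, the helper below builds the keyword arguments for three boto3 create_dataset calls. The ARNs and the name prefix are placeholders, not real resources.

```python
DATASET_TYPES = ("Items", "Users", "Interactions")


def dataset_requests(dataset_group_arn, schema_arns, prefix="beauty"):
    """Build create_dataset keyword arguments for each dataset type."""
    return [
        {
            "name": f"{prefix}-{dataset_type.lower()}",
            "datasetGroupArn": dataset_group_arn,
            "datasetType": dataset_type,
            "schemaArn": schema_arns[dataset_type],
        }
        for dataset_type in DATASET_TYPES
    ]


# Placeholder ARNs for illustration only.
requests = dataset_requests(
    "arn:aws:personalize:us-east-1:123456789012:dataset-group/beauty-reviews",
    {t: f"arn:aws:personalize:us-east-1:123456789012:schema/beauty-{t.lower()}"
     for t in DATASET_TYPES},
)
for request in requests:
    # With a real client: personalize.create_dataset(**request)["datasetArn"]
    print(request["name"], request["datasetType"])
```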
Importing the data
After you create the datasets, import the data from Amazon S3. To import your data, complete the following steps.
Set up policies and roles to allow S3 and Personalize interactions:
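Two policy documents are typically involved: a bucket policy that lets the Amazon Personalize service principal read your data, and a trust policy that lets the service assume an IAM role. The following sketch builds both as Python dictionaries. The bucket name is a placeholder, and the role still needs S3 read permissions attached separately.

```python
import json

SERVICE_PRINCIPAL = {"Service": "personalize.amazonaws.com"}


def bucket_policy(bucket_name):
    """Allow Amazon Personalize to list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": SERVICE_PRINCIPAL,
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
        }],
    }


def trust_policy():
    """Allow Amazon Personalize to assume the role this policy is attached to."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": SERVICE_PRINCIPAL,
            "Action": "sts:AssumeRole",
        }],
    }


# With real clients (bucket and role names are placeholders):
# s3.put_bucket_policy(Bucket=name, Policy=json.dumps(bucket_policy(name)))
# iam.create_role(RoleName=role, AssumeRolePolicyDocument=json.dumps(trust_policy()))
print(json.dumps(bucket_policy("my-personalize-data-123"), indent=2))
```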
Create dataset import jobs:
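An import job points a dataset at a CSV file in S3 and names the IAM role that Amazon Personalize should assume. The following sketch builds the keyword arguments for boto3's create_dataset_import_job. The job name, ARNs, bucket, and object key are placeholders.

```python
def import_job_request(job_name, dataset_arn, bucket, key, role_arn):
    """Build keyword arguments for create_dataset_import_job."""
    return {
        "jobName": job_name,
        "datasetArn": dataset_arn,
        "dataSource": {"dataLocation": f"s3://{bucket}/{key}"},
        "roleArn": role_arn,
    }


# Placeholder values for illustration only.
request = import_job_request(
    job_name="beauty-items-import",
    dataset_arn="arn:aws:personalize:us-east-1:123456789012:dataset/beauty-reviews/ITEMS",
    bucket="my-personalize-data-123",
    key="items.csv",
    role_arn="arn:aws:iam::123456789012:role/PersonalizeS3Role",
)
# With a real client:
# personalize.create_dataset_import_job(**request)["datasetImportJobArn"]
print(request["dataSource"]["dataLocation"])
```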
Check status of the dataset import jobs. This may take several minutes.
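Because the jobs run asynchronously, a script has to poll until each one finishes. The helper below is a hypothetical sketch: it takes any function that returns the current status string, and the demonstration feeds it a canned sequence of statuses instead of calling AWS.

```python
import time


def wait_for_status(get_status, done=("ACTIVE",), failed=("CREATE FAILED",),
                    interval=60, max_checks=120, sleep=time.sleep):
    """Poll get_status() until it returns a done status; raise on failure."""
    for _ in range(max_checks):
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"Resource entered status {status!r}")
        sleep(interval)
    raise TimeoutError(f"No final status after {max_checks} checks")


# Canned statuses standing in for repeated describe calls.
canned = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(canned), interval=0)
print(final_status)
```

With a real client, get_status could be a lambda that returns personalize.describe_dataset_import_job(datasetImportJobArn=arn)["datasetImportJob"]["status"].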
Training a model
After you ingest the data into Amazon Personalize, you are ready to train a model (solutionVersion). To do so, map the recipe (algorithm) you want to use to your use case. The following are your available options:
- For user personalization, such as recommending items to a user, use one of the recipes described in the user personalization recipes documentation pages.
- For recommending items similar to an input item, use SIMS.
- For reranking a list of input items for a given user, use Personalized-Ranking.
This post uses the User-Personalization recipe to define a solution and then train a solutionVersion (model). Complete the following commands.
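In boto3 terms, training is two calls: create_solution binds the recipe to the dataset group, and create_solution_version starts the training run. The following sketch chains them. The solution name and the stand-in client are hypothetical, and the stand-in exists only so the example runs offline.

```python
USER_PERSONALIZATION_RECIPE = "arn:aws:personalize:::recipe/aws-user-personalization"


def train_model(personalize, dataset_group_arn, solution_name):
    """Create a solution, start training, and return both ARNs."""
    solution_arn = personalize.create_solution(
        name=solution_name,
        datasetGroupArn=dataset_group_arn,
        recipeArn=USER_PERSONALIZATION_RECIPE,
    )["solutionArn"]
    # A production script would first confirm the solution is ACTIVE.
    solution_version_arn = personalize.create_solution_version(
        solutionArn=solution_arn
    )["solutionVersionArn"]
    return solution_arn, solution_version_arn


class StubPersonalize:
    """Offline stand-in for boto3.client("personalize"); returns fake ARNs."""

    def create_solution(self, name, datasetGroupArn, recipeArn):
        return {"solutionArn": f"arn:aws:personalize:us-east-1:123456789012:solution/{name}"}

    def create_solution_version(self, solutionArn):
        return {"solutionVersionArn": f"{solutionArn}/0a1b2c3d"}


solution_arn, solution_version_arn = train_model(
    StubPersonalize(),
    "arn:aws:personalize:us-east-1:123456789012:dataset-group/beauty-reviews",
    "beauty-user-personalization",
)
print(solution_version_arn)
```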
You can also change the default hyperparameters or perform hyperparameter optimization for a solution.
Check status of the solution version. This may take an hour or longer as it is running full training on the datasets.
Getting recommendations
To get recommendations, create a campaign using the solution and solution version you just created. Complete the following steps:
Check status of the campaign. This may take several minutes.
After you set up the campaign, you can programmatically call the campaign to get recommendations in the form of item IDs. You can also use the console to get the recommendations and perform spot checks. Additionally, Amazon Personalize offers the ability to batch process recommendations. For more information, see Now available: Batch Recommendations in Amazon Personalize.
One way to test the campaign is with the following commands, which test both an existing and a nonexistent user.
You should see the top five ranked item IDs for this user in descending order.
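From Python, the personalize-runtime client's get_recommendations call returns an itemList of item IDs in ranked order. The following sketch extracts the IDs from a response. The sample response is made up for illustration, and the campaign ARN and user ID in the comment are placeholders.

```python
def top_item_ids(response, limit=5):
    """Return up to `limit` item IDs in the order the campaign ranked them."""
    return [item["itemId"] for item in response["itemList"][:limit]]


# With a real client:
# runtime = boto3.client("personalize-runtime")
# response = runtime.get_recommendations(
#     campaignArn=campaign_arn, userId="111", numResults=5)
sample_response = {
    "itemList": [
        {"itemId": "B00A", "score": 0.31},
        {"itemId": "B00B", "score": 0.22},
        {"itemId": "B00C", "score": 0.09},
    ]
}
print(top_item_ids(sample_response))
```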
Removing created resources
If you would like to remove the resources that you created in this post, run the following commands:
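Resources have to be deleted in roughly the reverse order of creation, because a dataset group cannot be removed while it still contains datasets or solutions. As a sketch, the helper below lists the boto3 delete calls in a workable order. The ARN values are placeholders, and the S3 bucket and IAM role are not covered.

```python
def deletion_plan(arns):
    """Return (method name, keyword arguments) pairs in a safe deletion order."""
    plan = [
        ("delete_campaign", {"campaignArn": arns["campaign"]}),
        ("delete_solution", {"solutionArn": arns["solution"]}),
    ]
    plan += [("delete_dataset", {"datasetArn": arn}) for arn in arns["datasets"]]
    plan += [("delete_schema", {"schemaArn": arn}) for arn in arns["schemas"]]
    plan.append(("delete_dataset_group", {"datasetGroupArn": arns["dataset_group"]}))
    return plan


# Placeholder ARNs for illustration only.
plan = deletion_plan({
    "campaign": "arn:example:campaign",
    "solution": "arn:example:solution",
    "datasets": ["arn:example:dataset/items", "arn:example:dataset/users"],
    "schemas": ["arn:example:schema/items", "arn:example:schema/users"],
    "dataset_group": "arn:example:dataset-group",
})
for method, kwargs in plan:
    # With a real client: getattr(personalize, method)(**kwargs)
    print(method, kwargs)
```

Deletions are asynchronous, so a real script needs to wait for each resource to disappear before it deletes the resource that depends on it.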
Conclusion
You can now use these recommendations to power display experiences, such as personalizing the homepage of your beauty website based on what you know about the user or sending a promotional email with recommendations. Performing real-time recommendations with Amazon Personalize requires you to also send user events as they occur. For more information, see Amazon Personalize is Now Generally Available. Get started with Amazon Personalize today!
About the authors
Vaibhav Sethi is the Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys hiking and reading.
Brian Soper is a Solutions Architect at Amazon Web Services who has helped AWS customers transform and architect for the cloud since 2018. Brian has more than 20 years of experience building physical and virtual infrastructure for both on-premises and cloud environments.
Rob Percival is an Account Manager in the AWS Games organization. He works with operators, game developers, and software providers in the US Real Money Gaming (online sports betting and casino gambling) industry to increase speed to market, gain deeper insight on their players, and accelerate experimentation and innovation using AWS.