In this module, you download and inspect your dataset, then create the dataset group and schema that you use in this tutorial.
Time to Complete Module: 20 Minutes
Amazon Personalize datasets are containers for data. A dataset group is a collection of related datasets (Interactions, Users, and Items). There are three types of datasets in Amazon Personalize:
- Interactions: This dataset stores historical and real-time data from interactions between users and items. This data can include impressions data and contextual metadata about your users’ browsing context, such as their location or device (mobile, tablet, desktop, and so on). At minimum, you must create an Interactions dataset.
- Users: This dataset stores metadata about your users. This might include information such as age, gender, or loyalty membership, which can be important in personalization systems.
- Items: This dataset stores metadata about your items. This might include information such as price, SKU type, or availability.
In this tutorial, you only use Interactions data. For the advanced use of other types of datasets, see Datasets and Schemas.
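Because only Interactions data is used here, the schema can stay small. The following is a minimal sketch of an Interactions schema in the Avro-style JSON format that Amazon Personalize uses, with only the three required fields (USER_ID, ITEM_ID, and TIMESTAMP). The constant and function names are illustrative assumptions, not part of the tutorial.

```python
import json

# Minimal Interactions schema. The field names are the reserved keywords that
# Amazon Personalize requires for an Interactions dataset.
# INTERACTIONS_SCHEMA and interactions_schema_json are hypothetical names
# chosen for this sketch.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def interactions_schema_json() -> str:
    """Serialize the schema to the JSON string passed when creating a schema."""
    return json.dumps(INTERACTIONS_SCHEMA)


if __name__ == "__main__":
    print(interactions_schema_json())
```

Optional fields, such as an event type or contextual metadata, would be added to the `fields` list. They are left out here because this tutorial does not use them.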
You train your Amazon Personalize model on the MovieLens Latest Small dataset, which contains 100,000 ratings and 3,600 tag applications across 9,000 movies by 600 users. The MovieLens dataset is curated by GroupLens Research.
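As a rough illustration of preparing this data, the sketch below reads rows in the layout of the MovieLens ratings.csv file (columns userId, movieId, rating, timestamp), keeps only the rows above a rating threshold, and renames the columns to the names an Interactions dataset expects. The threshold of 3.0, the function names, and the three sample rows are assumptions made for this example. Use the values that fit your own preparation step.

```python
import csv
import io

PERSONALIZE_COLUMNS = ["USER_ID", "ITEM_ID", "TIMESTAMP"]


def prepare_interactions(ratings_csv, min_rating=3.0):
    """Keep rows rated above min_rating and map them to Interactions columns.

    ratings_csv is any open text file (or file-like object) in the
    MovieLens ratings.csv layout. min_rating is an assumed threshold.
    """
    rows = []
    for row in csv.DictReader(ratings_csv):
        if float(row["rating"]) > min_rating:
            rows.append(
                {
                    "USER_ID": row["userId"],
                    "ITEM_ID": row["movieId"],
                    "TIMESTAMP": int(row["timestamp"]),
                }
            )
    return rows


def write_interactions(rows, out):
    """Write the prepared rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=PERSONALIZE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample rows in the ratings.csv layout, for illustration only.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,964982703\n"
    "1,3,2.0,964981247\n"
    "2,6,3.5,964982224\n"
)
rows = prepare_interactions(sample)
output = io.StringIO()
write_interactions(rows, output)
```

With a real download, you would pass `open("ratings.csv", newline="")` in place of the in-memory sample and write the result to a file for upload.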
In this module, you imported and fetched the dataset you use for your movie title recommendation system. Then, you prepared the dataset by splitting it based on movie ratings. Finally, you created the dataset group, schema, and Interactions dataset that you use to train your Amazon Personalize model.
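The resource creation steps summarized above can be sketched with the AWS SDK for Python (boto3) as follows. This is a minimal sketch, not the tutorial's exact code. The function name, the `prefix` naming scheme, and the polling interval are assumptions. The client is passed in as a parameter so the function does not create AWS resources until you call it with a real client.

```python
import time


def create_interactions_resources(
    personalize, schema_json, prefix="personalize-demo",
    poll_seconds=10, max_polls=60,
):
    """Create a dataset group, a schema, and an Interactions dataset.

    personalize is a boto3 "personalize" client. The prefix and the
    resource names derived from it are hypothetical.
    """
    group_arn = personalize.create_dataset_group(
        name=f"{prefix}-dataset-group"
    )["datasetGroupArn"]

    # A dataset can only be created once the dataset group is ACTIVE.
    for _ in range(max_polls):
        status = personalize.describe_dataset_group(
            datasetGroupArn=group_arn
        )["datasetGroup"]["status"]
        if status == "ACTIVE":
            break
        if status == "CREATE FAILED":
            raise RuntimeError("Dataset group creation failed")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("Dataset group did not become ACTIVE in time")

    schema_arn = personalize.create_schema(
        name=f"{prefix}-schema", schema=schema_json
    )["schemaArn"]

    dataset_arn = personalize.create_dataset(
        name=f"{prefix}-interactions",
        datasetType="Interactions",
        datasetGroupArn=group_arn,
        schemaArn=schema_arn,
    )["datasetArn"]

    return {
        "datasetGroupArn": group_arn,
        "schemaArn": schema_arn,
        "datasetArn": dataset_arn,
    }
```

To run it against your account, create the client with `boto3.client("personalize")` and pass your Interactions schema as a JSON string. The calls require AWS credentials with Amazon Personalize permissions.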
In the next module, you create your Amazon S3 bucket that stores the interaction data, and configure the S3 bucket to allow Amazon Personalize access to the data.