In this module, you create an Amazon S3 bucket to stage your interactions dataset. To ensure that Amazon Personalize can access and work with the data, you must also grant permissions using IAM roles and policies.

Time to Complete Module: 20 Minutes


  • Step 1. Create Amazon S3 bucket and upload data

    Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

    Training a model produces model training data and model artifacts. In this lab, you use an Amazon S3 bucket to stage the interactions dataset, and store the model artifacts generated by Amazon Personalize during model training.

    In your Jupyter notebook, copy and paste the following code into a new code cell and choose Run.

    import boto3   # boto3, json, and time are used across the cells in this module; they may already be imported from earlier cells in the lab
    import json
    import time

    session = boto3.session.Session()
    region = session.region_name
    s3 = boto3.client('s3')
    account_id = boto3.client('sts').get_caller_identity().get('Account')
    bucket_name = account_id + "-" + region + "-" + "personalizedemoml"
    print(bucket_name)

    # Amazon S3 does not accept a LocationConstraint of us-east-1, so the call differs by Region
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={'LocationConstraint': region}
        )
    This script creates an Amazon S3 bucket named [account ID]-[region]-personalizedemoml.
     
    Note: If you encounter an error, it may be due to an existing S3 bucket with the same name; S3 bucket names must be globally unique. Modify the bucket name and run the code again.

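    If you need a different name, one common approach (not part of the lab script) is to append a short random suffix. The following minimal sketch keeps "personalize" in the bucket name so the AmazonPersonalizeFullAccess policy attached in Step 2 still applies; the suffix length and character set are arbitrary choices.

    import random
    import string

    # Hypothetical variant: add a short random suffix so the bucket name is unlikely to collide
    suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
    bucket_name = account_id + "-" + region + "-" + "personalizedemoml-" + suffix
    print(bucket_name)
    # Re-run the create_bucket code above with this new bucket_name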

    Next, upload the data. In your Jupyter notebook, copy and paste the following code into a new code cell and choose Run.

    # data_dir and interactions_filename come from the earlier data-preparation steps in this lab
    interactions_file_path = data_dir + "/" + interactions_filename
    boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_filename).upload_file(interactions_file_path)

    # Record the S3 path of the uploaded file for use when you import the dataset
    interactions_s3DataPath = "s3://" + bucket_name + "/" + interactions_filename
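    If you want to confirm the upload before moving on, you can ask Amazon S3 for the object's metadata. This optional check is not part of the lab and assumes the variables defined in the cells above.

    # Optional check: head_object raises an error if the object is not in the bucket
    response = s3.head_object(Bucket=bucket_name, Key=interactions_filename)
    print("Uploaded {} bytes to {}".format(response['ContentLength'], interactions_s3DataPath))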


  • Step 2. Configure S3 bucket policy

    In this step, you configure the Amazon S3 bucket policy so that Amazon Personalize can read the contents of your S3 bucket. Run the following code block to create and attach the appropriate policy.

    policy = {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {
                    "Service": "personalize.amazonaws.com"
                },
                "Action": [
                    "s3:*Object",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::{}".format(bucket_name),
                    "arn:aws:s3:::{}/*".format(bucket_name)
                ]
            }
        ]
    }
    
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
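
    Optionally, you can read the policy back to confirm that it was applied. This is a quick sanity check, not a required lab step; it assumes the s3 client and bucket_name defined earlier.

    # Optional check: print the bucket policy now attached to the bucket
    print(json.dumps(json.loads(s3.get_bucket_policy(Bucket=bucket_name)['Policy']), indent=2))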


    You also need an IAM role that Amazon Personalize can assume, which grants it the permissions it needs to perform certain tasks on your behalf. Run the following code to create the IAM role and attach the required policies to it.

    iam = boto3.client("iam")
    
    role_name = "PersonalizeRolePOC"
    assume_role_policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": "personalize.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
            }
        ]
    }
    
    create_role_response = iam.create_role(
        RoleName = role_name,
        AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
    )
    
    # AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
    # if you would like to use a bucket with a different name, please consider creating and attaching a new policy
    # that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
    policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
    iam.attach_role_policy(
        RoleName = role_name,
        PolicyArn = policy_arn
    )
    
    # Now add S3 support
    iam.attach_role_policy(
        PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
        RoleName=role_name
    )
    time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate
    
    role_arn = create_role_response["Role"]["Arn"]
    print(role_arn)
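
    If you want to double-check the role before using it, you can list the managed policies attached to it. This optional snippet assumes the iam client and role_name defined above.

    # Optional check: confirm both managed policies are attached to the new role
    attached = iam.list_attached_role_policies(RoleName=role_name)
    for p in attached['AttachedPolicies']:
        print(p['PolicyName'], p['PolicyArn'])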


  • Step 3. Import the dataset into Amazon Personalize

    Recall that you created the dataset group and dataset earlier in this lab. Now, you can create the import job that loads the data from Amazon S3 into Amazon Personalize to use for your model.

    create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = "personalize-demo-import1",
        datasetArn = interactions_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket_name, interactions_filename)
        },
        roleArn = role_arn
    )
    
    dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(create_dataset_import_job_response, indent=2))

    The import job starts as soon as it is created. Run the following code to poll its status until the job finishes:

    %%time
    max_time = time.time() + 6*60*60 # 6 hours
    while time.time() < max_time:
        describe_dataset_import_job_response = personalize.describe_dataset_import_job(
            datasetImportJobArn = dataset_import_job_arn
        )
        status = describe_dataset_import_job_response["datasetImportJob"]['status']
        print("DatasetImportJob: {}".format(status))
        
        if status == "ACTIVE" or status == "CREATE FAILED":
            break
            
        time.sleep(60)

    The output reports the status of the job each minute. Wait for the DatasetImportJob status to show ACTIVE in your notebook. This step takes 10 to 15 minutes.

    DatasetImportJob: CREATE PENDING
    DatasetImportJob: CREATE IN_PROGRESS
    ...
    DatasetImportJob: ACTIVE
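
    If the status shows CREATE FAILED instead, the describe call also returns a failure reason that you can print to troubleshoot. The snippet below is a minimal example of retrieving it; the .get() default is just a fallback string.

    # If the import failed, print the reason reported by Amazon Personalize
    job = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )["datasetImportJob"]
    if job["status"] == "CREATE FAILED":
        print(job.get("failureReason", "No failure reason provided"))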

    Great! Your dataset is now imported to Amazon Personalize.



In this module, you created an Amazon S3 bucket to stage your dataset, created the appropriate policies and roles so that Amazon Personalize can access the data, and then created an import job to import the data into Amazon Personalize.

In the next module, you create an Amazon Personalize solution that you can later deploy for recommendations.