AWS Messaging & Targeting Blog

Creating custom Pinpoint dashboards using Amazon QuickSight, part 3

Note: This post was written by Manan Nayar and Aprajita Arora, Software Development Engineers on the AWS Digital User Engagement team.


This is the third and final post in our series about creating custom visualizations of your Amazon Pinpoint metrics using Amazon QuickSight.

In our first post, we used the Metrics APIs to retrieve specific Key Performance Indicators (KPIs), and then created visualizations using QuickSight. In the second post, we used the event stream feature in Amazon Pinpoint to enable more in-depth analyses.

The examples in the first two posts used Amazon S3 to store the metrics that we retrieved from Amazon Pinpoint. This post takes a different approach, using Amazon Redshift to store the data. By storing the data in Redshift, you can create visualizations of large data sets. This approach is useful when you have a large volume of event data, or when you need to store your data for long periods of time.

Step 1: Provision the storage

The first step in setting up this solution is to create the destinations where you’ll store the Amazon Pinpoint event data. Since we’ll be storing the data in Amazon Redshift, we need to create a new Redshift cluster. We’ll also create an S3 bucket, which will house the original event data that’s streamed from Amazon Pinpoint.

To create the Redshift cluster and the S3 bucket

  1. Create a new Redshift cluster. To learn more, see the Amazon Redshift Getting Started Guide.
  2. Create a new table in the Redshift cluster that contains the appropriate columns. Use the following query to create the table:
    create table if not exists pinpoint_events_table(
      rowid varchar(255) not null,
      project_key varchar(100) not null,
      event_type varchar(100) not null,
      event_timestamp timestamp not null,
      campaign_id varchar(100),
      campaign_activity_id varchar(100),
      treatment_id varchar(100),
      PRIMARY KEY (rowid)
    );
  3. Create a new Amazon S3 bucket. For complete procedures, see Create a Bucket in the Amazon S3 Getting Started Guide.
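
If you prefer to script this setup instead of clicking through the console, you can create both resources with the AWS SDK for Python (Boto3). The following is a minimal sketch; the cluster identifier, node type, database name, credentials, and bucket name are placeholder values that you should replace with your own.

    import boto3

    redshift = boto3.client('redshift')
    s3 = boto3.client('s3')

    # Create a small, single-node Redshift cluster. All of the identifiers and
    # credentials shown here are placeholders.
    redshift.create_cluster(
        ClusterIdentifier='pinpoint-events-cluster',
        ClusterType='single-node',
        NodeType='dc2.large',
        DBName='pinpointevents',
        MasterUsername='awsuser',
        MasterUserPassword='ChangeMe1234'
    )

    # Create the S3 bucket that receives the raw event data. Outside us-east-1,
    # also pass a CreateBucketConfiguration with your Region's LocationConstraint.
    s3.create_bucket(Bucket='my-pinpoint-event-bucket')

After the cluster becomes available, connect to it (for example, by using the query editor in the Redshift console) and run the CREATE TABLE statement from step 2.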

Step 2: Set up the event stream

This example uses the event stream feature of Amazon Pinpoint to send event data to S3. Later, we’ll create a Lambda function that sends the event data to your Redshift cluster when new event data is added to the S3 bucket. This method lets us store the original event data in S3, and transfer a subset of that data to Redshift for analysis.

To set up the event stream

  1. Sign in to the Amazon Pinpoint console at https://console.aws.amazon.com/pinpoint/. In the list of projects, choose the project that you want to enable event streaming for.
  2. Under Settings, choose Event stream.
  3. Choose Edit, and then configure the event stream to use Amazon Kinesis Data Firehose. If you don’t already have a Kinesis Data Firehose stream, follow the link to create one in the Kinesis console. Configure the stream to send data to an S3 bucket. For more information about creating streams, see Creating an Amazon Kinesis Data Firehose Delivery Stream.
  4. Under IAM role, choose Automatically create a role. Choose Save.
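
You can also enable the event stream programmatically by calling the Pinpoint PutEventStream API. Here’s a minimal Boto3 sketch; the application ID and the ARNs are placeholders for your own project, delivery stream, and IAM role.

    import boto3

    pinpoint = boto3.client('pinpoint')

    # Point the project's event stream at an existing Kinesis Data Firehose
    # delivery stream. The ID and ARNs below are placeholder values.
    pinpoint.put_event_stream(
        ApplicationId='your-pinpoint-project-id',
        WriteEventStream={
            'DestinationStreamArn': 'arn:aws:firehose:us-east-1:111122223333:deliverystream/pinpoint-event-stream',
            'RoleArn': 'arn:aws:iam::111122223333:role/pinpoint-event-stream-role'
        }
    )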

Step 3: Create the Lambda function

In this section, you create a Lambda function that processes the raw event stream data, and then writes it to a table in your Redshift cluster.

To create the Lambda function

  1. Download the psycopg2 binary from https://github.com/jkehler/awslambda-psycopg2. This Python library lets you interact with PostgreSQL databases, such as Amazon Redshift. It contains certain libraries that aren’t included in Lambda.
    • Note: This GitHub repository is not an official AWS-managed repository.
  2. Within the awslambda-psycopg2-master folder, you’ll find a folder called psycopg2-37. Rename the folder to psycopg2 (you may need to delete the existing folder with that name), and then compress the entire folder into a .zip file named psycopg2.zip.
  3. Create a new Lambda function from scratch, using the Python 3.7 runtime.
  4. Upload the psycopg2.zip file that you created in step 2 to Lambda.
  5. In Lambda, create a new file named lambda_function.py. Paste the following code into the file:
    import datetime
    import json
    import re
    import uuid
    import boto3
    import psycopg2
    
    cluster_redshift = "<clustername>"
    dbname_redshift = "<dbname>"
    user_redshift = "<username>"
    password_redshift = "<password>"
    endpoint_redshift = "<endpoint>"
    port_redshift = "5439"
    table_redshift = "pinpoint_events_table"
    
    # Get the file that contains the event data from the appropriate S3 bucket.
    def get_file_from_s3(bucket, key):
        s3 = boto3.client('s3')
        obj = s3.get_object(Bucket=bucket, Key=key)
        text = obj["Body"].read().decode()
    
        return text
    
    # If the object that we retrieve contains newline-delimited JSON, split it into
    # multiple objects.
    def clean_and_split(json_raw):
        json_delimited = re.sub(r'}\s{', '}---X-DELIMITER---{', json_raw)
        json_clean = re.sub(r'\s+', '', json_delimited)
        data = json_clean.split("---X-DELIMITER---")
    
        return data
    
    # Set all of the variables that we'll use to create the new row in Redshift.
    def set_variables(in_json):
    
        for line in in_json:
            content = json.loads(line)
            app_id = content['application']['app_id']
            event_type = content['event_type']
            event_timestamp = datetime.datetime.fromtimestamp(content['event_timestamp'] / 1e3).strftime('%Y-%m-%d %H:%M:%S')
    
            # Campaign-related attributes might be missing for some event types,
            # so default them to empty strings.
            campaign_id = content['attributes'].get('campaign_id') or ""
            campaign_activity_id = content['attributes'].get('campaign_activity_id') or ""
            treatment_id = content['attributes'].get('treatment_id') or ""
    
            write_to_redshift(app_id, event_type, event_timestamp, campaign_id, campaign_activity_id, treatment_id)
                
    # Write the event stream data to the Redshift table.
    def write_to_redshift(app_id, event_type, event_timestamp, campaign_id, campaign_activity_id, treatment_id):
        row_id = str(uuid.uuid4())
    
        query = ("INSERT INTO " + table_redshift + "(rowid, project_key, event_type, "
                + "event_timestamp, campaign_id, campaign_activity_id, treatment_id) "
                + "VALUES ('" + row_id + "', '"
                + app_id + "', '"
                + event_type + "', '"
                + event_timestamp + "', '"
                + campaign_id + "', '"
                + campaign_activity_id + "', '"
                + treatment_id + "');")
    
        conn = None

        try:
            conn = psycopg2.connect(user = user_redshift,
                                    password = password_redshift,
                                    host = endpoint_redshift,
                                    port = port_redshift,
                                    database = dbname_redshift)

            cur = conn.cursor()
            cur.execute(query)
            conn.commit()
            print("Updated table.")

        except (Exception, psycopg2.DatabaseError) as error:
            print("Database error: ", error)
        finally:
            # Only clean up if the connection was successfully opened.
            if conn is not None:
                cur.close()
                conn.close()
                print("Connection closed.")
    
    # Handle the event notification that we receive when a new item is sent to the 
    # S3 bucket.
    def lambda_handler(event,context):
        print("Received event: \n" + str(event))
    
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = event['Records'][0]['s3']['object']['key']
        data = get_file_from_s3(bucket, key)
    
        in_json = clean_and_split(data)
    
        set_variables(in_json)

    In the preceding code, make the following changes:

    • Replace <clustername> with the name of the cluster.
    • Replace <dbname> with the name of the database.
    • Replace <username> with the user name that you specified when you created the Redshift cluster.
    • Replace <password> with the password that you specified when you created the Redshift cluster.
    • Replace <endpoint> with the endpoint address of the Redshift cluster.
  6. In IAM, update the execution role that’s associated with the Lambda function to include the s3:GetObject permission for the S3 bucket that contains the event data. For more information, see Editing IAM Policies in the AWS IAM User Guide.
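
If you’d rather grant that permission from a script, the following sketch adds an inline policy to the function’s execution role. The role name, policy name, and bucket name are placeholders; substitute the names that you used in your own account.

    import json
    import boto3

    iam = boto3.client('iam')

    # Allow the Lambda execution role to read objects from the event data bucket.
    # The role name and bucket name below are placeholder values.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-pinpoint-event-bucket/*"
        }]
    }

    iam.put_role_policy(
        RoleName='pinpoint-lambda-execution-role',
        PolicyName='AllowGetPinpointEventObjects',
        PolicyDocument=json.dumps(policy)
    )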

Step 4: Set up notifications on the S3 bucket

Now that we’ve created the Lambda function, we’ll set up a notification on the S3 bucket. In this case, the notification will refer to the Lambda function that we created in the previous section. Every time a new file is added to the bucket, the notification will cause the Lambda function to run.

To create the event notification

  1. In S3, create a new bucket notification. The notification should be triggered when PUT events occur, and should trigger the Lambda function that you created in the previous section. For more information about creating notifications, see Configuring Amazon S3 Event Notifications in the Amazon S3 Developer Guide.
  2. Test the event notification by sending a test campaign. If you send an email campaign, your Redshift database should contain events such as _campaign.send, _email.send, _email.delivered, and others. You can check the contents of the Redshift table by running the following query in the Query Editor in the Redshift console:
    select * from pinpoint_events_table;
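
You can configure the same notification with Boto3 instead of the console. When you take this route, you also have to give S3 permission to invoke the function yourself (the console normally adds that permission for you). The function name, bucket name, Region, and account ID below are placeholders.

    import boto3

    s3 = boto3.client('s3')
    lambda_client = boto3.client('lambda')

    # Allow S3 to invoke the Lambda function.
    lambda_client.add_permission(
        FunctionName='pinpoint-events-to-redshift',
        StatementId='AllowS3Invoke',
        Action='lambda:InvokeFunction',
        Principal='s3.amazonaws.com',
        SourceArn='arn:aws:s3:::my-pinpoint-event-bucket'
    )

    # Run the function whenever a new object is PUT into the bucket.
    s3.put_bucket_notification_configuration(
        Bucket='my-pinpoint-event-bucket',
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:111122223333:function:pinpoint-events-to-redshift',
                'Events': ['s3:ObjectCreated:Put']
            }]
        }
    )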

Step 5: Add the data set in Amazon QuickSight

If your Lambda function is sending event data to Redshift as expected, you can use your Redshift database to create a new data set in Amazon QuickSight. QuickSight includes an automatic database discovery feature that helps you add your Redshift database as a data set with only a few clicks. For more information, see Creating a Data Set from a Database in the Amazon QuickSight User Guide.
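
If you manage QuickSight resources programmatically, you can also register the cluster as a QuickSight data source with Boto3 and then build your data set on top of it. The sketch below is one possible approach; the account ID, endpoint, database name, and credentials are placeholders for your own values.

    import boto3

    quicksight = boto3.client('quicksight')

    # Register the Redshift cluster as a QuickSight data source. Every value
    # below is a placeholder.
    quicksight.create_data_source(
        AwsAccountId='111122223333',
        DataSourceId='pinpoint-events-redshift',
        Name='Pinpoint events',
        Type='REDSHIFT',
        DataSourceParameters={
            'RedshiftParameters': {
                'Host': 'pinpoint-events-cluster.abc123.us-east-1.redshift.amazonaws.com',
                'Port': 5439,
                'Database': 'pinpointevents'
            }
        },
        Credentials={
            'CredentialPair': {
                'Username': 'awsuser',
                'Password': 'ChangeMe1234'
            }
        }
    )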

Step 6: Create your visualizations

Now that QuickSight is retrieving information from your Redshift database, you can use that data to create visualizations. To learn more about creating visualizations in QuickSight, see Creating an Analysis in the Amazon QuickSight User Guide.

This brings us to the end of our series. While these posts focused on using Amazon QuickSight to visualize your analytics data, you can also use these same techniques to create visualizations using third-party applications. We hope you enjoyed this series, and we can’t wait to see what you build using these examples!