Streaming Events from Amazon Pinpoint to Redshift

Note: This post was originally written by Ryan Idrigo-Lam, one of the founding members of the Amazon Pinpoint team.

You can use Amazon Pinpoint to segment, target, and engage with your customers directly from the console. The Pinpoint console also includes a variety of dashboards that you can use to keep track of how your customers use your applications, and measure how likely your customers are to engage with the messages you send them.

Some Pinpoint customers, however, have use cases that require more than these dashboards offer. For example, some customers want to join their Pinpoint data to external data sets, or to collect historical data beyond the six-month window that Pinpoint retains. To help customers meet these needs, and many more, Amazon Pinpoint includes a feature called Event Streams.

This article provides information about using Event Streams to export your data from Amazon Pinpoint and into a high-performance Amazon Redshift database. Once your data is in Redshift, you can run queries against it, join it with other data sets, use it as a data source for analytics and data visualization tools, and much more.

Step 1: Create a Redshift Cluster

The first step in this process involves creating a new Redshift cluster to store your data. You can complete this step in a few clicks by using the Amazon Redshift console. For more information, see Managing Clusters Using the Console in the Amazon Redshift Cluster Management Guide.

When you create the new cluster, make a note of the values you specify for the Cluster Identifier, Database Name, Master User Name, and Master User Password. You’ll use all of these values when you set up Amazon Kinesis Data Firehose in the next section.

Step 2: Create a Firehose Delivery Stream with a Redshift Destination

After you create your Redshift cluster, you can create the Amazon Kinesis Data Firehose delivery stream that will deliver your Pinpoint data to the Redshift cluster.

To create the Kinesis Data Firehose delivery stream

Open the Amazon Kinesis Data Firehose console at https://console.aws.amazon.com/firehose/home.
Choose Create delivery stream.
For Delivery stream name, type a name.
Under Choose source, for Source, choose Direct PUT or other sources. Choose Next.
On the Process records page, do the following:
1. Under Transform source records with AWS Lambda, choose Enabled if you want to use a Lambda function to transform the data before Firehose loads it into Redshift. Otherwise, choose Disabled.
2. Under Convert record format, choose Disabled, and then choose Next.
On the Choose destination page, do the following:
1. For Destination, choose Amazon Redshift.
2. Under Amazon Redshift destination, specify the Cluster name, User name, Password, and Database for the Redshift database you created earlier. Also specify a name for the Table.
3. Under Intermediate S3 destination, choose an S3 bucket to store data in. Alternatively, choose Create new to create a new bucket. Choose Next.
On the Configure settings page, do the following:
1. Under IAM role, choose an IAM role that Firehose can use to access your S3 bucket and KMS key. Alternatively, you can have the Firehose console create a new role. Choose Next.
2. On the Review page, confirm the settings you specified on the previous pages. If the settings are correct, choose Create delivery stream.

Step 3: Create a JSONPaths file

The next step in this process is to create a JSONPaths file and upload it to an Amazon S3 bucket. The JSONPaths file tells Amazon Redshift which fields to pull out of the nested JSON that Amazon Pinpoint provides, and in what order to load them into the columns of your table.
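
To illustrate what a single JSONPaths entry does, here is a minimal Python sketch that follows bracket-notation paths through a nested event. The sample event and the resolve helper are hypothetical and exist only for this illustration; Redshift performs the equivalent lookup itself during the load. The helper returns None when a key is missing, which loosely mirrors a missing field ending up as NULL in the table (an assumption worth checking against the Redshift COPY documentation).

```python
import re

# Hypothetical, heavily trimmed event; real Pinpoint events carry more fields.
sample_event = {
    "event_type": "_session.start",
    "event_timestamp": 1500000000000,
    "application": {"app_id": "example-app-id", "sdk": {"name": "example-sdk"}},
}


def resolve(path, document):
    """Follow a path such as $['application']['app_id'] through nested dicts.

    Returns None when any key along the path is missing.
    """
    keys = re.findall(r"\['([^']+)'\]", path)
    current = document
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


paths = [
    "$['event_type']",
    "$['application']['app_id']",
    "$['application']['sdk']['name']",
    "$['session']['session_id']",  # absent from the sample event
]

# One value per path, in path order: position N feeds column N of the table.
row = [resolve(p, sample_event) for p in paths]
print(row)
```

The point to take away is that the order of the paths matters: each position in the list corresponds to one column, which is why the column list you give Firehose later has to follow the same order.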

To create a JSONPaths file and upload it to Amazon S3

In a text editor, create a new file.

Paste the following code into the text file:

{
  "jsonpaths": [
    "$['event_type']",
    "$['event_timestamp']",
    "$['arrival_timestamp']",
    "$['event_version']",
    "$['application']['app_id']",
    "$['application']['package_name']",
    "$['application']['version_name']",
    "$['application']['version_code']",
    "$['application']['title']",
    "$['application']['cognito_identity_pool_id']",
    "$['application']['sdk']['name']",
    "$['application']['sdk']['version']",
    "$['client']['client_id']",
    "$['client']['cognito_id']",
    "$['device']['model']",
    "$['device']['make']",
    "$['device']['platform']['name']",
    "$['device']['platform']['version']",
    "$['device']['locale']['code']",
    "$['device']['locale']['language']",
    "$['device']['locale']['country']",
    "$['session']['session_id']",
    "$['session']['start_timestamp']",
    "$['session']['stop_timestamp']",
    "$['monetization']['transaction']['transaction_id']",
    "$['monetization']['transaction']['store']",
    "$['monetization']['transaction']['item_id']",
    "$['monetization']['transaction']['quantity']",
    "$['monetization']['transaction']['price']['reported_price']",
    "$['monetization']['transaction']['price']['amount']",
    "$['monetization']['transaction']['price']['currency']['code']",
    "$['monetization']['transaction']['price']['currency']['symbol']",
    "$['attributes']['campaign_id']",
    "$['attributes']['campaign_activity_id']",
    "$['attributes']['my_custom_attribute']",
    "$['metrics']['my_custom_metric']"
  ]
}

Modify the preceding code example to include the fields that you want to import into Redshift.
Note: You can specify custom attributes or metrics by replacing my_custom_attribute or my_custom_metric in the example above with your custom attributes or metrics, respectively.
When you finish modifying the code example, remove all whitespace, including spaces and line breaks, from the file. Save the file as json-paths.json.
Open the Amazon S3 console at https://s3.console.aws.amazon.com/s3/home.
Choose the S3 bucket you created when you set up the Firehose stream. Upload json-paths.json into the bucket.
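
If you would rather not strip the whitespace by hand, a short script can do it. The following is a minimal sketch using only the Python standard library; the minify_jsonpaths function name and the two-entry sample are assumptions made for illustration. Parsing the file first has the side benefit of catching JSON syntax errors before you upload it.

```python
import json


def minify_jsonpaths(text):
    """Parse a JSONPaths document and re-serialize it with no whitespace."""
    document = json.loads(text)
    if not isinstance(document, dict) or not isinstance(document.get("jsonpaths"), list):
        raise ValueError("expected a top-level 'jsonpaths' array")
    # separators=(",", ":") drops the spaces json.dumps would otherwise add.
    return json.dumps(document, separators=(",", ":"))


# Shortened sample; in practice, read the full file you edited above.
pretty = """
{
  "jsonpaths": [
    "$['event_type']",
    "$['application']['app_id']"
  ]
}
"""

compact = minify_jsonpaths(pretty)
print(compact)
```

To produce the file itself, write the returned string to json-paths.json, for example with pathlib.Path("json-paths.json").write_text(compact).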

Step 4: Configure the table in Redshift

At this point, it’s time to finish setting up your Redshift database. In this section, you’ll create a table in the Redshift cluster you created earlier. The columns in this table mirror the values you specified in the JSONPaths file in the previous section.

Connect to your Redshift cluster by using a database tool such as SQL Workbench/J. For more information about connecting to a cluster, see Connect to the Cluster in the Amazon Redshift Getting Started Guide.

Create a new table that contains a column for each field in the JSONPaths file you created in the preceding section. You can use the following example as a template.

CREATE SCHEMA AWSMA;
CREATE TABLE AWSMA.event(
  event_type VARCHAR(256) NOT NULL ENCODE LZO,
  event_timestamp TIMESTAMP NOT NULL ENCODE LZO,
  arrival_timestamp TIMESTAMP NULL ENCODE LZO,
  event_version CHAR(12) NULL ENCODE LZO,
  application_app_id VARCHAR(64) NOT NULL ENCODE LZO,
  application_package_name VARCHAR(256) NULL ENCODE LZO,
  application_version_name VARCHAR(256) NULL ENCODE LZO,
  application_version_code VARCHAR(256) NULL ENCODE LZO,
  application_title VARCHAR(256) NULL ENCODE LZO,
  application_cognito_identity_pool_id VARCHAR(64) NULL ENCODE LZO,
  application_sdk_name VARCHAR(256) NULL ENCODE LZO,
  application_sdk_version VARCHAR(256) NULL ENCODE LZO,
  client_id VARCHAR(64) NULL DISTKEY ENCODE LZO,
  client_cognito_id VARCHAR(64) NULL ENCODE LZO,
  device_model VARCHAR(256) NULL ENCODE LZO,
  device_make VARCHAR(256) NULL ENCODE LZO,
  device_platform_name VARCHAR(256) NULL ENCODE LZO,
  device_platform_version VARCHAR(256) NULL ENCODE LZO,
  device_locale_code VARCHAR(256) NULL ENCODE LZO,
  device_locale_language VARCHAR(64) NULL ENCODE LZO,
  device_locale_country VARCHAR(64) NULL ENCODE LZO,
  session_id VARCHAR(64) NULL ENCODE LZO,
  session_start_timestamp TIMESTAMP NULL ENCODE LZO,
  session_stop_timestamp TIMESTAMP NULL ENCODE LZO,
  monetization_transaction_id VARCHAR(64) NULL ENCODE LZO,
  monetization_transaction_store VARCHAR(64) NULL ENCODE LZO,
  monetization_transaction_item_id VARCHAR(64) NULL ENCODE LZO,
  monetization_transaction_quantity FLOAT8 NULL,
  monetization_transaction_price_reported VARCHAR(64) NULL ENCODE LZO,
  monetization_transaction_price_amount FLOAT8 NULL,
  monetization_transaction_price_currency_code VARCHAR(16) NULL ENCODE LZO,
  monetization_transaction_price_currency_symbol VARCHAR(32) NULL ENCODE LZO,
  -- Custom Attributes
  a_campaign_id VARCHAR(4000),
  a_campaign_activity_id VARCHAR(4000),
  a_my_custom_attribute VARCHAR(4000),
  -- Custom Metrics
  m_my_custom_metric FLOAT8
)
SORTKEY ( application_app_id, event_timestamp, event_type);

Step 5: Configure the Firehose Stream

You’re getting close! At this point, you’re ready to point the Kinesis Data Firehose stream to your JSONPaths file so that Redshift parses the incoming data properly. You also need to list the columns of the table that your data will be copied into.

To configure the Firehose Stream

Open the Amazon Kinesis Data Firehose console at https://console.aws.amazon.com/firehose/home.
In the list of delivery streams, choose the delivery stream you created earlier.
On the Details tab, choose Edit.

Under Amazon Redshift destination, for COPY options, paste the following:

JSON 's3://s3-bucket/json-paths.json'
TRUNCATECOLUMNS
TIMEFORMAT 'epochmillisecs'

Replace s3-bucket in the preceding code example with the path to the S3 bucket that contains json-paths.json.
For Columns, list all of the columns that are present in the JSONPaths file you created earlier. Specify the column names in the same order as they’re listed in the json-paths.json file, using commas to separate the column names. When you finish, choose Save.
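
Because the column list has to line up one-to-one with the JSONPaths entries, it can help to check the counts before pasting the value into the console. The sketch below is one way to do that in Python; build_columns_value is a hypothetical helper, and the three-entry sample stands in for your full file and column list. It only checks that the two lists have the same length; it cannot tell whether the order is right, so that still needs a visual check.

```python
import json


def build_columns_value(jsonpaths_text, columns):
    """Return the comma-separated Columns value after a length check."""
    paths = json.loads(jsonpaths_text)["jsonpaths"]
    if len(paths) != len(columns):
        raise ValueError(
            f"{len(paths)} JSONPaths entries but {len(columns)} columns"
        )
    return ",".join(columns)


# Shortened sample; in practice, load the contents of json-paths.json.
jsonpaths_text = json.dumps(
    {
        "jsonpaths": [
            "$['event_type']",
            "$['event_timestamp']",
            "$['application']['app_id']",
        ]
    }
)

# Column names in the same order as the JSONPaths entries above.
columns = ["event_type", "event_timestamp", "application_app_id"]

columns_value = build_columns_value(jsonpaths_text, columns)
print(columns_value)
```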

Step 6: Enable Event Streams in Amazon Pinpoint

The only thing left to do now is to tell Amazon Pinpoint to start sending data to your Amazon Kinesis Data Firehose delivery stream.

To enable Event Streaming in Amazon Pinpoint

Open the Amazon Pinpoint console at https://console.aws.amazon.com/pinpoint/home.
Choose the application or project that you want to enable event streams for.
In the navigation pane, choose Settings.
On the Event stream tab, choose Enable streaming of events to Amazon Kinesis.
Under Stream to Amazon Kinesis, select Send events to an Amazon Kinesis Firehose delivery stream.
For Amazon Kinesis Firehose delivery stream, choose the stream you created earlier.
For IAM role, choose an existing role that allows the firehose:PutRecordBatch action, or choose Automatically create a role to have Amazon Pinpoint create a role with the appropriate permissions. If you choose to have Amazon Pinpoint create a role for you, type a name for the role. Choose Save.
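
The console steps above can also be scripted. The following is a rough sketch, assuming you use the AWS SDK for Python (boto3) and its Pinpoint put_event_stream operation; the project ID, account number, stream name, and role name are placeholders, and the two helper functions are hypothetical. The role must already exist and allow Pinpoint to write to the delivery stream, because this call does not create one for you.

```python
def build_event_stream_request(application_id, firehose_stream_arn, role_arn):
    """Build keyword arguments for the Pinpoint PutEventStream operation."""
    return {
        "ApplicationId": application_id,
        "WriteEventStream": {
            "DestinationStreamArn": firehose_stream_arn,
            "RoleArn": role_arn,
        },
    }


def enable_event_stream(request):
    """Send the request. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    return boto3.client("pinpoint").put_event_stream(**request)


# Placeholder values only; substitute your own project ID and ARNs.
request = build_event_stream_request(
    "example-project-id",
    "arn:aws:firehose:us-east-1:111122223333:deliverystream/example-stream",
    "arn:aws:iam::111122223333:role/example-pinpoint-firehose-role",
)
print(request)
```

Calling enable_event_stream(request) with real values would apply the setting to the project.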

That’s it! Once you complete this final step, Amazon Pinpoint starts exporting the data you specified into your Redshift cluster.

I hope this walkthrough was helpful. If you have any questions, please let us know in the comments or in the Amazon Pinpoint forum.

AWS Messaging Blog
