AWS Mobile Blog

Score big with Amazon Mobile Analytics and Amazon Redshift

by Nicholas Beets | on | Permalink | Comments |  Share

In today’s blog post I will show you some tools that make it easy for you to analyze your app user engagement, retention, user behavior, top item purchases, and other useful metrics. I will provide step-by-step instructions as well as sample SQL queries to get you started. 

The tools that I’ll describe are Amazon Mobile Analytics Auto Export and Amazon Redshift. For those who are not familiar with these services, Amazon Mobile Analytics is a service that lets you simply and cost effectively collect and analyze your application usage data. In addition to providing usage summary charts that are available for quick reference, Amazon Mobile Analytics enables you to set up automatic export of your data to Amazon S3 for use with other data analytics tools such as Amazon Redshift. With your mobile analytics events (including custom attributes and metrics) at your fingertips in a high performing data warehouse such as Amazon Redshift, you can reveal detailed insights into the usage of your apps by running queries directly or by using third-party visualization tools such as Tableau or other Amazon Redshift Partners.

Your Mobile App --> Amazon Mobile Analytics -> Amazon S3 Bucket --> Amazon Redshift Cluster <-- Amazon Redshft Client

Once you have your data in Redshift, you can dive deep to learn more about your business. For example, you can analyze your data to:

  • Get engagement, retention, and user behavior profiles across custom segments that you define.
  • List the top item purchases by app store for a specific user segment.
  • See how users navigate your app (custom funnels).

After completing the following steps, you will be able to run queries on your events in Amazon Redshift and keep your Redshift tables up to date with new events. Here’s a summary of the key steps:

  1. Integrate Amazon Mobile Analytics into your app.
  2. Turn on the Auto Export to Amazon S3 feature of Amazon Mobile Analytics.
  3. Create and set up a new Amazon Redshift cluster (or use an existing one).
  4. Create an AWS Identity and Access Management (IAM) user to access events from your Amazon S3 bucket.
  5. Configure a “jsonpaths” file to map event details to columns in an Amazon Redshift table.
  6. Instruct Amazon Redshift to load the events from Amazon S3 using the “jsonpaths” file and IAM credentials.
  7. Query your data!

Step 1: Integrate Amazon Mobile Analytics into your app

Our documentation shows you how to get started using Amazon Mobile Analytics with your app. Simply add the AWS Mobile SDK in your iOS and Android/Fire OS apps, or use the Amazon Mobile Analytics REST API, and access your Amazon Mobile Analytics dashboard via the AWS console to learn about how your app is being used

Step 2: Export data to Amazon S3 using Amazon Mobile Analytics Auto Export

Once you have integrated your app with Amazon Mobile Analytics and it’s sending events, you will want to turn on Auto Export to Amazon S3. Follow these instructions to accomplish this step. Remember the name of the Amazon S3 bucket that you are using; you will need it in the following steps.

Step 3: Creating an Amazon Redshift cluster

If you don’t have an Amazon Redshift cluster already, then you can create one (Be sure to view the pricing guides, and learn about free tier options). You can start with a smaller Amazon Redshift cluster and then scale up if necessary by changing the number of nodes and type of the cluster later. You can also recreate your Amazon Redshift cluster and reload all of your events at any time in the future. Once you have an Amazon Redshift cluster up and running, install SQL Workbench or another SQL client that can used to run queries on Amazon Redshift.

Step 4: Create an AWS IAM user to access your Amazon S3 bucket

  1. Create an AWS IAM user and generate access keys for that user by visiting the “Identity and Access Management” section of the AWS console. See our documentation for details on how to do this.
  2. Download the access key and secret key for this user.
  3. Use the following template to attach a user policy to the user created in part 1. This policy allows for read access to your Amazon S3 bucket (to be used by Amazon Redshift to read events). Change <YOUR-BUCKET-NAME> to the name of the Amazon S3 bucket that you are exporting your events to.
{
  "Statement": [
    {
      "Resource": [
        "arn:aws:s3:::<YOUR-BUCKET-NAME>*"
      ],
      "Action": [
        "s3:ListBucket", "s3:GetObject*"
      ],
      "Effect": "Allow"
    }
  ],
  "Version": "2012-10-17"
}

 

Step 5: Create tables and mapping files (using SQL workbench)

  1. Create a folder in your Amazon S3 bucket called “jsonpaths”.
  2. Place the following text into a file called eventmapping.json and upload it to:
    <YOUR BUCKET NAME>/jsonpaths/eventmapping.json
  3. Place the names of any custom metrics and attributes at the end (update $[‘attributes’][‘{custom attribute name}’], and $[‘metrics’][‘{custom metric name}’] at the end). 
{
  "jsonpaths": [
    "$['event_type']", 
    "$['event_timestamp']", 
    "$['arrival_timestamp']", 
    "$['event_version']", 
    "$['application']['app_id']", 
    "$['application']['package_name']", 
    "$['application']['version_name']", 
    "$['application']['version_code']", 
    "$['application']['title']", 
    "$['application']['cognito_identity_pool_id']", 
    "$['application']['sdk']['name']", 
    "$['application']['sdk']['version']", 
    "$['client']['client_id']", 
    "$['client']['cognito_id']", 
    "$['device']['model']", 
    "$['device']['make']", 
    "$['device']['platform']['name']", 
    "$['device']['platform']['version']", 
    "$['device']['locale']['code']", 
    "$['device']['locale']['language']", 
    "$['device']['locale']['country']", 
    "$['session']['session_id']", 
    "$['session']['start_timestamp']", 
    "$['session']['stop_timestamp']", 
    "$['monetization']['transaction']['transaction_id']", 
    "$['monetization']['transaction']['store']", 
    "$['monetization']['transaction']['item_id']", 
    "$['monetization']['transaction']['quantity']", 
    "$['monetization']['transaction']['price']['reported_price']", 
    "$['monetization']['transaction']['price']['amount']", 
    "$['monetization']['transaction']['price']['currency']['code']", 
    "$['monetization']['transaction']['price']['currency']['symbol']", 
    "$['attributes']['class']", 
    "$['attributes']['level']", 
    "$['attributes']['name']", 
    "$['attributes']['paying customer']", 
    "$['metrics']['score']", 
    "$['metrics']['time played']", 
    "$['metrics']['total spent']"
  ]
}

Create tables to stage and store your events, and a view to access them. Modify and run the following script to create tables and a view in Amazon Redshift.

  1. Modify the columns that start with a_ and m_: Specify custom attributes and custom metrics that you want to include in your Amazon Redshift tables. These columns will be mapped to the attributes and metrics that were specified in the jsonpaths file above.
  2. We recommend prefixing custom attributes with "a_" and custom metrics with "m_" so that they are easily recognizable and don’t introduce naming conflicts with other columns.
CREATE schema AWSMA;
 
--Create the table that is used to stage events temporarily that are loaded from Amazon S3
CREATE TABLE AWSMA.event_staging(
   event_type VARCHAR(256) NOT NULL ENCODE LZO,
   event_timestamp TIMESTAMP NOT NULL ENCODE LZO,
   arrival_timestamp TIMESTAMP NOT NULL ENCODE LZO,
   event_version CHAR(12) NULL ENCODE LZO,
   application_app_id VARCHAR(64) NOT NULL ENCODE LZO,
   application_package_name VARCHAR(256) NULL ENCODE LZO,
   application_version_name VARCHAR(256) NULL ENCODE LZO,
   application_version_code VARCHAR(256) NULL ENCODE LZO,
   application_title VARCHAR(256) NULL ENCODE LZO,
   application_cognito_identity_pool_id VARCHAR(64) NULL ENCODE LZO,
   application_sdk_name VARCHAR(256) NULL ENCODE LZO,
   application_sdk_version VARCHAR(256) NULL ENCODE LZO,
   client_id VARCHAR(64) NOT NULL DISTKEY ENCODE LZO,
   client_cognito_id VARCHAR(64) NULL ENCODE LZO,
   device_model VARCHAR(256) NULL ENCODE LZO,
   device_make VARCHAR(256) NULL ENCODE LZO,
   device_platform_name VARCHAR(256) NULL ENCODE LZO,
   device_platform_version VARCHAR(256) NULL ENCODE LZO,
   device_locale_code VARCHAR(256) NULL ENCODE LZO,
   device_locale_language VARCHAR(64) NULL ENCODE LZO,
   device_locale_country VARCHAR(64) NULL ENCODE LZO,
   session_id VARCHAR(64) NULL ENCODE LZO,
   session_start_timestamp TIMESTAMP NULL ENCODE LZO,
   session_stop_timestamp TIMESTAMP NULL ENCODE LZO,
   monetization_transaction_id VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_store VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_item_id VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_quantity FLOAT8 NULL,
   monetization_transaction_price_reported VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_price_amount FLOAT8 NULL,
   monetization_transaction_price_currency_code VARCHAR(16) NULL ENCODE LZO,
   monetization_transaction_price_currency_symbol VARCHAR(32) NULL ENCODE LZO,
   a_class             VARCHAR(4000),
   a_level             VARCHAR(4000),
   a_name              VARCHAR(4000),
   "a_paying customer" VARCHAR(4000),
   m_score             float8,
   "m_time played"     float8,
   "m_total spent"     float8
)
SORTKEY ( application_app_id, event_timestamp, event_type);
 
--Create the table used to store all of the events
CREATE TABLE AWSMA.event(
   event_type VARCHAR(256) NOT NULL ENCODE LZO,
   event_timestamp TIMESTAMP NOT NULL ENCODE LZO,
   arrival_timestamp TIMESTAMP NOT NULL ENCODE LZO,
   event_version CHAR(12) NULL ENCODE LZO,
   application_app_id VARCHAR(64) NOT NULL ENCODE LZO,
   application_package_name VARCHAR(256) NULL ENCODE LZO,
   application_version_name VARCHAR(256) NULL ENCODE LZO,
   application_version_code VARCHAR(256) NULL ENCODE LZO,
   application_title VARCHAR(256) NULL ENCODE LZO,
   application_cognito_identity_pool_id VARCHAR(64) NULL ENCODE LZO,
   application_sdk_name VARCHAR(256) NULL ENCODE LZO,
   application_sdk_version VARCHAR(256) NULL ENCODE LZO,
   client_id VARCHAR(64) NOT NULL DISTKEY ENCODE LZO,
   client_cognito_id VARCHAR(64) NULL ENCODE LZO,
   device_model VARCHAR(256) NULL ENCODE LZO,
   device_make VARCHAR(256) NULL ENCODE LZO,
   device_platform_name VARCHAR(256) NULL ENCODE LZO,
   device_platform_version VARCHAR(256) NULL ENCODE LZO,
   device_locale_code VARCHAR(256) NULL ENCODE LZO,
   device_locale_language VARCHAR(64) NULL ENCODE LZO,
   device_locale_country VARCHAR(64) NULL ENCODE LZO,
   session_id VARCHAR(64) NULL ENCODE LZO,
   session_start_timestamp TIMESTAMP NULL ENCODE LZO,
   session_stop_timestamp TIMESTAMP NULL ENCODE LZO,
   monetization_transaction_id VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_store VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_item_id VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_quantity FLOAT8 NULL,
   monetization_transaction_price_reported VARCHAR(64) NULL ENCODE LZO,
   monetization_transaction_price_amount FLOAT8 NULL,
   monetization_transaction_price_currency_code VARCHAR(16) NULL ENCODE LZO,
   monetization_transaction_price_currency_symbol VARCHAR(32) NULL ENCODE LZO,
   a_class             VARCHAR(4000),
   a_level             VARCHAR(4000),
   a_name              VARCHAR(4000),
   "a_paying customer" VARCHAR(4000),
   m_score             float8,
   "m_time played"     float8,
   "m_total spent"     float8
)
SORTKEY ( application_app_id, event_timestamp, event_type);
 
--Create a view
CREATE OR REPLACE VIEW AWSMA.v_event AS SELECT * FROM AWSMA.event;

 

Step 6: Load events from Amazon S3 into Amazon Redshift (using SQL workbench)

Loading events from Amazon S3 into the AWSMA.event table is a two-step process:

  1. Copy events for a period of time from Amazon S3 to the AWSMA.event_staging table using the copy command.
  2. Insert events from the AWSMA.event_staging table to the AWSMA.event table so that duplicates are not inserted into the AWSMA.event table.

Copy events from Amazon S3

  1. To load data in to Amazon Redshift, use the COPY script provided below. Before executing the script, be sure to make the following updates: Replace <YOUR BUCKET NAME> with the name of your Amazon S3 bucket where events are being exported to.
  2. Replace <APP ID> with the App ID of the app you want to import.
  3. Replace <ACCESS KEY> with the access key of the IAM user that has access to the Amazon S3 bucket.
  4. Replace <SECRET KEY> with the secret key of the IAM user that has access to the Amazon S3 bucket.
  5. Refine the period of time you want to load data for. <YEAR>/<MONTH>/<DAY> specifies the date part of the prefix of the files containing the events you want to import. Depending on when and how many events you intend to load, you should select a prefix that best works for you:

    • 2015/01/20 will load all events that were sent on January 20, 2015.
    • 2015/0 will load all events that were sent between January 2015 and September 2015 (inclusive).
    • 2015 will load all events that were sent during the year.
  6. Update the column list to contain the names of the custom metrics and attributes that you want to load.  The list of columns specified here must be in the same order as they appear in the jsonpaths file.
--start by truncating the staging table
TRUNCATE AWSMA.event_staging;
 
--copy the events from Amazon S3
COPY 
   AWSMA.event_staging(
      "event_type",
      "event_timestamp",
      "arrival_timestamp",
      "event_version",
      "application_app_id",
      "application_package_name",
      "application_version_name",
      "application_version_code",
      "application_title",
      "application_cognito_identity_pool_id",
      "application_sdk_name",
      "application_sdk_version",
      "client_id",
      "client_cognito_id",
      "device_model",
      "device_make",
      "device_platform_name",
      "device_platform_version",
      "device_locale_code",
      "device_locale_language",
      "device_locale_country",
      "session_id",
      "session_start_timestamp",
      "session_stop_timestamp",
      "monetization_transaction_id",
      "monetization_transaction_store",
      "monetization_transaction_item_id",
      "monetization_transaction_quantity",
      "monetization_transaction_price_reported",
      "monetization_transaction_price_amount",
      "monetization_transaction_price_currency_code",
      "monetization_transaction_price_currency_symbol",
      "a_class",
      "a_level",
      "a_name",
      "a_paying customer",
      "m_score",
      "m_time played",
      "m_total spent"
   )
FROM 
   's3://<YOUR BUCKET NAME>/awsma/events/<APP ID>/<YEAR>/<MONTH>/<DAY>' 
credentials 
   'aws_access_key_id=<ACCESS KEY>;aws_secret_access_key=<SECRET KEY>'
EMPTYASNULL gzip TIMEFORMAT AS 'epochmillisecs' MAXERROR AS 0 
JSON AS 's3://<YOUR BUCKET NAME>/jsonpaths/eventmapping.json';

Insert loaded events into the AWSMA.event table

Now that the events are staged in the AWSMA.event_staging table, you will need to insert new events (events that are not already in the AWSMA.event table) into the AWSMA.event table. This allows you to reload the events for the current time period (for example, to load more recent events from the current day) while maintaining distinct events in the AWSMA.event table.

Use the following command to load data from the AWSMA.event_staging table into the AWSMA.event table:

Update the column list to contain the names of the custom metrics and attributes that you want to load.

--Insert events from the staging table that are not already in the AWSMA.event table
INSERT INTO AWSMA.event (
      "event_type",
      "event_timestamp",
      "arrival_timestamp",
      "event_version",
      "application_app_id",
      "application_package_name",
      "application_version_name",
      "application_version_code",
      "application_title",
      "application_cognito_identity_pool_id",
      "application_sdk_name",
      "application_sdk_version",
      "client_id",
      "client_cognito_id",
      "device_model",
      "device_make",
      "device_platform_name",
      "device_platform_version",
      "device_locale_code",
      "device_locale_language",
      "device_locale_country",
      "session_id",
      "session_start_timestamp",
      "session_stop_timestamp",
      "monetization_transaction_id",
      "monetization_transaction_store",
      "monetization_transaction_item_id",
      "monetization_transaction_quantity",
      "monetization_transaction_price_reported",
      "monetization_transaction_price_amount",
      "monetization_transaction_price_currency_code",
      "monetization_transaction_price_currency_symbol",
      "a_class",
      "a_level",
      "a_name",
      "a_paying customer",
      "m_score",
      "m_time played",
      "m_total spent")
SELECT 
      staging."event_type",
      staging."event_timestamp",
      staging."arrival_timestamp",
      staging."event_version",
      staging."application_app_id",
      staging."application_package_name",
      staging."application_version_name",
      staging."application_version_code",
      staging."application_title",
      staging."application_cognito_identity_pool_id",
      staging."application_sdk_name",
      staging."application_sdk_version",
      staging."client_id",
      staging."client_cognito_id",
      staging."device_model",
      staging."device_make",
      staging."device_platform_name",
      staging."device_platform_version",
      staging."device_locale_code",
      staging."device_locale_language",
      staging."device_locale_country",
      staging."session_id",
      staging."session_start_timestamp",
      staging."session_stop_timestamp",
      staging."monetization_transaction_id",
      staging."monetization_transaction_store",
      staging."monetization_transaction_item_id",
      staging."monetization_transaction_quantity",
      staging."monetization_transaction_price_reported",
      staging."monetization_transaction_price_amount",
      staging."monetization_transaction_price_currency_code",
      staging."monetization_transaction_price_currency_symbol",
      staging."a_class",
      staging."a_level",
      staging."a_name",
      staging."a_paying customer",
      staging."m_score",
      staging."m_time played",
      staging."m_total spent"
FROM AWSMA.event_staging staging
LEFT JOIN 
   AWSMA.event events ON events.event_timestamp = staging.event_timestamp 
   AND events.application_app_id = staging.application_app_id 
   AND events.client_id = staging.client_id
WHERE 
   events.client_id IS NULL;

Congratulations! Your events are now ready for you to query.

If you want to continue loading events into Amazon Redshift, you would need to execute only the last three statements (truncate, copy, and insert), and specify the dates (Amazon S3 folder names) that you are loading data for. Reloading the same time interval several times will only bring new events into the AWSMA.event table and not result in any duplication.

Step 7: Create Amazon Redshift users to query the data (using SQL workbench)

While it is possible to use the same user to both insert and query data, it is a best practice to create Amazon Redshift users in a group that has read access to the AWSMA.v_event view and use those users when querying your data using a client such as Tableau. This provides read access to your users without providing the ability to modify or delete data.

--Create the eventReaders group
CREATE GROUP eventReaders;
--Configure AWSMA schema privileges
GRANT usage ON schema AWSMA TO GROUP eventReaders;
GRANT SELECT ON AWSMA.v_event TO GROUP eventReaders;
--creating the eventReader user
CREATE USER <username> IN GROUP eventReaders PASSWORD '<password>';

Queries to get you started

30 Day Active users and devices by day

SELECT 
	application_app_id AS "app id", 
	COUNT(DISTINCT client_id) AS "devices", 
	COUNT(DISTINCT client_cognito_id) AS "users",
	date_trunc('day', event_timestamp) AS "day"
FROM 
	AWSMA.v_event
WHERE 
	event_type = '_session.start' AND 
	event_timestamp BETWEEN getdate() - 30 AND getdate() + 1
GROUP BY 
	"app id", 
	"day"
ORDER BY 
	"app id" ASC, 
	"day" DESC
;

30 Day Active users and devices by device language code

SELECT 
	application_app_id AS "app id", 
	COUNT(DISTINCT client_id) AS "devices", 
	COUNT(DISTINCT client_cognito_id) AS "users",
	device_locale_language AS "language"
FROM 
	AWSMA.v_event
WHERE 
	event_type = '_session.start' AND 
	event_timestamp BETWEEN getdate() - 30 AND getdate() + 1
GROUP BY 
	"app id", 
	"language"
ORDER BY 
	"app id" ASC, 
	"devices" DESC,
	"language"
;

30 day Active users and devices by app version

SELECT 
	application_app_id AS "app id", 
	device_platform_name AS "platform",
	application_version_name AS "version name",
	application_version_code AS "version code",
	COUNT(DISTINCT client_id) AS "devices", 
	COUNT(DISTINCT client_cognito_id) AS "users"
FROM 
	AWSMA.v_event
WHERE 
	event_type = '_session.start' AND 
	event_timestamp BETWEEN getdate() - 30 AND getdate() + 1
GROUP BY 
	"app id", 
	"platform",
	"version name",
	"version code"
ORDER BY 
	"app id" ASC, 
	"platform" ASC,
	"devices" DESC,
	"version name" DESC,
	"version code" DESC
;

Top Item purchases by store (last 30 days)

SELECT 
	application_app_id AS "app id", 
	monetization_transaction_item_id AS "item id",
	monetization_transaction_store AS "store",
	COUNT(DISTINCT client_id) AS "devices", 
	COUNT(DISTINCT client_cognito_id) AS "users",
	SUM(monetization_transaction_quantity) AS "quantity",
	SUM(monetization_transaction_price_amount) "amount (Apple only)",
	monetization_transaction_price_currency_code AS "currency (Apple only)"
FROM 
	AWSMA.v_event
WHERE 
	event_type = '_monetization.purchase' AND 
	event_timestamp BETWEEN getdate() - 30 AND getdate() + 1
GROUP BY 
	"app id",
	"item id",
	"currency (Apple only)",
	"store"
ORDER BY 
	"app id" ASC, 
	"item id" ASC,
	"quantity" DESC,
	"store",
	"devices" DESC
;

Thanks for reading. We would love to hear your questions and feedback. Please submit them to our discussion forum.