Analyzing Amazon SES event data with AWS Analytics Services
In this post, we will walk through using AWS Services, such as, Amazon Kinesis Firehose, Amazon Athena and Amazon QuickSight to monitor Amazon SES email sending events with the granularity and level of detail required to get insights from your customers engage with the emails you send.
Nowadays, email Marketers rely on internal applications to create their campaigns or any communications requirements, such us newsletters or promotional content. From those activities, they need to collect as much information as possible to analyze and improve their pipeline to get better interaction with the customers. Data such us bounces, rejections, success reception, delivery delays, complaints or open rate can be a powerful tool to understand the customers. Usually applications work with high-level data points without detailed logging or granular information that could help improve even better the effectiveness of their campaigns.
Amazon Simple Email Service (SES) is a smart tool for companies that wants a cost-effective, flexible, and scalable email service solution to easily integrate with their own products. Amazon SES provides methods to control your sending activity with built-in integration with Amazon CloudWatch Metrics and also provides a mechanism to collect the email sending events data.
In this post, we propose an architecture and step-by-step guide to track your email sending activities at a granular level, where you can configure several types of email sending events, including sends, deliveries, opens, clicks, bounces, complaints, rejections, rendering failures, and delivery delays. We will use the configuration set feature of Amazon SES to send detailed logging to our analytics services to store, query and create dashboards for a detailed view.
Overview of solution
This architecture uses Amazon SES built-in features and AWS analytics services to provide a quick and cost-effective solution to address your mail tracking requirements. The following services will be implemented or configured:
- Amazon Simple Email Service (SES)
- Amazon Simple Service (S3)
- Amazon Kinesis Firehose
- Amazon Athena
- AWS Glue Data Catalog
- Amazon QuickSight
The following diagram shows the architecture of the solution:
The flow of the events starts when a customer uses Amazon SES to send an email. Each of those send events will be capture by the configuration set feature and forward the events to a Kinesis Firehose delivery stream to buffer and store those events on an Amazon S3 bucket.
After storing the events, it will be required to create a database and table schema and store it on AWS Glue Data Catalog in order for Amazon Athena to be able to properly query those events on S3. Finally, we will use Amazon QuickSight to create interactive dashboard to search and visualize all your sending activity with an email level of detailed.
For this walkthrough, you should have the following prerequisites:
- An AWS Account
- SES Domain in production mode
- Appropriate Identity and Access Management permissions to configure Amazon S3, Amazon Athena, AWS Glue Data Catalog, Amazon Kinesis Firehose and Amazon Quicksight.
- A Quicksight instance created with an Author user
Step 1: Use AWS CloudFormation to deploy some additional prerequisites
You can get started with our sample AWS CloudFormation template that includes some prerequisites. This template creates an Amazon S3 Bucket, an IAM role needed to access from Amazon SES to Amazon Kinesis Data Firehose.
To download the CloudFormation template, run one of the following commands, depending on your operating system:
curl https://raw.githubusercontent.com/aws-samples/amazon-ses-analytics-blog/main/SES-Blog-PreRequisites.yml -o SES-Blog-PreRequisites.yml
To deploy the template, use the following AWS CLI command:
aws cloudformation deploy --template-file ./SES-Blog-PreRequisites.yml --stack-name ses-dashboard-prerequisites --capabilities CAPABILITY_NAMED_IAM
After the template finishes creating resources, you see the IAM Service role and the Delivery Stream on the stack Outputs tab. You are going to use these resources in the following steps.
Step 2: Creating a configuration set in SES and setting the default configuration set for a verified identity
SES can track the number of send, delivery, open, click, bounce, and complaint events for each email you send. You can use event publishing to send information about these events to other AWS service. In this case we are going to send the events to Kinesis Firehose. To do this, a configuration set is required.
To create a configuration set, complete the following steps:
- On the AWS Console, choose the Amazon Simple Email Service.
- Choose Configuration sets.
- Click on Create set.
- Set a Configuration set name.
- Leave the other configurations by default.
- Once the configuration set is created, select Event destinations
- Click on Add destination
- Select the event types you would like to analyze and then click on next.
- Select Amazon Kinesis Data Firehose as the destination, choose the delivery stream and the IAM role created previously, click on next and in the review page, click on Add destination.
- Once you have created the configuration set and added the event destination, you can define the Default configuration set for the verified identity (domain or email address). In the SES console, choose Verified identities.
- Choose the verified identity from which you want to collect events and select Configuration set. Click on Edit.
- Click on the checkbox Assign a default configuration set and choose the configuration set created previously.
- Once you have completed the previous steps, your events will be sent to Amazon S3. Due to the buffer’s configuration on the Kinesis Delivery Stream, the data will be loaded every 5 minutes or every 5 MiB to Amazon S3. You can check the structure created on the bucket and see json logs with SES events data.
Step 3: Using Amazon Athena to query the SES event logs
Amazon SES publishes email sending event records to Amazon Kinesis Data Firehose in JSON format. The top-level JSON object contains an eventType string, a mail object, and either a Bounce, Complaint, Delivery, Send, Reject, Open, Click, Rendering Failure, or DeliveryDelay object, depending on the type of event.
- In order to simplify the analysis of email sending events, create the sesmaster table by running the following script in Amazon Athena. Don’t forget to change the location in the following script with your own bucket containing the data of email sending events.
CREATE EXTERNAL TABLE sesmaster ( eventType string, complaint struct < arrivaldate: string, complainedrecipients: array < struct < emailaddress: string >>, complaintfeedbacktype: string, feedbackid: string, `timestamp`: string, useragent: string >, bounce struct < bouncedrecipients: array < struct < action: string, diagnosticcode: string, emailaddress: string, status: string >>, bouncesubtype: string, bouncetype: string, feedbackid: string, reportingmta: string, `timestamp`: string >, mail struct < timestamp: string, source: string, sourcearn: string, sendingaccountid: string, messageid: string, destination: string, headerstruncated: boolean, headers: array < struct < name: string, value: string >>, commonheaders: struct < `from`: array < string >, to: array < string >, messageid: string, subject: string >, tags: struct < ses_source_tls_version: string, ses_operation: string, ses_configurationset: string, ses_source_ip: string, ses_outgoing_ip: string, ses_from_domain: string, ses_caller_identity: string >>, send string, delivery struct < processingtimemillis: int, recipients: array < string >, reportingmta: string, smtpresponse: string, `timestamp`: string >, open struct < ipaddress: string, `timestamp`: string, userAgent: string >, reject struct < reason: string >, click struct < ipAddress: string, `timestamp`: string, userAgent: string, link: string > ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( "mapping.ses_caller_identity" = "ses:caller-identity", "mapping.ses_configurationset" = "ses:configuration-set", "mapping.ses_from_domain" = "ses:from-domain", "mapping.ses_operation" = "ses:opeation", "mapping.ses_outgoing_ip" = "ses:outgoing-ip", "mapping.ses_source_ip" = "ses:source-ip", "mapping.ses_source_tls_version" = "ses:source-tls-version" ) LOCATION 's3://aws-s3-ses-analytics-<aws-account-number>/'
The sesmaster table uses the org.openx.data.jsonserde.JsonSerDe SerDe library to deserialize the JSON data.
We have leveraged the support for JSON arrays and maps and the support for nested data structures. Those features ease the process of preparation and visualization of data.
In the sesmaster table, the following mappings were applied to avoid errors due to name of JSON fields containing colons.
- “mapping.ses_caller_identity”=”ses:caller-identity” “mapping.ses_outgoing_ip”=”ses:outgoing-ip”
- Once the sesmaster table is ready, it is a good strategy to create curated views of its data. The first view called vwSESMaster contains all the records of email sending events and all the fields which are unique on each event. Create the vwSESMaster view by running the following script in Amazon Athena.
CREATE OR REPLACE VIEW vwSESMaster AS SELECT eventtype as eventtype , mail.messageId as mailmessageid , mail.timestamp as mailtimestamp , mail.source as mailsource , mail.sendingAccountId as mailsendingAccountId , mail.commonHeaders.subject as mailsubject , mail.tags.ses_configurationset as mailses_configurationset , mail.tags.ses_source_ip as mailses_source_ip , mail.tags.ses_from_domain as mailses_from_domain , mail.tags.ses_outgoing_ip as mailses_outgoing_ip , delivery.processingtimemillis as deliveryprocessingtimemillis , delivery.reportingmta as deliveryreportingmta , delivery.smtpresponse as deliverysmtpresponse , delivery.timestamp as deliverytimestamp , delivery.recipients as deliveryrecipient , open.ipaddress as openipaddress , open.timestamp as opentimestamp , open.userAgent as openuseragent , bounce.bounceType as bouncebounceType , bounce.bouncesubtype as bouncebouncesubtype , bounce.feedbackid as bouncefeedbackid , bounce.timestamp as bouncetimestamp , bounce.reportingMTA as bouncereportingmta , click.ipAddress as clickipaddress , click.timestamp as clicktimestamp , click.userAgent as clickuseragent , click.link as clicklink , complaint.timestamp as complainttimestamp , complaint.userAgent as complaintuseragent , complaint.complaintFeedbackType as complaintcomplaintfeedbacktype , complaint.arrivalDate as complaintarrivaldate , reject.reason as rejectreason FROM sesmaster
The sesmaster table contains some fields which are represented by nested arrays, so it is necessary to flatten them into multiples rows. Following you can see the event types and the fields which need to be flatten.
- Event type SEND: field mail.commonHeaders
- Event type BOUNCE: field bounce.bouncedrecipients
- Event type COMPLAINT: field complaint.complainedrecipients
To flatten those arrays into multiple rows, we used the CROSS JOIN in conjunction with the UNNEST operator using the following strategy for all the three events:
- Create a temporal view with the mail.messageID and the field to be flattened.
- Create another temporal view with the array flattened into multiple rows.
- Create the final view joining the sesmaster table with the second temporal view by event type and mail.messageID.
To create those views, follow the next steps.
- Run the following scripts in Amazon Athena to flat the mail.commonHeaders array in the SEND event type
CREATE OR REPLACE VIEW vwSendMailTmpSendTo AS SELECT mail.messageId as messageid , mail.commonHeaders.to as recipients FROM sesmaster WHERE eventtype='Send'
CREATE OR REPLACE VIEW vwsendmailrecipients AS SELECT messageid , recipient FROM ("vwSendMailTmpSendTo" CROSS JOIN UNNEST(recipients) t (recipient))
CREATE OR REPLACE VIEW vwSentMails AS SELECT eventtype as eventtype , mail.messageId as mailmessageid , mail.timestamp as mailtimestamp , mail.source as mailsource , mail.sendingAccountId as mailsendingAccountId , mail.commonHeaders.subject as mailsubject , mail.tags.ses_configurationset as mailses_configurationset , mail.tags.ses_source_ip as mailses_source_ip , mail.tags.ses_from_domain as mailses_from_domain , mail.tags.ses_outgoing_ip as mailses_outgoing_ip , dest.recipient as mailto FROM sesmaster as sm ,vwsendmailrecipients as dest WHERE sm.eventtype = 'Send' and sm.mail.messageid = dest.messageid
- Run the following scripts in Amazon Athena to flat the bounce.bouncedrecipients array in the BOUNCE event type
CREATE OR REPLACE VIEW vwbouncemailtmprecipients AS SELECT mail.messageId as messageid , bounce.bouncedrecipients FROM sesmaster WHERE (eventtype = 'Bounce')
CREATE OR REPLACE VIEW vwbouncemailrecipients AS SELECT messageid , recipient.action , recipient.diagnosticcode , recipient.emailaddress FROM (vwbouncemailtmprecipients CROSS JOIN UNNEST(bouncedrecipients) t (recipient))
CREATE OR REPLACE VIEW vwBouncedMails AS SELECT eventtype as eventtype , mail.messageId as mailmessageid , mail.timestamp as mailtimestamp , mail.source as mailsource , mail.sendingAccountId as mailsendingAccountId , mail.commonHeaders.subject as mailsubject , mail.tags.ses_configurationset as mailses_configurationset , mail.tags.ses_source_ip as mailses_source_ip , mail.tags.ses_from_domain as mailses_from_domain , mail.tags.ses_outgoing_ip as mailses_outgoing_ip , bounce.bounceType as bouncebounceType , bounce.bouncesubtype as bouncebouncesubtype , bounce.feedbackid as bouncefeedbackid , bounce.timestamp as bouncetimestamp , bounce.reportingMTA as bouncereportingmta , bd.action as bounceaction , bd.diagnosticcode as bouncediagnosticcode , bd.emailaddress as bounceemailaddress FROM sesmaster as sm ,vwbouncemailrecipients as bd WHERE sm.eventtype = 'Bounce' and sm.mail.messageid = bd.messageid
- Run the following scripts in Amazon Athena to flat the complaint.complainedrecipients array in the COMPLAINT event type
CREATE OR REPLACE VIEW vwcomplainttmprecipients AS SELECT mail.messageId as messageid , complaint.complainedrecipients FROM sesmaster WHERE (eventtype = 'Complaint')
CREATE OR REPLACE VIEW vwcomplainedrecipients AS SELECT messageid , recipient.emailaddress FROM (vwcomplainttmprecipients CROSS JOIN UNNEST(complainedrecipients) t (recipient))
At the end we have one table and four views which can be used in Amazon QuickSight to analyze email sending events:
- Table sesmaster
- View vwSESMaster
- View vwSentMails
- View vwBouncedMails
- View vwComplainedemails
Step 4: Analyze and visualize data with Amazon QuickSight
In this blog post, we use Amazon QuickSight to analyze and to visualize email sending events from the sesmaster and the four curated views created previously. Amazon QuickSight can directly access data through Athena. Its pay-per-session pricing enables you to put analytical insights into the hands of everyone in your organization.
Let’s set this up together. We first need to select our table and our views to create new data sources in Athena and then we use these data sources to populate the visualization. We are creating just an example of visualization. Feel free to create your own visualization based on your information needs.
Before we can use the data in Amazon QuickSight, we need to first grant access to the underlying S3 bucket. If you haven’t done so already for other analyses, see our documentation on how to do so.
- On the Amazon QuickSight home page, choose Datasets from the menu on the left side, then choose New dataset from the upper-right corner, set and pick Athena as data source. In the following dialog box, give the data source a descriptive name and choose Create data source.
- In the following dialog box, select the Catalog and the Database containing your sesmaster and curated views. Let’s select the sesmaster table in order to create some basic Key Performance Indicators. Select the table sesmaster and click on the Select
- Our sesmaster table now is a data source for Amazon QuickSight and we can turn to visualizing the data.
- You can see the list fields on the left. The canvas on the right is still empty. Before we populate it with data, let’s select Key Performance Indicator from the available visual types.
- To populate the graph, drag and drop the fields from the field list on the left onto their respective destinations. In our case, we put the field send onto the value well and use count as aggregation.
- Add another visual from the left-upper side and select Key Performance Indicator as visual type.
- Put the field Delivery onto the value well and use count as aggregation.
- Repeat the same procedure, (steps 1 to 4) to count the number of Open, Click, Bounce, Complaint and Reject Events. At the end, you should see something similar to the following visualization. After resizing and rearranging the visuals, you should get an analysis like the shown in the image below.
- Let´s add another dataset by clicking the pencil on the right of the current Dataset.
- On the following dialog box, select Add Dataset.
- Select the view called vwsesmaster and click Select.
Now you can see all the available fields of the vwsesmaster view.
- Let’s create a new visual and select the Table visual type.
- Drag and drop the fields from the field list on the left onto their respective destinations. In our case, we put the fields eventtype, mailmessageid, and mailsubject onto the Group By well, but you can add as many fields as you need.
- Now let’s create a filter for this visual in order to filter by type of event. Be sure you select the table and then click on Filter on the left menu.
- Click on Create One and select the field eventtype on the popup window. Now select the eventtype filter to see the following options.
- Click on the dots on the right of the eventtype filter and select Add to Sheet.
- Leave all the default values, scroll down and select Apply
- Now you can filter the vwsesmaster view by eventtype.
- You can continue customizing your visualization with all the available data in the sesmaster table, the vwsesmaster view and even add more datasets to include data from the vwSentMails, vwBouncedMails, and vwComplainedemails views. Below, you can see some other visualizations created from those views.
To avoid ongoing charges, clean up the resources you created as part of this post:
- Delete the visualizations created in Amazon Quicksight.
- Unsubscribe from Amazon QuickSight if you are not using it for other projects.
- Delete the views and tables created in Amazon Athena.
- Delete the Amazon SES configuration set.
- Delete the Amazon SES events stored in S3.
- Delete the CloudFormation stack in order to delete the Amazon Kinesis Delivery Stream.
In this blog we showed how you can use AWS native services and features to quickly create an email tracking solution based on Amazon SES events to have a more detailed view on your sending activities. This solution uses a full serverless architecture without having to manage the underlying infrastructure and giving you the flexibility to use the solution for small, medium or intense Amazon SES usage, without having to take care of any servers.
We showed you some samples of dashboards and analysis that can be built for most of customers requirements, but of course you can evolve this solution and customize it according to your needs, adding or removing charts, filters or events to the dashboard. Please refer to the following documentation for the available Amazon SES Events, their structure and also how to create analysis and dashboards on Amazon QuickSight:
- Contents of event data that Amazon SES publishes to Amazon SNS
- Quick Start: Create an Analysis with a Single Visual Using Sample Data
From a performance and cost efficiency perspective there are still several configurations that can be done to improve the solution, for example using a columnar file formant like parquet, compressing with snappy or setting your S3 partition strategy according to your email sending usage. Another improvement could be importing data into SPICE to read data in Amazon Quicksight. Using SPICE results in the data being loaded from Athena only once, until it is either manually refreshed or automatically refreshed using a schedule.
You can use this walkthrough to configure your first SES dashboard and start visualizing events detail. You can adjust the services described in this blog according to your company requirements.
About the authors
|Oscar Mendoza is a Solutions Architect at AWS based in Bogotá, Colombia. Oscar works with our customers to provide guidance in architectural best practices and to build Well Architected solutions on the AWS platform. He enjoys spending time with his family and his dog and playing music.
|Luis Eduardo Torres is a Solutions Architect at AWS based in Bogotá, Colombia. He helps companies to build their business using the AWS cloud platform. He has a great interest in Analytics and has been leading the Analytics track of AWS Podcast in Spanish.
|Santiago Benavídez is a Solutions Architect at AWS based in Buenos Aires, Argentina, with more than 13 years of experience in IT, currently helping DNB/ISV customers to achieve their business goals using the breadth and depth of AWS services, designing highly available, resilient and cost-effective architectures.