Introducing AWS CloudFormation support for AWS IoT Analytics
AWS CloudFormation support for AWS IoT Analytics resources was launched on December 18th, 2018. In this blog post, we introduce conventions for building IoT Analytics projects using CloudFormation and provide an array of sample templates to help you get started.
As a refresher, every AWS IoT Analytics project has three required resources for data ingestion, and one or more optional resources for analyzing or visualizing your data. On the data ingestion side, each project will have a minimum of a channel, pipeline, and data store. Channels receive and store raw IoT messages from AWS IoT Core or via the BatchPutMessage API (for existing data stored prior to IoT Analytics). MQTT messages received by a channel are processed by any connected pipelines, where those messages can be enriched, cleansed, filtered, and generally transformed to fit your analysis. Finally, messages processed by the pipeline are kept in a data store for later analyses.
On the analysis side, your projects will use one or more data sets to query data out of your data store. Think of a data set as a materialized view, or a subset of the data store to be analyzed. The most basic query for a data set, although not recommended for very large data stores, is “SELECT * FROM my_data_store”, which fetches the entire contents of the data store and caches it as a CSV file for you. This content can then be used by a hosted Jupyter Notebook for machine learning, processed by a container application in Elastic Container Service, or imported into Amazon QuickSight for visualization and analytical exploration.
Templates for ingestion patterns
We’ll start with three CloudFormation templates that set up patterns for ingesting data into IoT Analytics projects. The first pattern is a basic “data workflow,” where a distinct channel, pipeline, and data store are created as a linked group. The idea behind this pattern is that all the data in the project is similar, should be processed by the same pipeline for transformations, and should be stored in the same data store. It also includes a single data set that queries the data store on a daily basis, fetching the previous 24-hour chunk of records.
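As an illustration of the basic data workflow, here is a minimal sketch that builds such a template as a Python dictionary and serializes it to JSON. It is not the sample template itself: the resource names, the `prefix` argument, the daily cron expression, and the SQL filter on the `__dt` partition column are all assumptions chosen for the example, and the pipeline layout (one activity object holding both the channel and data store steps) follows the style used in the CloudFormation documentation examples.

```python
import json


def basic_workflow_template(prefix: str = "demo") -> dict:
    """Build a template with one channel, pipeline, data store, and data set.

    All names derived from `prefix` are hypothetical placeholders.
    """
    datastore_name = f"{prefix}_datastore"
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Channel": {
                "Type": "AWS::IoTAnalytics::Channel",
                "Properties": {"ChannelName": f"{prefix}_channel"},
            },
            "Datastore": {
                "Type": "AWS::IoTAnalytics::Datastore",
                "Properties": {"DatastoreName": datastore_name},
            },
            "Pipeline": {
                "Type": "AWS::IoTAnalytics::Pipeline",
                "Properties": {
                    "PipelineName": f"{prefix}_pipeline",
                    # Ref is assumed to resolve to the channel / data store name.
                    "PipelineActivities": [{
                        "Channel": {
                            "Name": "ChannelActivity",
                            "ChannelName": {"Ref": "Channel"},
                            "Next": "DatastoreActivity",
                        },
                        "Datastore": {
                            "Name": "DatastoreActivity",
                            "DatastoreName": {"Ref": "Datastore"},
                        },
                    }],
                },
            },
            "Dataset": {
                "Type": "AWS::IoTAnalytics::Dataset",
                "DependsOn": "Datastore",  # the query refers to the data store by name
                "Properties": {
                    "DatasetName": f"{prefix}_dataset",
                    "Actions": [{
                        "ActionName": "DailyQuery",
                        "QueryAction": {
                            # Approximates "the previous day" using the date partition column.
                            "SqlQuery": (
                                f"SELECT * FROM {datastore_name} "
                                "WHERE __dt >= current_date - interval '1' day"
                            ),
                        },
                    }],
                    # Assumed schedule: once a day at midnight UTC.
                    "Triggers": [{"Schedule": {"ScheduleExpression": "cron(0 0 * * ? *)"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(basic_workflow_template(), indent=2))
```

Writing the printed JSON to a file gives you something you can pass to CloudFormation as a starting point, after adjusting the names and the query to your own data.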
The second pattern represents a fan-in model, where multiple pairs of channels and pipelines feed into a single data store. This is useful for merging dissimilar data schemas, or when data retention needs are unique for each channel. In the following example template, there are two channels, two pipelines that normalize the data from their respective channels, one data store, and one data set. The first channel is assumed to receive messages with a temperature measurement named “temp” in degrees Fahrenheit, and has a data retention policy of 30 days. The second channel is assumed to receive messages with a temperature measurement named “t” in degrees Centigrade, and has a data retention policy of 7 days. The pipelines convert either “temp” or “t” to the attribute “temperature” and normalize the output to Centigrade. The data set in this example runs every fifteen minutes and queries for records with temperatures below the freezing point or above the boiling point of water.
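To make the fan-in wiring concrete, the following sketch generates a template along these lines in Python. Treat it as an approximation under stated assumptions rather than the sample template: the logical IDs, the data store name `fan_in_datastore`, the activity names, and the 0 and 100 degree thresholds in the query are hypothetical, and the Fahrenheit conversion is written as a Math activity expression.

```python
import json

DATASTORE_NAME = "fan_in_datastore"  # hypothetical name, reused in the SQL query


def channel(retention_days: int) -> dict:
    """A channel that keeps raw messages for a fixed number of days."""
    return {
        "Type": "AWS::IoTAnalytics::Channel",
        "Properties": {
            "RetentionPeriod": {"Unlimited": False, "NumberOfDays": retention_days}
        },
    }


def normalizing_pipeline(channel_id: str, source_attr: str,
                         convert_type: str, convert_props: dict) -> dict:
    """Channel -> write 'temperature' -> drop the source attribute -> shared data store."""
    activities = {
        "Channel": {"Name": "Ingest", "ChannelName": {"Ref": channel_id}, "Next": "Convert"},
        convert_type: {"Name": "Convert", "Next": "DropSource", **convert_props},
        "RemoveAttributes": {"Name": "DropSource", "Attributes": [source_attr], "Next": "Store"},
        "Datastore": {"Name": "Store", "DatastoreName": {"Ref": "SharedDatastore"}},
    }
    return {
        "Type": "AWS::IoTAnalytics::Pipeline",
        "Properties": {"PipelineActivities": [activities]},
    }


def fan_in_template() -> dict:
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "FahrenheitChannel": channel(30),
            "CentigradeChannel": channel(7),
            "SharedDatastore": {
                "Type": "AWS::IoTAnalytics::Datastore",
                "Properties": {"DatastoreName": DATASTORE_NAME},
            },
            # "temp" (Fahrenheit) is converted into "temperature" (Centigrade).
            "FahrenheitPipeline": normalizing_pipeline(
                "FahrenheitChannel", "temp", "Math",
                {"Attribute": "temperature", "Math": "(temp - 32) / 1.8"},
            ),
            # "t" is already Centigrade, so it is only copied to "temperature".
            "CentigradePipeline": normalizing_pipeline(
                "CentigradeChannel", "t", "AddAttributes",
                {"Attributes": {"t": "temperature"}},
            ),
            "OutOfRangeDataset": {
                "Type": "AWS::IoTAnalytics::Dataset",
                "DependsOn": "SharedDatastore",
                "Properties": {
                    "Actions": [{
                        "ActionName": "OutsideLiquidRange",
                        "QueryAction": {
                            "SqlQuery": (
                                f"SELECT * FROM {DATASTORE_NAME} "
                                "WHERE temperature < 0 OR temperature > 100"
                            ),
                        },
                    }],
                    # Assumed schedule: every fifteen minutes.
                    "Triggers": [{"Schedule": {"ScheduleExpression": "cron(0/15 * * * ? *)"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(fan_in_template(), indent=2))
```

Both pipelines end in the same `Datastore` activity, which is what makes this a fan-in: each channel keeps its own retention period while downstream queries see a single `temperature` attribute.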
The third pattern represents a fan-out analytics model, where a single channel of raw data is processed by multiple pipelines and stored in multiple data stores. This is common when the same input data is used for disparate analyses, and it is a convention for giving data consumers (such as business analysts or data scientists) their own customized copies of the data to work on. In the following example template, there is a single channel, two pipelines, and two data stores. Each pipeline receives messages from the same channel, but applies different processing functions and stores the results in a separate data store.
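The fan-out layout can be sketched the same way. In this hypothetical example, one branch filters messages and the other keeps only a few attributes; the branch names, the filter expression, and the attribute names `device_id` and `temperature` are assumptions made up for illustration, not part of the sample template.

```python
import json


def branch(name: str, step_type: str, step_props: dict) -> dict:
    """Return the data store and pipeline resources for one consumer of RawChannel."""
    datastore_id = f"{name}Datastore"
    activities = {
        # Every branch reads from the same channel (Ref is assumed to yield its name).
        "Channel": {"Name": "Ingest", "ChannelName": {"Ref": "RawChannel"}, "Next": "Process"},
        step_type: {"Name": "Process", "Next": "Store", **step_props},
        "Datastore": {"Name": "Store", "DatastoreName": {"Ref": datastore_id}},
    }
    return {
        datastore_id: {"Type": "AWS::IoTAnalytics::Datastore"},
        f"{name}Pipeline": {
            "Type": "AWS::IoTAnalytics::Pipeline",
            "Properties": {"PipelineActivities": [activities]},
        },
    }


def fan_out_template() -> dict:
    resources = {"RawChannel": {"Type": "AWS::IoTAnalytics::Channel"}}
    # Branch 1: an analyst who only wants warm readings (hypothetical filter).
    resources.update(branch("Analyst", "Filter", {"Filter": "temperature > 30"}))
    # Branch 2: a data scientist who only needs two attributes (hypothetical names).
    resources.update(branch("Scientist", "SelectAttributes",
                            {"Attributes": ["device_id", "temperature"]}))
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    print(json.dumps(fan_out_template(), indent=2))
```

Adding another consumer is one more `branch(...)` call, which keeps each consumer's processing and storage isolated from the others.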
Templates for analytical patterns
If you’re integrating your IoT Analytics project with an Amazon SageMaker notebook instance for machine learning use cases, this next template will create an end-to-end project. It provisions a SageMaker instance that is compatible with our plugin for building containerized versions of your notebook for use in our “container data set” feature. This template is handy for creating a full project that handles all the authorization details of setting up IoT Analytics with SageMaker. Note that once this project is configured, you’ll still need to create an actual notebook using the IoT Analytics console. You can start with a blank notebook, or use any one of our templates to jump-start your journey into machine learning with IoT data.
Already have your own containerized application in Elastic Container Service? This next template builds an end-to-end project where your container will be executed on a schedule you provide. This is useful when you already have a container image, perhaps one provided by a community resource or colleague, which is not already deployed in your AWS account or specific region.
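For reference, a scheduled container data set might be declared roughly as follows. This is a sketch under assumptions, not the sample template: the parameter names `ContainerImageUri` and `ContainerRoleArn`, the default cron expression, and the compute settings are placeholders, and the execution role itself is assumed to exist already.

```python
import json


def scheduled_container_template(schedule: str = "cron(0 12 * * ? *)") -> dict:
    """Build a template whose data set runs a container on a schedule."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Hypothetical parameters supplied at deploy time.
            "ContainerImageUri": {
                "Type": "String",
                "Description": "URI of the container image to run",
            },
            "ContainerRoleArn": {
                "Type": "String",
                "Description": "ARN of the role the container runs with",
            },
        },
        "Resources": {
            "ContainerDataset": {
                "Type": "AWS::IoTAnalytics::Dataset",
                "Properties": {
                    "Actions": [{
                        "ActionName": "RunContainer",
                        "ContainerAction": {
                            "Image": {"Ref": "ContainerImageUri"},
                            "ExecutionRoleArn": {"Ref": "ContainerRoleArn"},
                            # Smallest compute size and a 1 GB volume, chosen arbitrarily.
                            "ResourceConfiguration": {
                                "ComputeType": "ACU_1",
                                "VolumeSizeInGB": 1,
                            },
                        },
                    }],
                    "Triggers": [{"Schedule": {"ScheduleExpression": schedule}}],
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(scheduled_container_template(), indent=2))
```

Keeping the image URI and role ARN as template parameters lets the same template be reused across accounts and regions without editing it.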
If you have your own container and want to execute it when an IoT Analytics data set triggers creation of new content (instead of on a scheduled timer like the previous example), check out this next template.
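The difference from a scheduled container is mostly in the trigger. The sketch below, with hypothetical names throughout, pairs a SQL data set with a container data set that runs whenever the SQL data set produces new content. The data store named by `datastore_name` is assumed to exist already, and passing the triggering content version to the container through a variable called `source_version` is an illustrative choice.

```python
import json


def triggered_container_template(datastore_name: str = "my_data_store",
                                 source_dataset: str = "source_dataset") -> dict:
    """Build a SQL data set plus a container data set triggered by its new content."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ContainerImageUri": {"Type": "String"},
            "ContainerRoleArn": {"Type": "String"},
        },
        "Resources": {
            "SourceDataset": {
                "Type": "AWS::IoTAnalytics::Dataset",
                "Properties": {
                    "DatasetName": source_dataset,
                    "Actions": [{
                        "ActionName": "Query",
                        "QueryAction": {"SqlQuery": f"SELECT * FROM {datastore_name}"},
                    }],
                    # Assumed schedule: refresh the SQL data set hourly.
                    "Triggers": [{"Schedule": {"ScheduleExpression": "cron(0 * * * ? *)"}}],
                },
            },
            "ContainerDataset": {
                "Type": "AWS::IoTAnalytics::Dataset",
                "DependsOn": "SourceDataset",  # referenced by name below
                "Properties": {
                    "Actions": [{
                        "ActionName": "RunContainer",
                        "ContainerAction": {
                            "Image": {"Ref": "ContainerImageUri"},
                            "ExecutionRoleArn": {"Ref": "ContainerRoleArn"},
                            "ResourceConfiguration": {
                                "ComputeType": "ACU_1",
                                "VolumeSizeInGB": 1,
                            },
                            # Tells the container which content version triggered the run.
                            "Variables": [{
                                "VariableName": "source_version",
                                "DatasetContentVersionValue": {"DatasetName": source_dataset},
                            }],
                        },
                    }],
                    # Run when the source data set creates new content, not on a timer.
                    "Triggers": [{"TriggeringDataset": {"DatasetName": source_dataset}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(triggered_container_template(), indent=2))
```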
We hope you will value the time savings of using CloudFormation to start or advance your next IoT Analytics project. If you have any questions or want to share a template you created, please visit our forum.