The Internet of Things on AWS – Official Blog

Unlocking Scalable IoT Analytics on AWS

The Internet of Things (IoT) is generating unprecedented amounts of data, with billions of connected devices streaming terabytes of information every day. For businesses and organizations aiming to derive valuable insights from their IoT data, AWS offers a range of powerful analytics services.

AWS IoT Analytics provides a starting point for many customers beginning their IoT analytics journey. It offers a fully managed service that allows for quick ingestion, processing, storage, and analysis of IoT data. With IoT Analytics, you can filter, transform, and enrich your data before storing it in a time-series data store for analysis. The service also includes built-in tools and integrations with services like Amazon QuickSight for creating dashboards and visualizations, helping you understand your IoT data effectively. However, as IoT deployments grow and data volumes increase, customers often need additional scalability and flexibility to meet evolving analytics requirements. This is where services like Amazon Kinesis, Amazon S3, and Amazon Athena come in. These services are designed to handle massive-scale streaming data ingestion, durable and cost-effective storage, and fast SQL-based querying, respectively.

In this post, we’ll explore the benefits of migrating your IoT analytics workloads from AWS IoT Analytics to Kinesis, S3, and Athena. We’ll discuss how this architecture can enable you to scale your analytics efforts to handle the most demanding IoT use cases and provide a step-by-step guide to help you plan and execute your migration.

Migration Options

When considering a migration from AWS IoT Analytics, it’s important to understand the benefits and reasons behind this shift. The table below maps existing AWS IoT Analytics features to alternate services and explains the reasoning for each.

| Stage | AWS IoT Analytics | Alternate Services | Reasoning |
|---|---|---|---|
| Collect | AWS IoT Analytics makes it easy to ingest data directly from AWS IoT Core or other sources using the BatchPutMessage API. This integration ensures a seamless flow of data from your devices to the analytics platform. | Amazon Kinesis Data Streams or Amazon Data Firehose | Amazon Kinesis offers a robust solution. Kinesis streams data in real time, enabling immediate processing and analysis, which is crucial for applications needing real-time insights and anomaly detection. Amazon Data Firehose simplifies the process of capturing and transforming streaming data before it lands in Amazon S3, automatically scaling to match your data throughput. |
| Process | Processing data in AWS IoT Analytics involves cleansing, filtering, transforming, and enriching it with external sources. | Amazon Managed Service for Apache Flink or Amazon Data Firehose | Amazon Managed Service for Apache Flink supports complex event processing, such as pattern matching and aggregations, which are essential for sophisticated IoT analytics scenarios. Amazon Data Firehose handles simpler transformations and can invoke AWS Lambda functions for custom processing, providing flexibility without the complexity of Flink. |
| Store | AWS IoT Analytics uses a time-series data store optimized for IoT data, which includes features like data retention policies and access management. | Amazon S3 or Amazon Timestream | Amazon S3 offers a scalable, durable, and cost-effective storage solution. S3’s integration with other AWS services makes it an excellent choice for long-term storage and analysis of massive datasets. Amazon Timestream is a purpose-built time-series database into which you can batch load data from S3. |
| Analyze | AWS IoT Analytics provides built-in SQL query capabilities, time-series analysis, and support for hosted Jupyter Notebooks, making it easy to perform advanced analytics and machine learning. | AWS Glue and Amazon Athena | AWS Glue simplifies the ETL process, making it easy to extract, transform, and load data, while also providing a data catalog that integrates with Athena to facilitate querying. Amazon Athena takes this a step further by allowing you to run SQL queries directly on data stored in S3 without needing to manage any infrastructure. |
| Visualize | AWS IoT Analytics integrates with Amazon QuickSight, enabling the creation of rich visualizations and dashboards. | Amazon QuickSight | You can continue to use QuickSight, depending on which alternate datastore you decide to use, such as S3. |

Migration Guide

In the current architecture, IoT data flows from IoT Core to IoT Analytics via an IoT Core rule. IoT Analytics handles ingestion, transformation, and storage. To complete the migration there are two steps to follow:

  • redirect ongoing data ingestion, followed by
  • export previously ingested data

Figure 1: Current Architecture to Ingest IoT Data with AWS IoT Analytics 

Step 1: Redirecting Ongoing Data Ingestion

The first step in your migration is to redirect your ongoing data ingestion to a new service. We recommend two patterns based on your specific use case:

Figure 2: Suggested architecture patterns for IoT data ingestion 

Pattern 1: Amazon Kinesis Data Streams with Amazon Managed Service for Apache Flink

Overview:

In this pattern, you start by publishing data to AWS IoT Core, which integrates with Amazon Kinesis Data Streams, allowing you to collect, process, and analyze high-bandwidth data in real time.

Metrics & Analytics:

  1. Ingest Data: IoT data is ingested into Amazon Kinesis Data Streams in real time. Kinesis Data Streams can handle a high throughput of data from millions of IoT devices, enabling real-time analytics and anomaly detection.
  2. Process Data: Use Amazon Managed Service for Apache Flink to process, enrich, and filter the data from the Kinesis data stream. Flink provides robust features for complex event processing, such as aggregations, joins, and temporal operations.
  3. Store Data: Flink outputs the processed data to Amazon S3 for storage and further analysis. This data can then be queried using Amazon Athena or integrated with other AWS analytics services.

When to use this pattern?

If your application involves high-bandwidth streaming data and requires advanced processing, such as pattern matching or windowing, this pattern is the best fit.
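
The ingestion step of this pattern relies on an AWS IoT Core rule with a Kinesis action. As a minimal sketch, the Python below builds the payload that the boto3 `create_topic_rule` call accepts. The topic filter, stream name, and role ARN are placeholders chosen for illustration and do not come from this post.

```python
import json


def build_kinesis_rule_payload(topic_filter, stream_name, role_arn):
    """Build a topic rule payload that forwards matching messages to a Kinesis data stream."""
    return {
        # IoT SQL statement: forward every message published on the topic filter.
        "sql": f"SELECT * FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "kinesis": {
                    "roleArn": role_arn,
                    "streamName": stream_name,
                    # A random UUID per message spreads records across shards.
                    "partitionKey": "${newuuid()}",
                }
            }
        ],
    }


def create_rule(rule_name, payload):
    """Create the rule in AWS IoT Core (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the payload builder runs without the SDK

    boto3.client("iot").create_topic_rule(ruleName=rule_name, topicRulePayload=payload)


payload = build_kinesis_rule_payload(
    topic_filter="devices/+/telemetry",  # hypothetical topic filter
    stream_name="iot-telemetry-stream",  # hypothetical stream name
    role_arn="arn:aws:iam::123456789012:role/IotToKinesisRole",  # placeholder ARN
)
print(json.dumps(payload, indent=2))
```

Calling `create_rule("telemetry_to_kinesis", payload)` would then register the rule. The role referenced by `roleArn` is assumed to allow AWS IoT to put records on the stream.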

Pattern 2: Amazon Data Firehose

Overview:

In this pattern, data is published to AWS IoT Core, which integrates with Amazon Data Firehose, allowing you to store data directly in Amazon S3. This pattern also supports basic transformations using AWS Lambda.

Metrics & Analytics:

  1. Ingest Data: IoT data is ingested directly from your devices or IoT Core into Amazon Data Firehose.
  2. Transform Data: Firehose performs basic transformations and processing on the data, such as format conversion and enrichment. You can enable Firehose data transformation by configuring it to invoke AWS Lambda functions to transform the incoming source data before delivering it to destinations.
  3. Store Data: The processed data is delivered to Amazon S3 in near real-time. Amazon Data Firehose automatically scales to match the throughput of incoming data, ensuring reliable and efficient data delivery.

When to use this pattern?

This is a good fit for workloads that need basic transformations and processing. In addition, Amazon Data Firehose simplifies the process by offering data buffering and dynamic partitioning capabilities for data stored in S3.
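
To make the Lambda-based transformation in this pattern concrete, here is a small sketch of a Firehose data-transformation handler. It follows the Firehose record contract (base64-encoded `data`, and a `result` of `Ok`, `Dropped`, or `ProcessingFailed` for each `recordId`). The `temperature_c` and `temperature_f` fields are hypothetical and stand in for whatever your devices actually send.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Firehose record, enrich it, and return it in the format Firehose expects."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            message = json.loads(base64.b64decode(record["data"]))
        except (ValueError, TypeError):
            # Unparseable payloads are flagged as failed and returned unchanged.
            output.append({"recordId": record_id, "result": "ProcessingFailed", "data": record["data"]})
            continue

        # Drop messages that do not carry the (hypothetical) temperature reading.
        if not isinstance(message, dict) or "temperature_c" not in message:
            output.append({"recordId": record_id, "result": "Dropped", "data": record["data"]})
            continue

        # Enrichment: add a Fahrenheit value alongside the Celsius reading.
        message["temperature_f"] = message["temperature_c"] * 9 / 5 + 32

        # Newline-delimited JSON keeps the S3 objects easy to query later.
        encoded = base64.b64encode((json.dumps(message) + "\n").encode("utf-8")).decode("utf-8")
        output.append({"recordId": record_id, "result": "Ok", "data": encoded})
    return {"records": output}


# Local check with a hand-built event; no AWS access is needed.
sample_event = {
    "records": [
        {"recordId": "1", "data": base64.b64encode(b'{"device_id": "sensor-1", "temperature_c": 20}').decode("utf-8")},
        {"recordId": "2", "data": base64.b64encode(b'{"device_id": "sensor-2"}').decode("utf-8")},
    ]
}
result = lambda_handler(sample_event, None)
for item in result["records"]:
    print(item["recordId"], item["result"])
```

Every incoming `recordId` appears exactly once in the response, which Firehose requires for a transformation to be accepted.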

Ad-hoc querying for both patterns:

As you migrate your IoT analytics workloads to Amazon Kinesis Data Streams or Amazon Data Firehose, leveraging AWS Glue and Amazon Athena can further streamline your data analysis process. AWS Glue simplifies data preparation and transformation, while Amazon Athena enables quick, serverless querying of your data. Together, they provide a powerful, scalable, and cost-effective solution for analyzing IoT data.

Figure 3: Ad-hoc querying for both patterns
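
As an illustration of ad-hoc querying, the sketch below assembles the arguments for the boto3 Athena `start_query_execution` call. It assumes the S3 data has already been cataloged in AWS Glue. The database, table, column names, and results location are hypothetical.

```python
def build_athena_request(database, table, output_location, limit=10):
    """Build start_query_execution arguments for a per-device average temperature query."""
    query = (
        "SELECT device_id, AVG(temperature_c) AS avg_temperature_c "
        f'FROM "{table}" '
        "GROUP BY device_id "
        "ORDER BY avg_temperature_c DESC "
        f"LIMIT {int(limit)}"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query (requires boto3 and AWS credentials) and return its execution ID."""
    import boto3  # imported lazily so the request builder runs without the SDK

    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]


request = build_athena_request(
    database="iot_lake",  # hypothetical Glue database
    table="telemetry",  # hypothetical table
    output_location="s3://example-athena-results/queries/",  # placeholder bucket
    limit=5,
)
print(request["QueryString"])
```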

Step 2: Export Previously Ingested Data

For data previously ingested and stored in AWS IoT Analytics, you’ll need to export it to Amazon S3. To simplify this process, you can use a CloudFormation template to automate the entire data export workflow. You can also use the template for partial (time range-based) data extraction.

Figure 4: Architecture to export previously ingested data using CloudFormation

CloudFormation Template to Export data to S3

The diagram below illustrates the process of using a CloudFormation template to create a dataset within the same IoT Analytics datastore, enabling selection based on a timestamp. This allows users to retrieve specific data points within a desired timeframe. Additionally, a Content Delivery Rule is created to export the data into an S3 bucket.
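
For time range-based exports, it can help to generate the WHERE clauses programmatically. The sketch below splits a period into consecutive, non-overlapping windows and emits one clause per window. It assumes the datastore has a timestamp-typed column, here given the hypothetical name `event_time`, so adjust the column and literal format to match your schema.

```python
from datetime import datetime, timedelta


def time_range_clauses(start, end, days_per_export, column="event_time"):
    """Return one WHERE clause per half-open window [window_start, window_end)."""
    if days_per_export <= 0:
        raise ValueError("days_per_export must be positive")
    step = timedelta(days=days_per_export)
    clauses = []
    window_start = start
    while window_start < end:
        # The last window is clipped so it never runs past the requested end.
        window_end = min(window_start + step, end)
        clauses.append(
            f"WHERE {column} >= timestamp '{window_start:%Y-%m-%d %H:%M:%S}' "
            f"AND {column} < timestamp '{window_end:%Y-%m-%d %H:%M:%S}'"
        )
        window_start = window_end
    return clauses


clauses = time_range_clauses(datetime(2024, 1, 1), datetime(2024, 1, 15), days_per_export=7)
for clause in clauses:
    print(clause)
```

Each clause could be supplied in turn as the template's TimeRange parameter, for example by updating the stack between export runs.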

Step-by-Step Guide

  1. Prepare the CloudFormation Template: copy the provided CloudFormation template and save it as a YAML file (e.g., migrate-datasource.yaml).
# CloudFormation template to migrate an AWS IoT Analytics datastore to an external dataset
AWSTemplateFormatVersion: 2010-09-09
Description: Migrate an AWS IoT Analytics datastore to an external dataset
Parameters:
  DatastoreName:
    Type: String
    Description: The name of the datastore to migrate.
    AllowedPattern: ^[a-zA-Z0-9_]+$
  TimeRange:
    Type: String
    Description: |
      This is an optional argument to split the source data into multiple files.
      The value should follow the SQL syntax of WHERE clause.
      E.g. WHERE DATE(Item_TimeStamp) BETWEEN '09/16/2010 05:00:00' and '09/21/2010 09:00:00'.
    Default: ''
  MigrationS3Bucket:
    Type: String
    Description: The S3 Bucket where the datastore will be migrated to.
    AllowedPattern: (?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$
  MigrationS3BucketPrefix:
    Type: String
    Description: The prefix of the S3 Bucket where the datastore will be migrated to.
    Default: ''
    AllowedPattern: (^([a-zA-Z0-9.\-_]*\/)*$)|(^$)
Resources:
  # IAM Role to be assumed by the AWS IoT Analytics service to access the external dataset
  DatastoreMigrationRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: iotanalytics.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: AllowAccessToExternalDataset
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetBucketLocation
                  - s3:GetObject
                  - s3:ListBucket
                  - s3:ListBucketMultipartUploads
                  - s3:ListMultipartUploadParts
                  - s3:AbortMultipartUpload
                  - s3:PutObject
                  - s3:DeleteObject
                Resource:
                  - !Sub arn:aws:s3:::${MigrationS3Bucket}
                  - !Sub arn:aws:s3:::${MigrationS3Bucket}/${MigrationS3BucketPrefix}*

  # Dataset whose content will be exported to the external S3 bucket
  MigratedDataset:
    Type: AWS::IoTAnalytics::Dataset
    Properties:
      DatasetName: !Sub ${DatastoreName}_generated
      Actions:
        - ActionName: SqlAction
          QueryAction:
            SqlQuery: !Sub SELECT * FROM ${DatastoreName} ${TimeRange}
      ContentDeliveryRules:
        - Destination:
            S3DestinationConfiguration:
              Bucket: !Ref MigrationS3Bucket
              Key: !Sub ${MigrationS3BucketPrefix}${DatastoreName}/!{iotanalytics:scheduleTime}/!{iotanalytics:versionId}.csv
              RoleArn: !GetAtt DatastoreMigrationRole.Arn
      RetentionPeriod:
        Unlimited: true
      VersioningConfiguration:
        Unlimited: true
  2. Identify the IoT Analytics Datastore: Determine the IoT Analytics datastore that requires data to be exported. For this guide, we will use a sample datastore named “iot_analytics_datastore”.

  3. Create or identify an S3 bucket where the data will be exported. For this guide, we will use the “iot-analytics-export” bucket.

  4. Create the CloudFormation stack
    • Navigate to the AWS CloudFormation console.
    • Click on “Create stack” and select “With new resources (standard)”.
    • Upload the migrate-datasource.yaml file.

  5. Enter a stack name and provide the following parameters:
    1. DatastoreName: The name of the IoT Analytics datastore you want to migrate.
    2. MigrationS3Bucket: The S3 bucket where the migrated data will be stored.
    3. MigrationS3BucketPrefix (optional): The prefix for the S3 bucket.
    4. TimeRange (optional): An SQL WHERE clause to filter the data being exported, allowing for splitting the source data into multiple files based on the specified time range.

  6. Click “Next” on the Configure stack options screen.

  7. On the review and create page, select the acknowledgement checkbox (the template creates an IAM role) and click “Submit”.

  8. Review stack creation on the Events tab for completion.

  9. On successful stack completion, navigate to IoT Analytics → Datasets to view the migrated dataset.

  10. Select the generated dataset and click “Run now” to export the dataset.

  11. The content can be viewed on the “Content” tab of the dataset.

  12. Finally, you can review the exported content by opening the “iot-analytics-export” bucket in the S3 console.
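
The console steps above can also be scripted. The following is a rough sketch using boto3, not a tested deployment script: it builds the `create_stack` arguments for the template's parameters, and a separate helper creates the stack and then triggers the export. The stack name and prefix are made up for illustration, and the template body is a placeholder string that you would replace with the contents of migrate-datasource.yaml.

```python
def build_stack_request(stack_name, template_body, datastore, bucket, prefix="", time_range=""):
    """Build create_stack arguments matching the template's four parameters."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": "DatastoreName", "ParameterValue": datastore},
            {"ParameterKey": "MigrationS3Bucket", "ParameterValue": bucket},
            {"ParameterKey": "MigrationS3BucketPrefix", "ParameterValue": prefix},
            {"ParameterKey": "TimeRange", "ParameterValue": time_range},
        ],
        # The template creates an IAM role, so this acknowledgement is required.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def deploy_and_export(request, datastore):
    """Create the stack, wait for completion, then run the generated dataset once."""
    import boto3  # imported lazily so the request builder runs without the SDK

    cloudformation = boto3.client("cloudformation")
    cloudformation.create_stack(**request)
    cloudformation.get_waiter("stack_create_complete").wait(StackName=request["StackName"])
    # The template names the dataset "<DatastoreName>_generated".
    boto3.client("iotanalytics").create_dataset_content(datasetName=f"{datastore}_generated")


request = build_stack_request(
    stack_name="iot-analytics-export-stack",  # hypothetical stack name
    template_body="TEMPLATE_BODY_PLACEHOLDER",  # replace with the YAML file contents
    datastore="iot_analytics_datastore",
    bucket="iot-analytics-export",
    prefix="exports/",  # hypothetical prefix; the template expects a trailing slash
)
for parameter in request["Parameters"]:
    print(parameter["ParameterKey"], "=", repr(parameter["ParameterValue"]))
```

Running `deploy_and_export(request, "iot_analytics_datastore")` corresponds to submitting the stack and clicking “Run now” on the generated dataset.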

Considerations:

  • Cost Considerations: Refer to the AWS IoT Analytics pricing page for the costs involved in the data migration. Consider deleting the newly created dataset when done to avoid any unnecessary costs.
  • Full Dataset Export: To export the complete dataset without any time-based splitting, you can also use the AWS IoT Analytics console and set a content delivery rule accordingly.

Summary

Migrating your IoT analytics workload from AWS IoT Analytics to Amazon Kinesis Data Streams, S3, and Amazon Athena enhances your ability to handle large-scale, complex IoT data. This architecture provides scalable, durable storage and powerful analytics capabilities, enabling you to gain deeper insights from your IoT data in real time.

Cleaning up resources created via CloudFormation is essential to avoid unexpected costs once the migration has completed.

By following the migration guide, you can seamlessly transition your data ingestion and processing pipelines, ensuring continuous and reliable data flow. Leveraging AWS Glue and Amazon Athena further simplifies data preparation and querying, allowing you to perform sophisticated analyses without managing any infrastructure.

This approach empowers you to scale your IoT analytics efforts effectively, making it easier to adapt to the growing demands of your business and extract maximum value from your IoT data.


About the Author

Umesh Kalaspurkar
Umesh Kalaspurkar is a New York based Solutions Architect for AWS. He brings more than 20 years of experience in design and delivery of Digital Innovation and Transformation projects, across enterprises and startups. He is motivated by helping customers identify and overcome challenges. Outside of work, Umesh enjoys being a father, skiing, and traveling.

Ameer Hakme
Ameer Hakme is an AWS Solutions Architect based in Pennsylvania. He works with independent software vendors in the Northeast to help them design and build scalable and modern platforms on the AWS Cloud. In his spare time, he enjoys riding his motorcycle and spending time with his family.

Rizwan Syed

Rizwan is a Sr. IoT Consultant at AWS and has over 20 years of experience across diverse domains like IoT, Industrial IoT, AI/ML, Embedded/Realtime Systems, Security and Reconfigurable Computing. He has collaborated with customers to design and develop unique solutions to their use cases. Outside of work, Rizwan enjoys being a father, DIY activities, and computer gaming.