The Internet of Things on AWS – Official Blog

Importing historical equipment data into AWS IoT SiteWise

Introduction

AWS IoT SiteWise is a managed service that helps customers collect, store, organize, and monitor data from their industrial equipment at scale. Customers often need to bring historical equipment measurement data from existing systems, such as data historians and time series databases, into AWS IoT SiteWise to ensure data continuity, train artificial intelligence (AI) and machine learning (ML) models that can predict equipment failures, and derive actionable insights.

In this blog post, we will show how you can get started with the BulkImportJob API and import historical equipment data into AWS IoT SiteWise using a code sample.

You can use this imported data to gain insights through AWS IoT SiteWise Monitor and Amazon Managed Grafana, train ML models on Amazon Lookout for Equipment and Amazon SageMaker, and power analytical applications.

To begin a bulk import, customers need to upload a CSV file to Amazon Simple Storage Service (Amazon S3) containing their historical data in a predefined format. After uploading the CSV file, customers initiate the asynchronous import into AWS IoT SiteWise using the CreateBulkImportJob operation and monitor its progress using the DescribeBulkImportJob and ListBulkImportJobs operations.
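
For reference, the following is a minimal Boto3 sketch of these calls. The bucket names, object key, and role ARN are placeholder assumptions to replace with your own values; the column order shown matches the sample data used later in this post. The walkthrough below automates all of this with scripts from the sample repository.

    import boto3

    sitewise = boto3.client("iotsitewise")

    # Placeholder bucket names, object key, and IAM role ARN.
    response = sitewise.create_bulk_import_job(
        jobName="historical-import-example",
        jobRoleArn="arn:aws:iam::111122223333:role/SiteWiseBulkImportRole",
        files=[{"bucket": "my-data-bucket", "key": "data/historical_data_1.csv"}],
        errorReportLocation={"bucket": "my-error-bucket", "prefix": "errors/"},
        jobConfiguration={
            "fileFormat": {
                "csv": {
                    # Column order must match the columns in the CSV file.
                    "columnNames": [
                        "ASSET_ID", "PROPERTY_ID", "DATA_TYPE",
                        "TIMESTAMP_SECONDS", "TIMESTAMP_NANO_OFFSET",
                        "QUALITY", "VALUE",
                    ]
                }
            }
        },
    )
    print(response["jobId"], response["jobStatus"])

    # The import runs asynchronously; check its progress at any time.
    job = sitewise.describe_bulk_import_job(jobId=response["jobId"])
    print(job["jobStatus"])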

Prerequisites

To follow along with this blog post, you will need an AWS account and an AWS Region supported by AWS IoT SiteWise. If you are already using AWS IoT SiteWise, choose a different Region to keep this walkthrough isolated. You are also expected to have some familiarity with Python.

Set up the environment

  1. Create an AWS Cloud9 environment using the Amazon Linux 2 platform.
  2. Using the terminal in your Cloud9 environment, install Git, clone the sitewise-bulk-import-example repository from GitHub, and install the Python dependencies:
    sudo yum install git
    git clone https://github.com/aws-samples/aws-iot-sitewise-bulk-import-example.git
    cd aws-iot-sitewise-bulk-import-example
    pip3 install -r requirements.txt

Walkthrough

For the demonstration in this post, we will use an AWS Cloud9 instance to represent an on-premises developer workstation and simulate two months of historical data for a few production lines in an automobile manufacturing facility.

We will then prepare the data and import it into AWS IoT SiteWise at scale, leveraging several bulk import jobs. Finally, we will verify whether the data was imported successfully.

AWS IoT SiteWise BulkImportJob Architecture

A bulk import job can import data into the two storage tiers offered by AWS IoT SiteWise, depending on how the storage is configured. Before we proceed, let us first define these two storage tiers.

Hot tier: Stores frequently accessed data with lower write-to-read latency. This makes the hot tier ideal for operational dashboards, alarm management systems, and any other applications that require fast access to the recent measurement values from equipment.

Cold tier: Stores less-frequently accessed data with higher read latency, making it ideal for applications that require access to historical data. For instance, it can be used in business intelligence (BI) dashboards, artificial intelligence (AI), and machine learning (ML) training. To store data in the cold tier, AWS IoT SiteWise utilizes an S3 bucket in the customer’s account.

Retention Period: Determines how long your data is stored in the hot tier before it is deleted.

Now that we have learned about the storage tiers, let us understand how a bulk import job handles writes for different scenarios. Refer to the table below:

    Value     Timestamp   Write behavior
    --------  ----------  --------------
    New       New         A new data point is created.
    New       Existing    The existing data point is updated with the new value for the provided timestamp.
    Existing  Existing    The import job identifies the duplicate data and discards it; no changes are made to existing data.

In the next section, we will follow step-by-step instructions to import historical equipment data into AWS IoT SiteWise.

Steps to import historical data

Step 1: Create a sample asset hierarchy

For the purpose of this demonstration, we will create a sample asset hierarchy for a fictitious automobile manufacturer with operations across four different cities. In a real-world scenario, you may already have an existing asset hierarchy in AWS IoT SiteWise, in which case this step is optional.

Step 1.1: Review the configuration

  1. From the terminal, navigate to the root of the Git repository.
  2. Review the configuration for asset models and assets.
    cat config/assets_models.yml
  3. Review the schema for asset properties.
    cat schema/sample_stamping_press_properties.json

Step 1.2: Create asset models and assets

  1. Run python3 src/create_asset_hierarchy.py to automatically create asset models, hierarchy definitions, assets, and asset associations. (A sketch of the underlying API calls follows this list.)
  2. In the AWS Console, navigate to AWS IoT SiteWise, and verify the newly created Models and Assets.
  3. Verify that you see an asset hierarchy similar to the one below.
     (Image: Sample SiteWise Asset Hierarchy)
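
Under the hood, the script drives AWS IoT SiteWise model and asset APIs. The following is a minimal sketch of those calls (the model, property, and asset names are illustrative; the real script also creates hierarchy definitions and asset associations from the YAML config):

    import boto3

    sitewise = boto3.client("iotsitewise")

    # Create an asset model with a single measurement property.
    model = sitewise.create_asset_model(
        assetModelName="Sample_Stamping Press",
        assetModelProperties=[{
            "name": "Temperature",
            "dataType": "DOUBLE",
            "type": {"measurement": {}},
        }],
    )
    # Model creation is asynchronous; wait until the model is ACTIVE.
    sitewise.get_waiter("asset_model_active").wait(assetModelId=model["assetModelId"])

    # Create an asset from the model and wait until it is ACTIVE.
    asset = sitewise.create_asset(
        assetName="Sample_Stamping Press A",
        assetModelId=model["assetModelId"],
    )
    sitewise.get_waiter("asset_active").wait(assetId=asset["assetId"])
    # The sample script then calls associate_assets() to build the parent/child tree.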

Step 2: Prepare historical data

Step 2.1: Simulate historical data

In this step, for demonstration purposes, we will simulate two months of historical data for four stamping presses across two production lines. In a real-world scenario, this data would typically come from source systems such as data historians and time series databases.

The CreateBulkImportJob API has the following key requirements:

  • To identify an asset property, you need to specify either an ASSET_ID + PROPERTY_ID combination or the ALIAS. In this blog post, we will use the former.
  • The data needs to be in CSV format.

Follow the steps below to generate data according to these expectations. For more details about the schema, refer to Ingesting data using the CreateBulkImportJob API.

  1. Review the configuration for data simulation.
    cat config/data_simulation.yml
  2. Run python3 src/simulate_historical_data.py to generate simulated historical data for the selected properties and time period. If the total rows exceed rows_per_job as configured in bulk_import.yml, multiple data files will be created to support parallel processing. In this sample, more than 700,000 data points are simulated for the four stamping presses (A-D) across two production lines (Sample_Line 1 and Sample_Line 2). Since we configured rows_per_job as 20,000, a total of 36 data files will be created. (A stripped-down sketch of the simulation logic follows this list.)
  3. Verify the generated data files under the data directory.
     (Image: SiteWise historical CSV data files)
  4. The data schema will follow the column_names configured in the bulk_import.yml config file:
    79817017-bb13-4611-b4b2-8094913cd287,c487c0d7-a9f2-4fe7-b4bc-92bf6f4f697b,DOUBLE,1667275200,0,GOOD,78.76
    79817017-bb13-4611-b4b2-8094913cd287,c487c0d7-a9f2-4fe7-b4bc-92bf6f4f697b,DOUBLE,1667275260,0,GOOD,67.33
    79817017-bb13-4611-b4b2-8094913cd287,c487c0d7-a9f2-4fe7-b4bc-92bf6f4f697b,DOUBLE,1667275320,0,GOOD,82.13
    79817017-bb13-4611-b4b2-8094913cd287,c487c0d7-a9f2-4fe7-b4bc-92bf6f4f697b,DOUBLE,1667275380,0,GOOD,72.72
    79817017-bb13-4611-b4b2-8094913cd287,c487c0d7-a9f2-4fe7-b4bc-92bf6f4f697b,DOUBLE,1667275440,0,GOOD,61.45
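
A stripped-down version of this simulation logic is sketched below. The asset and property IDs are placeholders copied from the sample rows above; the real script reads the IDs, time range, and rows_per_job from the YAML configs.

    import csv
    import random

    # Placeholder IDs taken from the sample rows above.
    ASSET_ID = "79817017-bb13-4611-b4b2-8094913cd287"
    PROPERTY_ID = "c487c0d7-a9f2-4fe7-b4bc-92bf6f4f697b"
    START_EPOCH = 1667275200  # first TIMESTAMP_SECONDS in the sample data
    INTERVAL_SECONDS = 60

    with open("data/historical_data_1.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(20_000):  # one file per rows_per_job chunk
            writer.writerow([
                ASSET_ID,
                PROPERTY_ID,
                "DOUBLE",                              # DATA_TYPE
                START_EPOCH + i * INTERVAL_SECONDS,    # TIMESTAMP_SECONDS
                0,                                     # TIMESTAMP_NANO_OFFSET
                "GOOD",                                # QUALITY
                round(random.uniform(60.0, 85.0), 2),  # VALUE
            ])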

Step 2.2: Upload historical data to Amazon S3

As AWS IoT SiteWise requires the historical data to be available in Amazon S3, we will upload the simulated data to the selected S3 bucket.

  1. Update the data bucket under bulk_import.yml with the name of any existing temporary S3 bucket that can be deleted later.
  2. Run python3 src/upload_to_s3.py to upload the simulated historical data to the configured S3 bucket.
  3. Navigate to Amazon S3 and verify that the objects were uploaded successfully.
     (Image: SiteWise Bulkimport historical data in S3)
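
The upload script is essentially a loop over the generated files. A minimal equivalent (the bucket name is a placeholder for the one set in bulk_import.yml) is:

    import os
    import boto3

    s3 = boto3.client("s3")
    DATA_BUCKET = "my-sitewise-import-bucket"  # placeholder; set in bulk_import.yml

    # Upload every generated CSV under data/ to the data/ prefix in S3.
    for file_name in sorted(os.listdir("data")):
        if file_name.endswith(".csv"):
            s3.upload_file(os.path.join("data", file_name), DATA_BUCKET, f"data/{file_name}")
            print(f"Uploaded {file_name}")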

Step 3: Import historical data into AWS IoT SiteWise

Before you can import historical data, AWS IoT SiteWise requires that you activate cold tier storage. For additional details, refer to Configuring storage settings.

If you have already activated cold tier storage, consider changing the S3 bucket location to a temporary bucket that can be deleted later when cleaning up the sample resources.

Note that changing the S3 bucket location does not copy any data from the existing cold tier S3 bucket to the new bucket. When modifying the S3 bucket location, also ensure that the IAM role configured under S3 access role has permissions to access the new S3 bucket.

Step 3.1: Configure storage settings

  1. Navigate to AWS IoT SiteWise, select Storage, then select Activate cold tier storage.
  2. Pick an S3 bucket location of your choice.
     (Image: AWS IoT SiteWise Edit Storage)
  3. Select Create a role from an AWS managed template.
  4. Check Activate retention period, enter 30 days, and save.
     (Image: AWS IoT SiteWise Hot Tier Settings)
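
If you prefer to script this step instead of using the console, the equivalent Boto3 call looks roughly like the following (the bucket ARN and role ARN are placeholders for the values you chose above):

    import boto3

    sitewise = boto3.client("iotsitewise")

    # Placeholder ARNs for the cold tier bucket and its access role.
    sitewise.put_storage_configuration(
        storageType="MULTI_LAYER_STORAGE",
        multiLayerStorage={
            "customerManagedS3Storage": {
                "s3ResourceArn": "arn:aws:s3:::my-sitewise-cold-tier-bucket/",
                "roleArn": "arn:aws:iam::111122223333:role/SiteWiseColdTierRole",
            }
        },
        retentionPeriod={"numberOfDays": 30, "unlimited": False},
    )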

Step 3.2: Provide permissions for AWS IoT SiteWise to read data from Amazon S3

  1. Navigate to AWS IAM, select Policies under Access management, and Create policy.
  2. Switch to the JSON tab and replace the content with the following. Update <bucket-name> with the name of the data S3 bucket configured in bulk_import.yml. Both the bucket ARN and the object ARN (the /* entry) are listed so that AWS IoT SiteWise can read the objects inside the bucket.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:*"
          ],
          "Resource": [
            "arn:aws:s3:::<bucket-name>",
            "arn:aws:s3:::<bucket-name>/*"
          ]
        }
      ]
    }
  3. Save the policy with Name as SiteWiseBulkImportPolicy.
  4. Select Roles under Access management, and Create role.
  5. Select Custom trust policy and replace the content with the following.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "iotsitewise.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
  6. Click Next and select the SiteWiseBulkImportPolicy IAM policy created in the previous steps.
  7. Click Next and create the role with Role name as SiteWiseBulkImportRole.
  8. Select Roles under Access management, search for the newly created IAM role SiteWiseBulkImportRole, and click on its name.
  9. Copy the ARN of the IAM role using the copy icon.
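
These console steps can also be scripted. A condensed Boto3 sketch, using the same policy and trust documents shown above (the bucket name is a placeholder), is:

    import json
    import boto3

    iam = boto3.client("iam")
    BUCKET = "my-sitewise-import-bucket"  # placeholder; your data bucket name

    # Permissions policy: access to the data bucket and its objects.
    policy_doc = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:*"],
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        }],
    }
    # Trust policy: allow AWS IoT SiteWise to assume the role.
    trust_doc = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "iotsitewise.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    policy = iam.create_policy(
        PolicyName="SiteWiseBulkImportPolicy", PolicyDocument=json.dumps(policy_doc)
    )
    role = iam.create_role(
        RoleName="SiteWiseBulkImportRole", AssumeRolePolicyDocument=json.dumps(trust_doc)
    )
    iam.attach_role_policy(
        RoleName="SiteWiseBulkImportRole", PolicyArn=policy["Policy"]["Arn"]
    )
    print(role["Role"]["Arn"])  # use this value as role_arn in bulk_import.yml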

Step 3.3: Create AWS IoT SiteWise bulk import jobs

  1. Update the config/bulk_import.yml file:
    • Replace the role_arn with the ARN of the SiteWiseBulkImportRole IAM role copied in the previous steps.
    • Replace the error_bucket with any existing temporary S3 bucket that can be deleted later.
  2. Run python3 src/create_bulk_import_job.py to import historical data from the S3 bucket into AWS IoT SiteWise.
  3. The script will create multiple jobs to import all the data files into AWS IoT SiteWise simultaneously. In a real-world scenario, several terabytes of data can be quickly imported into AWS IoT SiteWise using concurrently running jobs.
  4. Check the status of the jobs from the output (a minimal sketch of this polling logic follows this list):
    Total S3 objects: 36
    Number of bulk import jobs to create: 36
            Created job: 03e75fb2-1275-487f-a011-5ae6717e0c2e for importing data from data/historical_data_1.csv S3 object
            Created job: 7938c0d2-f177-4979-8959-2536b46f91b3 for importing data from data/historical_data_10.csv S3 object
            …
    Checking job status every 5 secs until completion.
            Job id: 03e75fb2-1275-487f-a011-5ae6717e0c2e, status: COMPLETED
            Job id: 7938c0d2-f177-4979-8959-2536b46f91b3, status: COMPLETED
            …
  5. If you see the status of any job as COMPLETED_WITH_FAILURES or FAILED, refer to the Troubleshoot common issues section.
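
The status polling in the script boils down to a loop like this minimal sketch, where job_ids holds the IDs returned by the CreateBulkImportJob calls:

    import time
    import boto3

    sitewise = boto3.client("iotsitewise")
    TERMINAL_STATES = {"COMPLETED", "COMPLETED_WITH_FAILURES", "FAILED", "CANCELLED"}

    def wait_for_jobs(job_ids, poll_secs=5):
        """Poll all bulk import jobs until each reaches a terminal state."""
        pending = set(job_ids)
        while pending:
            for job_id in sorted(pending):
                status = sitewise.describe_bulk_import_job(jobId=job_id)["jobStatus"]
                if status in TERMINAL_STATES:
                    print(f"Job id: {job_id}, status: {status}")
                    pending.discard(job_id)
            if pending:
                time.sleep(poll_secs)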

Step 4: Verify the imported data

Once the bulk import jobs are complete, we need to verify that the historical data was successfully imported into AWS IoT SiteWise. You can verify the data either by looking directly at the cold tier storage or by visually inspecting the charts available in AWS IoT SiteWise Monitor.

Step 4.1: Using the cold tier storage

In this step, we will check whether new S3 objects have been created in the bucket that was configured for the cold tier.

  1. Navigate to Amazon S3 and locate the S3 bucket configured under AWS IoT SiteWise → Storage → S3 bucket location (in Step 3) for cold tier storage.
  2. Verify the partitions and objects under the raw/ prefix.
     (Image: AWS IoT SiteWise Cold Tier files)
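
You can also confirm this from the terminal with a short Boto3 snippet (the bucket name is a placeholder for your cold tier bucket):

    import boto3

    s3 = boto3.client("s3")

    # List objects written by AWS IoT SiteWise under the raw/ prefix.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-sitewise-cold-tier-bucket", Prefix="raw/"):
        for obj in page.get("Contents", []):
            print(obj["Key"])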

Step 4.2: Using AWS IoT SiteWise Monitor

In this step, we will visually inspect if the charts show data for the imported date range.

  1. Navigate to AWS IoT SiteWise and locate Monitor.
  2. Create a portal to access data stored in AWS IoT SiteWise.
    • Provide AnyCompany Motor as the Portal name.
    • Choose IAM for User authentication.
    • Provide your email address for Support contact email, and click Next.
    • Leave the default configuration for Additional features, and click Create.
    • Under Invite administrators, select your IAM user or IAM Role, and click Next.
    • Click on Assign Users.
  3. Navigate to Portals and open the newly created portal.
  4. Navigate to Assets and select an asset, for example, AnyCompany_Motor → Sample_Arlington → Sample_Stamping → Sample_Line 1 → Sample_Stamping Press A.
  5. Use Custom range to match the date range for the data uploaded.
  6. Verify the data rendered in the time series line chart.
     (Image: SiteWise Monitor Example)

Troubleshoot common issues

In this section, we will cover the common issues encountered while importing data using bulk import jobs and highlight some possible reasons.

If a bulk import job does not complete successfully, it is a best practice to review the logs in the error S3 bucket configured in bulk_import.yml to understand the root cause.
(Image: SiteWise BulkImportJob Error Bucket)

No data imported

  • Incorrect schema: dataType does not match the dataType tied to the asset property
    The schema provided at Ingesting data using the CreateBulkImportJob API should be followed exactly. Using the console, verify that the provided DATA_TYPE matches the data type of the corresponding asset model property.
  • Incorrect ASSET_ID or PROPERTY_ID: Entry is not modeled
    Using the console, verify that the corresponding asset and property exist.
  • Duplicate data: A value for this timestamp already exists
    AWS IoT SiteWise detects and automatically discards any duplicate data points. Using the console, verify whether the data already exists.

Missing only certain parts of data

  • Missing recent data: The BulkImportJob API imports recent data (data that falls within the hot tier retention period) into the AWS IoT SiteWise hot tier and does not transfer it immediately to Amazon S3 (cold tier). You may need to wait for the next hot-to-cold tier transfer cycle, which is currently set to 6 hours.

Clean Up

To avoid any recurring charges, remove the resources created in this blog post. Follow these steps to delete them:

  1. Navigate to AWS Cloud9 and delete your environment.
  2. Run python3 src/clean_up_asset_hierarchy.py to delete the following resources, in order, from AWS IoT SiteWise:
    • Asset associations
    • Assets
    • Hierarchy definitions from asset models
    • Asset models
  3. From the AWS IoT SiteWise console, navigate to Monitor → Portals, select the previously created portal, and delete it.
  4. Navigate to Amazon S3 and perform the following:
    • Delete the S3 bucket location configured under the Storage section of AWS IoT SiteWise
    • Delete the data and error buckets configured in /config/bulk_import.yml of the Git repository
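
Note that an S3 bucket must be emptied before it can be deleted. A small Boto3 helper for unversioned buckets (the bucket names are placeholders for the ones used in this post) might look like:

    import boto3

    s3 = boto3.resource("s3")

    # Placeholder names for the data and error buckets created for this post.
    for bucket_name in ["my-sitewise-import-bucket", "my-error-bucket"]:
        bucket = s3.Bucket(bucket_name)
        bucket.objects.all().delete()  # a bucket must be empty before deletion
        bucket.delete()
        print(f"Deleted {bucket_name}")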

Conclusion

In this post, you learned how to use the AWS IoT SiteWise BulkImportJob API to import historical equipment data into AWS IoT SiteWise using the AWS SDK for Python (Boto3). You can also use the AWS CLI or SDKs for other programming languages to perform the same operation. To learn more about all supported ingestion mechanisms for AWS IoT SiteWise, visit the documentation.

About the authors

Raju Gottumukkala is an IoT Specialist Solutions Architect at AWS, helping industrial manufacturers in their smart manufacturing journey. Raju has helped major enterprises across the energy, life sciences, and automotive industries improve operational efficiency and revenue growth by unlocking true potential of IoT data. Prior to AWS, he worked for Siemens and co-founded dDriven, an Industry 4.0 Data Platform company.
Avik Ghosh is a Senior Product Manager on the AWS Industrial IoT team, focusing on the AWS IoT SiteWise service. With over 18 years of experience in technology innovation and product delivery, he specializes in Industrial IoT, MES, Historian, and large-scale Industry 4.0 solutions. Avik contributes to the conceptualization, research, definition, and validation of Amazon IoT service offerings.