Design IoT jobs for rapid, large-scale device updates with advanced device group target patterns

Customer IoT applications require rapid over-the-air (OTA) updates to maintain the state of their IoT things, and this becomes increasingly important as IoT fleets grow. Jobs, a feature of AWS IoT Device Management, pushes updates to targeted edge devices. Each job targets devices in static or dynamic AWS IoT thing groups. A static thing group contains an explicitly specified set of IoT things. A dynamic thing group contains the things that match a specified query, and it updates automatically when things that match the query are added. This post provides design patterns that help customers roll out updates at the fastest possible rate in devices per minute, which large-scale rapid deployments require (see Thing targets that pre-exist AWS IoT registry entry and Target priming). It also describes design patterns for targeting devices that cannot be modeled with a single thing group, for example because they cannot be described by a simple query, by using an Include list or a Query with exclude list.

Solution walkthrough

This solution (GitHub repo) provides strategies to roll out jobs at the maximum account-level rate even while the job's target thing groups are still being populated. It helps customers overcome job rollout rate limitations caused by:

AWS service limits on the API calls required to populate the job targets
API throttling, and the resulting exponential back-off, when those service limits are exceeded
Time required to populate new AWS IoT static and dynamic thing groups
AWS IoT things that cannot be targeted with a fleet indexing query

AWS account-level service limits

The AWS IoT Core and AWS IoT Device Management service limits that govern the maximum job update rate (devices/min) include:

MaximumJobExecutionsPerMinute: The direct job rate limit
UpdateThingShadow: Used for select advanced job target patterns described in later sections
AddThingToThingGroup: Used for select advanced job target patterns described in later sections

You can raise each of these limits by requesting a service quota increase, which increases the overall achievable job rate.

Thing targets that pre-exist AWS IoT registry entry

In some scenarios, a device manufacturer or device provisioner has information about a device before it is registered in AWS IoT as a thing. For this use case, you can create a continuous job before the thing exists in the cloud. If the target is a dynamic group, the thing is queued for the job when it is registered, as long as the group query selects it. If the target is a static group, the thing must be added to the group explicitly with the AddThingToThingGroup API when it is registered. The Include list and Query with exclude list sections that follow describe this process in more detail.
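The following minimal Python sketch shows the continuous-job setup for the dynamic group case. It assumes that you pass in an AWS IoT client created with `boto3.client("iot")`. The function name, group name, job document, and rollout rate are illustrative assumptions. The query reuses the `attributes.productId:widget123` example from the Target priming section. For a static group target, the registration workflow would instead call `add_thing_to_thing_group` for each new thing.

```python
import json


def create_job_for_future_things(iot, job_id, group_name, query, job_document, max_per_minute):
    """Create a dynamic thing group plus a CONTINUOUS job that targets it.

    `iot` is expected to be a boto3 AWS IoT client. Things registered after this
    call are queued by the continuous job once they match `query` and join the group.
    """
    group = iot.create_dynamic_thing_group(
        thingGroupName=group_name,
        queryString=query,  # fleet indexing query, e.g. "attributes.productId:widget123"
    )
    job = iot.create_job(
        jobId=job_id,
        targets=[group["thingGroupArn"]],
        document=json.dumps(job_document),  # hypothetical job document
        targetSelection="CONTINUOUS",  # keep queueing things that join the group later
        jobExecutionsRolloutConfig={"maximumPerMinute": max_per_minute},
    )
    return job["jobArn"]
```

Because the target selection is `CONTINUOUS` rather than a snapshot, membership changes after job creation still produce job executions.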

Target priming

A job queues the devices in its target thing group at the configured job execution rate (the maximum value is MaximumJobExecutionsPerMinute). The job execution rollout rate is therefore the lesser of two rates: (1) the rate at which devices are added to the thing group, which you do not want to be the limiting factor, and (2) the rate set in the job executions rollout configuration. Target priming is the process of populating the target groups with devices faster than MaximumJobExecutionsPerMinute, which maximizes the rollout rate of job executions for a given job.
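A little arithmetic makes this rate relationship concrete. The following Python sketch uses hypothetical numbers, not actual service quotas: 100,000 devices, a rollout configuration of 1,000 executions per minute, and a target group that fills at either 300 or 5,000 things per minute. It ignores throttling, retries, and execution time on the devices.

```python
import math


def effective_rollout_rate(group_fill_per_minute, max_per_minute):
    """Job executions queued per minute: the slower of the two rates."""
    return min(group_fill_per_minute, max_per_minute)


def minutes_to_queue(device_count, group_fill_per_minute, max_per_minute):
    """Approximate minutes to queue every device (ignores throttling and retries)."""
    rate = effective_rollout_rate(group_fill_per_minute, max_per_minute)
    return math.ceil(device_count / rate)


# Illustrative numbers only, not actual service quotas.
unprimed = minutes_to_queue(100_000, group_fill_per_minute=300, max_per_minute=1_000)
primed = minutes_to_queue(100_000, group_fill_per_minute=5_000, max_per_minute=1_000)
print(unprimed, primed)  # 334 100
```

In the unprimed case, the group fill rate is the bottleneck. Once the group fills faster than the rollout configuration allows, the job rate alone determines the duration.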

For use cases where there is time to populate thing groups before a job, you can fully populate static or dynamic thing groups before you create the snapshot job. For example, you can use this priming technique if the things to be targeted are already in the thing registry and are known well before the update needs to occur. Keep in mind that adding things to a static group is rate limited by AddThingToThingGroup API calls. Adding things to a dynamic group is limited by the time that Fleet Indexing for AWS IoT Device Management needs to populate the group.
When time constraints do not allow the target groups to be fully populated before a job is created, use the following method to populate the target group faster than the job execution rate. This method uses a dynamic group whose query selects a thing shadow attribute, and it requires updating each thing's shadow with that attribute. You first create an empty dynamic thing group and then update the thing shadows. Each update adds the thing to the dynamic group within seconds, which is significantly faster than waiting for the dynamic group to index the entire thing registry. Additionally, the UpdateThingShadow rate limit is higher than the MaximumJobExecutionsPerMinute rate. As a result, MaximumJobExecutionsPerMinute determines the job rollout rate instead of the rate at which the thing group is populated, which gives the maximum job rate. To use this method, you must enable fleet indexing for thing shadows.
1. Create a dynamic group whose query selects the thing shadow attribute. If the targets are defined by a query, also include that query. Sample query: shadow.reported.newVersion:1.1 && attributes.productId:widget123
2. Create a continuous job that targets the dynamic group from step 1.
3. Update the thing shadow of every thing that the job will target (e.g., shadow.reported.newVersion=1.1).
  - List: Call UpdateThingShadow for every thing in the list.
  - Query: Use the fleet index to run the query, and call UpdateThingShadow for every result. Things that are added to the account after this process are queued once the dynamic group has finished indexing all things in the account.
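The following minimal sketch covers step 3. It assumes that the dynamic group and continuous job from steps 1 and 2 already exist and that shadow indexing is enabled. `iot` is assumed to be a boto3 `iot` client and `iot_data` a boto3 `iot-data` client. The helper names are hypothetical. The sketch pages through SearchIndex results on the default `AWS_Things` index and writes the `newVersion` reported value used in the sample query. It does no pacing beyond boto3's default retry behavior, so a production version would likely need to respect the UpdateThingShadow rate limit explicitly.

```python
import json


def iter_query_thing_names(iot, query, page_size=100):
    """Yield thing names that match a fleet indexing query, following nextToken."""
    kwargs = {"indexName": "AWS_Things", "queryString": query, "maxResults": page_size}
    while True:
        page = iot.search_index(**kwargs)
        for thing in page.get("things", []):
            yield thing["thingName"]
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def prime_shadows(iot_data, thing_names, new_version):
    """Write shadow.reported.newVersion so each thing joins the dynamic group."""
    payload = json.dumps({"state": {"reported": {"newVersion": new_version}}}).encode("utf-8")
    primed = 0
    for name in thing_names:  # a plain list (List option) or query results (Query option)
        iot_data.update_thing_shadow(thingName=name, payload=payload)
        primed += 1
    return primed
```

For the Query option, the names could come from `iter_query_thing_names(iot, "attributes.productId:widget123")`. For the List option, pass the list directly. You can point the `iot-data` client at the account's data endpoint with `endpoint_url="https://" + iot.describe_endpoint(endpointType="iot:Data-ATS")["endpointAddress"]`. Keep the type of `new_version` (string or number) consistent across things so that the dynamic group query matches all of them.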

Use cases

The Include list and Query with exclude list design patterns in the following sections involve generating and populating static thing groups from a list of things. Allocate appropriate storage and compute resources for processing large lists. These design patterns roll out jobs at the maximum rate (MaximumJobExecutionsPerMinute) even while the target groups are being populated. When time is not a constraint, you can also use Bulk Registration to add a list of things to a thing group.
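For large lists, one option is to stream the list instead of loading it into memory and to pace AddThingToThingGroup calls to the account's limit. This Python sketch assumes a CSV file with one thing name in the first column of each row and no header row; the format of the sample files is an assumption. `max_calls_per_second` is a hypothetical parameter that you would set from your AddThingToThingGroup quota, and `iot` is assumed to be a boto3 `iot` client.

```python
import csv
import time
from typing import Iterable, Iterator, TextIO


def iter_list_entries(stream: TextIO) -> Iterator[str]:
    """Yield one thing name per CSV row (first column) without loading the whole file."""
    for row in csv.reader(stream):
        if row and row[0].strip():
            yield row[0].strip()


def add_to_static_group(iot, group_name: str, thing_names: Iterable[str], max_calls_per_second: float) -> int:
    """Call AddThingToThingGroup for each thing, pacing calls to stay under a rate limit."""
    interval = 1.0 / max_calls_per_second
    added = 0
    for name in thing_names:
        started = time.monotonic()
        iot.add_thing_to_thing_group(thingGroupName=group_name, thingName=name)
        added += 1
        # Sleep for whatever is left of this call's time slot.
        remaining = interval - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
    return added
```

You could stream a list stored in Amazon S3 with `codecs.getreader("utf-8")(s3.get_object(Bucket=bucket, Key=key)["Body"])` and pass the result to `iter_list_entries`. Pacing on the client side keeps the population rate predictable instead of relying on throttling and exponential back-off.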

Include list

This use case applies when the target things are provided as a list that cannot be modeled with a query. A list requires additional steps to target things that are on the list but are added to the account after the job is created.

If all devices on the include list are in the account before the job is created, use the methods from the Target priming section.
If devices on the include list can be added to the thing registry after the list is created, use the following method:
1. Use the second method in the Target priming section (a dynamic group that selects a thing shadow attribute).
2. Store the pre-registered include list provided by the manufacturer in a database (e.g., Amazon DynamoDB).
3. Trigger an AWS Lambda function when new things are registered. The function checks whether the thing is in the database and, if it is, adds the thing shadow attribute to the thing.
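The following sketch shows the step 3 Lambda function. It makes several assumptions:

- An AWS IoT rule on registry events invokes the function, for example `SELECT * FROM '$aws/events/thing/+/created'` with registry events enabled.
- The include list is stored in a DynamoDB table keyed by a hypothetical `thingName` attribute.
- `table` is a boto3 DynamoDB `Table` resource, and `iot_data` is a boto3 `iot-data` client.

The factory function and its parameter names are illustrative.

```python
import json


def make_include_list_handler(table, iot_data, new_version):
    """Return a Lambda handler for thing-created registry events (include-list pattern).

    `table` is assumed to be a boto3 DynamoDB Table keyed by "thingName" that holds
    the pre-registered include list; `iot_data` is a boto3 "iot-data" client.
    """
    payload = json.dumps({"state": {"reported": {"newVersion": new_version}}}).encode("utf-8")

    def handler(event, context=None):
        # Ignore anything that is not a thing-created registry event.
        if event.get("eventType") != "THING_EVENT" or event.get("operation") != "CREATED":
            return {"primed": False}
        thing_name = event["thingName"]
        # get_item returns an "Item" key only when the thing is on the include list.
        if "Item" not in table.get_item(Key={"thingName": thing_name}):
            return {"primed": False}
        # Setting the shadow attribute adds the thing to the job's dynamic group.
        iot_data.update_thing_shadow(thingName=thing_name, payload=payload)
        return {"primed": True}

    return handler
```

In the Lambda module, you could build the handler once at import time, for example `handler = make_include_list_handler(boto3.resource("dynamodb").Table("includeList"), boto3.client("iot-data"), "1.1")`. The table name is hypothetical.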

Query with exclude list

This use case applies when the target things are provided as a query plus a list of things to exclude, so that the query alone cannot model the target. An exclude list requires additional steps to avoid targeting things that are on the exclude list and are added to the thing registry after the job is created.

If all targeted devices are in the account before the job is created, use the following method:
1. Create a dynamic group that selects the thing shadow attribute.
2. Create a continuous job that targets the dynamic group from step 1.
3. Update the thing shadow of every thing that the job will target. Use the fleet index to get the devices that match the query, and call the UpdateThingShadow API for each of them that is not on the exclude list.
If devices on the exclude list can be added to the thing registry after the list is created, use the following method:
1. Create a static group that will contain the devices on the exclude list, including devices registered in the future.
2. Create a dynamic group with a query that selects things that have the thing shadow attribute and are not members of the static group.
  Sample query: shadow.reported.newVersion:2.0 && NOT thingGroupNames:excludeListA
3. Create a continuous job that targets the dynamic group from step 2.
4. Use the AddThingToThingGroup API to add all things on the exclude list to the static exclude thing group.
5. Update the thing shadow of every thing that the job will target. Use the fleet index to get the devices that match the query, and call the UpdateThingShadow API for each of them that is not on the exclude list.
6. Store the pre-registered exclude list provided by the manufacturer in a database (e.g., Amazon DynamoDB).
7. Trigger an AWS Lambda function when new things are registered. The function checks whether the thing is in the database and, if it is, adds the thing to the static exclude thing group.
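The following sketch covers two pieces of the exclude-list methods. It makes these assumptions:

- Thing names come from a fleet indexing query, for example paged SearchIndex results.
- For the priming step, the exclude list fits in memory as a Python set.
- New things arrive through an AWS IoT rule on thing-created registry events.
- The exclude list is stored in DynamoDB under a hypothetical `thingName` key.

`prime_excluding` covers the shadow-update step (step 5 of the second method, and step 3 of the first). `make_exclude_list_handler` covers step 7. `iot_data` is assumed to be a boto3 `iot-data` client, `iot` a boto3 `iot` client, and `table` a boto3 DynamoDB `Table`. All function names are illustrative.

```python
import json
from typing import Iterable, Set, Tuple


def prime_excluding(iot_data, query_results: Iterable[str], exclude: Set[str], new_version: str) -> Tuple[int, int]:
    """Set the shadow attribute on query results that are not on the exclude list."""
    payload = json.dumps({"state": {"reported": {"newVersion": new_version}}}).encode("utf-8")
    primed = skipped = 0
    for name in query_results:
        if name in exclude:
            skipped += 1
            continue
        iot_data.update_thing_shadow(thingName=name, payload=payload)
        primed += 1
    return primed, skipped


def make_exclude_list_handler(table, iot, exclude_group: str):
    """Return a Lambda handler that adds newly registered excluded things to the static group."""

    def handler(event, context=None):
        # Ignore anything that is not a thing-created registry event.
        if event.get("eventType") != "THING_EVENT" or event.get("operation") != "CREATED":
            return {"excluded": False}
        thing_name = event["thingName"]
        if "Item" not in table.get_item(Key={"thingName": thing_name}):
            return {"excluded": False}
        # Membership in the static group keeps the thing out of the dynamic group's query.
        iot.add_thing_to_thing_group(thingGroupName=exclude_group, thingName=thing_name)
        return {"excluded": True}

    return handler
```

If the exclude list is too large to hold in memory, the membership check in `prime_excluding` could query the DynamoDB table instead of a local set.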

Design pattern alternatives

Several design patterns in this post use a dynamic IoT thing group with a query that selects things based on values in the thing shadow. You can use alternative patterns instead, such as a dynamic group query that selects things based on thing attribute values rather than thing shadow values.
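The following sketch shows the attribute-based alternative. The priming step sets a registry attribute with UpdateThing instead of a shadow value, and the dynamic group query then selects on that attribute (for example, `attributes.newVersion:2.0`). This sketch assumes that `iot` is a boto3 `iot` client. The attribute name mirrors the shadow examples and is illustrative. With this approach, the UpdateThing rate limit would govern how fast the group fills, and only registry indexing is needed.

```python
from typing import Iterable


def prime_attributes(iot, thing_names: Iterable[str], new_version: str) -> int:
    """Set a registry attribute instead of a shadow value on each target thing."""
    updated = 0
    for name in thing_names:
        iot.update_thing(
            thingName=name,
            # merge=True keeps the thing's other attributes intact.
            attributePayload={"attributes": {"newVersion": new_version}, "merge": True},
        )
        updated += 1
    return updated
```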

Metadata
Time to read: 10 min
Time to complete: 10 min
Cost to complete: < $1
Learning level: Advanced (300)
Services used: Amazon S3, AWS IoT Core, AWS IoT Device Management, AWS Lambda

Prerequisites

To follow along, you will need an AWS account.

Demo code walkthrough

This walkthrough demonstrates how to create a job that starts queueing AWS IoT things almost instantly, regardless of the number of things. The target group includes the things selected by a fleet indexing query, minus a custom list of things provided as an exclude list. This pattern shows how to address scenarios where a simple query cannot model the things to target.

Clone the project repo.
Complete the project setup steps in the readme.md.
Invoke the Lambda function that seeds the account with IoT things.
1. Open the AWS Lambda console.
2. Select the seedThings Lambda function.
3. Invoke the seedThings Lambda function with the following event to create 1,000 AWS IoT things with the prefix myDemoThings.
  {
  "mode": "seed",
  "demoThingPrefix": "myDemoThings",
  "seedConfigNumber": 1000
  }
Upload the sample CSV exclude list to S3.
1. Open the S3 console.
2. Select the iotJobsLists bucket.
3. Choose Upload to upload the sample list of things to exclude from the job. The project repo provides this list at sampleList/excludeMyDemoThings.csv.
Invoke the Lambda function that creates the job.
1. Open the Lambda console.
2. Select the job Lambda function.
3. Invoke the job Lambda function with the following event. The function creates the job, a static group containing the things on the exclude list, and a dynamic group containing all things selected by fleetIndexQuery minus the exclude list. Note: This Lambda function first enables AWS IoT registry and shadow indexing, and the time needed to enable fleet indexing depends on the number of things in your account. If enabling fleet indexing takes longer than 15 minutes, the Lambda function times out to avoid unnecessary compute. In that case, invoke the Lambda function again after fleet indexing is enabled. You can check the fleet indexing status manually on the Settings page of the AWS IoT console.
  {"jobName": "myFirstJob","fleetIndexQuery": "myDemoThings*","excludeListFileName": "excludeMyDemoThings.csv"}
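You can also script the console steps. This sketch assumes boto3 `lambda` and `iot` clients. It also assumes that the deployed function names match the console names; Amplify may append an environment suffix, so check the Lambda console for the exact names. `invoke` runs a function synchronously with one of the events above. `wait_for_thing_index` polls DescribeIndex for the default `AWS_Things` index, and you can use it before re-invoking the job function after a timeout. Treat an `ACTIVE` status as a reasonable signal rather than a guarantee. DescribeIndex may also return an error if indexing has not been enabled yet.

```python
import json
import time


def invoke(lambda_client, function_name, event):
    """Synchronously invoke a Lambda function and return its decoded JSON result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    body = response["Payload"].read()
    if "FunctionError" in response:
        raise RuntimeError(f"{function_name} failed: {body!r}")
    return json.loads(body) if body else None


def wait_for_thing_index(iot, timeout_seconds=1800, poll_seconds=30, sleep=time.sleep):
    """Poll the AWS_Things index until it is ACTIVE; return False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if iot.describe_index(indexName="AWS_Things")["indexStatus"] == "ACTIVE":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

For example, `invoke(boto3.client("lambda"), "seedThings", {"mode": "seed", "demoThingPrefix": "myDemoThings", "seedConfigNumber": 1000})` seeds the things. If the job function times out, call `wait_for_thing_index(boto3.client("iot"))` before invoking it again.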

Cleaning up

To clean up your account so that you do not incur future charges:

Delete the S3 files.
1. Open the S3 console.
2. Select the iotJobsLists bucket.
3. Select all files in the bucket (e.g., excludeMyDemoThings.csv).
4. Choose Delete to delete the files.
Invoke the seedThings AWS Lambda function to delete the IoT things. Do this before you delete the AWS Amplify infrastructure, which may include the Lambda functions.
1. Open the AWS Lambda console.
2. Select the seedThings Lambda function.
3. Invoke the seedThings Lambda function with the following event to delete all things with the “myDemoThings” prefix that were created for this walkthrough.
  {
  "mode": "delete",
  "demoThingPrefix": "myDemoThings"
  }
Delete the cloud infrastructure created by AWS Amplify.
1. Open the AWS Amplify console.
2. Select the AWS Amplify app named iotjobsblog.
3. Delete the app.
Delete the IoT jobs.
1. Open the AWS IoT console.
2. In the navigation pane, under Manage, choose Jobs.
3. Select the jobs created for this walkthrough (e.g., “myFirstJob”).
4. Cancel the job.
5. Delete the job.
Delete the IoT thing groups.
1. Open the AWS IoT console.
2. In the navigation pane, under Manage, choose Thing groups.
3. Select the thing groups created for this walkthrough (e.g., “myFirstJob” and “myFirstJob-exclude”).
4. Delete the selected thing groups.
Restore the fleet indexing configuration.
1. Open the AWS IoT console.
2. In the navigation pane, choose Settings.
3. Choose Manage Indexing.
4. Restore your fleet indexing configuration to the state it was in before the walkthrough, and then choose Update.
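You can also clean up the job and thing groups with the AWS IoT API. This sketch assumes a boto3 `iot` client and a job that is still in progress. It also assumes the walkthrough's names, with `myFirstJob` presumed to be the dynamic group and `myFirstJob-exclude` the static exclude group. Job deletion completes asynchronously. Restoring the fleet indexing configuration is left to the console steps.

```python
def cleanup_walkthrough(iot, job_id, dynamic_group, exclude_group):
    """Cancel and delete the demo job, then delete its dynamic and static thing groups."""
    # force=True also cancels executions that are already IN_PROGRESS.
    iot.cancel_job(jobId=job_id, force=True)
    # Deletion runs asynchronously; force=True allows deleting with non-terminal executions.
    iot.delete_job(jobId=job_id, force=True)
    iot.delete_dynamic_thing_group(thingGroupName=dynamic_group)
    iot.delete_thing_group(thingGroupName=exclude_group)
```

For example: `cleanup_walkthrough(boto3.client("iot"), "myFirstJob", "myFirstJob", "myFirstJob-exclude")`.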

Conclusion

As IoT fleets grow, more customers are asking for ways to rapidly push over-the-air updates to their devices. The design patterns and code sample in this post demonstrate ways to (1) target subsets of an IoT fleet for updates and (2) automate jobs that push updates at the MaximumJobExecutionsPerMinute service limit. The walkthrough provides AWS Lambda functions that roll out jobs at the peak job rate even when the IoT thing targets require complex patterns.

For further information on the services used, you can consult the AWS IoT Core and AWS IoT Device Management web pages.

Authors

David Johnsen

David is a Senior Sustainability Application Architect with AWS Professional Services with a Ph.D. in Environmental Engineering. He helps AWS customers achieve their sustainability business goals by architecting and building innovative solutions on AWS. His focus is on using AWS services to help customers across industries to measure and reduce their carbon emissions from their own business operations.

Manish Talreja

Manish is a Senior Machine Learning and IoT Architect with AWS Professional Services. He helps AWS customers achieve their business goals by architecting and building innovative solutions that use AWS IoT services on the AWS Cloud.

The Internet of Things on AWS – Official Blog