Securely ingesting large-sized payloads from IoT devices to the AWS Cloud
AWS IoT Core lets you securely ingest payloads from IoT devices to the AWS Cloud at large scale, supporting billions of devices and trillions of messages. It also lets you process the messages and manage the devices from the cloud reliably and securely. One challenge you may have faced while designing a solution with AWS IoT Core is the hard limit on the maximum size of an MQTT payload. At the time of writing, the maximum MQTT payload that AWS IoT Core supports is 128 KB (check the AWS IoT Core service quotas for the latest value). The AWS IoT service rejects publish and connect requests larger than this size. Common IoT use cases with large payloads include:
- Ingesting medical images to the cloud.
- Recording and transmitting heart or lung sounds from medical devices.
- Transmitting sound files to detect car accidents in a smart city.
- Capturing and transmitting images of license plates when traffic rules are violated.
- Ingesting binary files generated from industrial machines to the cloud.
In this blog post, I explain a pattern that addresses the problem of ingesting large IoT payloads in a scalable way. It is particularly applicable to constrained devices without edge capabilities, and to devices that have enough memory to store one or more payloads (depending on the use case) before they are ingested to the cloud.
Additionally, I explain how security is implemented by design. This is critical because of the following risks:
- IoT ecosystems can have large attack surfaces since the devices are dependent on internet-supported connectivity.
- Device fleets can grow rapidly, so it is important that security is implemented by design throughout the development lifecycle. Retrofitting security at a later stage adds complexity because it forces architectural or design changes.
To address the hard limit on MQTT payload size, you can use Amazon Simple Storage Service (Amazon S3) to store the payload, using HTTPS as a secondary protocol, while still using AWS IoT Core features such as device shadows, the registry, and the rules engine for the rest of your requirements. Amazon S3 is a reliable and cost-effective service for storing large objects.
It is a best practice to keep Amazon S3 buckets private and secure, and to follow the principle of least privilege. One of the recommended mechanisms for any entity to interact with a private S3 bucket is a pre-signed URL, which in this case is generated on request for each device by a cloud-side Lambda function. A pre-signed URL grants temporary access to a specific S3 object without requiring the caller to hold AWS security credentials or permissions; it is valid for a specific time period and then expires. Using the URL, you can either read the object or write an object. If you are new to Amazon S3 pre-signed URLs, please read Using pre-signed URLs and Uploading objects using pre-signed URLs.
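As a rough illustration of how such URLs can be produced, here is a minimal sketch that assumes the AWS SDK for Python (boto3) and its `generate_presigned_url` call. The helper names, the default lifetime of 300 seconds, and the bucket and key passed in are assumptions made for this example, not part of the original solution.

```python
from typing import Any


def presign_upload_url(s3_client: Any, bucket: str, key: str, expires_in: int = 300) -> str:
    """Return a URL that lets the holder write one object until it expires."""
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime in seconds (assumed default)
    )


def presign_download_url(s3_client: Any, bucket: str, key: str, expires_in: int = 300) -> str:
    """Return a URL that lets the holder read one object until it expires."""
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def default_s3_client() -> Any:
    """Create a real S3 client; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the helpers above can be tried with a stub client

    return boto3.client("s3")
```

A URL signed for `put_object` is used with an HTTP PUT, and it grants only what the signing identity itself is allowed to do, so the Lambda function's role should be scoped to the target bucket.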
Once the file is ingested, the next step is to act on the data. Unlike AWS IoT Core, S3 buckets have no rules engine associated with them. Instead, you can use Amazon S3 Event Notifications to receive a notification whenever a PUT event occurs in your S3 bucket. Amazon S3 can publish events to the following destinations:
- Amazon Simple Notification Service (Amazon SNS) topics
- Amazon Simple Queue Service (Amazon SQS) queues
- AWS Lambda functions
In this solution, I use AWS Lambda to act on the payload, but depending on your use case you can publish events to the other two destinations as well. For more details, please read Amazon S3 Event Notifications.
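To make the notification path more concrete, the following is a minimal sketch of a Lambda handler that reads an S3 event notification and extracts the bucket and key of each newly created object. The function names and the `print` placeholder are assumptions for illustration; the actual processing depends on your use case.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def uploaded_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs for object-created records in an S3 event."""
    found = []
    for record in event.get("Records", []):
        # Skip anything that is not an object-creation event (for example, deletes).
        if "ObjectCreated:" not in record.get("eventName", ""):
            continue
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification, so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        found.append((bucket, key))
    return found


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Hypothetical Lambda entry point: process each uploaded payload."""
    objects = uploaded_objects(event)
    for bucket, key in objects:
        print(f"processing s3://{bucket}/{key}")  # placeholder for real processing
    return {"processed": len(objects)}
```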
To push the payload to AWS, the device performs the following steps:
- To connect with the AWS Cloud, the IoT device requests access to AWS IoT Core by authenticating itself using X.509 certificates.
- The IoT device sends a request to a topic to generate a pre-signed URL.
- The rules engine invokes a Lambda function to generate a pre-signed URL that is valid for a specific period.
- The Lambda function publishes the pre-signed URL to a device-specific topic.
- The IoT device receives the URL and uploads the payload to an S3 bucket using HTTPS POST.
- S3 sends an event to the Lambda function to start processing when the file is uploaded.
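The cloud side of the steps above can be sketched as a single Lambda function. This is only an illustration under stated assumptions: the bucket name, the topic layout `devices/<thing>/upload-url`, the validity period, and the event fields `thingName` and `fileName` (assumed to be supplied by the rule that invokes the function) are all hypothetical. Because the URL is signed for `put_object` here, the device would upload with an HTTP PUT; a different signing choice would change the upload request.

```python
import json
from typing import Any, Dict

BUCKET = "example-iot-payloads"  # hypothetical bucket name
URL_TTL_SECONDS = 300            # hypothetical validity period


def handle_url_request(event: Dict[str, Any], s3_client: Any, iot_data_client: Any) -> str:
    """Generate a pre-signed upload URL and publish it to the requesting device."""
    thing = event["thingName"]     # assumed to be added by the rule
    file_name = event["fileName"]  # assumed field in the device's request
    key = f"{thing}/{file_name}"   # one prefix per device

    url = s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=URL_TTL_SECONDS,
    )

    topic = f"devices/{thing}/upload-url"  # hypothetical device-specific topic
    message = {"url": url, "key": key, "expiresIn": URL_TTL_SECONDS}
    iot_data_client.publish(topic=topic, qos=1, payload=json.dumps(message).encode("utf-8"))
    return topic


def lambda_handler(event: Dict[str, Any], context: Any) -> str:
    """Lambda entry point; boto3 is imported lazily so the logic above can be tested with stubs."""
    import boto3

    return handle_url_request(event, boto3.client("s3"), boto3.client("iot-data"))
```

Passing the clients in as arguments keeps the core logic easy to exercise without AWS credentials.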
Another variant of this approach is to use device shadows, instead of a device-specific topic, to communicate the pre-signed URL from the cloud to the device. In IoT applications, command topics are used to control a device remotely and to acknowledge successful command executions. For best practices, please read Using the AWS IoT shadow for commands.
Note that the solution outlined in this blog is not the only way to ingest large payloads from IoT devices to the cloud. Here are a couple of alternatives that may be suitable in some instances; however, I will not dive deep into them:
- Chunking the message: In this solution, a large payload is split into smaller chunks on the device side and published using MQTT. The subscriber of the topic takes on the additional responsibility of collecting, reordering, and reassembling the chunks, which adds steps to the solution. A challenge with this approach is that it is error prone and can increase cost, since customers are charged by the number of messages transmitted between devices and AWS IoT Core.
- REST server: REST APIs provide flexibility and scalability; however, REST by design requires a connection to be made with each request. This introduces latency and increases I/O and power consumption. In addition, a REST server requires network connectivity to serve the request. If your constrained devices need to ingest data frequently at low latency, or if connectivity may be intermittent, the REST server method may be unsuitable.
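To show why the chunking alternative adds work on both sides, here is a minimal sketch of splitting and reassembling a payload. The chunk size of 120 KiB (leaving headroom under the 128 KB limit for sequence metadata) and the `seq`/`total`/`data` message layout are assumptions for illustration only.

```python
from typing import Any, Dict, List

CHUNK_BYTES = 120 * 1024  # assumed size, below the 128 KB limit to leave room for metadata


def split_payload(payload: bytes, chunk_size: int = CHUNK_BYTES) -> List[Dict[str, Any]]:
    """Split a payload into numbered chunks, one per MQTT message."""
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division, at least one chunk
    return [
        {"seq": i, "total": total, "data": payload[i * chunk_size:(i + 1) * chunk_size]}
        for i in range(total)
    ]


def reassemble(chunks: List[Dict[str, Any]]) -> bytes:
    """Reorder chunks by sequence number and join them; fail if any are missing."""
    if not chunks:
        raise ValueError("no chunks received")
    total = chunks[0]["total"]
    by_seq = {chunk["seq"]: chunk["data"] for chunk in chunks}
    missing = [i for i in range(total) if i not in by_seq]
    if missing:
        raise ValueError(f"missing chunks: {missing}")
    return b"".join(by_seq[i] for i in range(total))
```

Even this small version has to handle ordering and loss, and every chunk is a separate billable message, which is the cost concern described above.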
I’ll now explain how the solution described above lets you accomplish security by design.
A strong IoT device authentication mechanism is required so that only trusted devices access the cloud. Using a strong authentication mechanism helps prevent device spoofing or hackers gaining access to the cloud.
Typically, each device has a unique X.509 certificate. These certificates give AWS IoT the ability to authenticate device connections. To give devices access to your private Amazon S3 data, you can authenticate POST requests by generating a pre-signed URL.
Authorization is the process of granting appropriate permissions to an authenticated device. Only an authorized device gets access to the pre-signed URL, because this URL grants access to the S3 bucket where the payload is persisted. It is important to configure the pre-signed URL expiry time carefully, considering the device's upload bandwidth and capabilities. Once issued, these URLs cannot be revoked or made to expire early.
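One way to reason about the expiry time is to estimate the transfer time from payload size and uplink bandwidth, then add a margin. The sketch below is only a rule of thumb; the safety factor of 3 and the 60-second floor are assumptions, not recommended values.

```python
import math


def suggested_expiry_seconds(
    payload_bytes: int,
    uplink_bytes_per_second: float,
    safety_factor: float = 3.0,  # assumed margin for retries and slow links
    minimum_seconds: int = 60,   # assumed floor so small payloads still have time to start
) -> int:
    """Estimate a pre-signed URL lifetime from payload size and uplink speed."""
    if uplink_bytes_per_second <= 0:
        raise ValueError("uplink_bytes_per_second must be positive")
    transfer_seconds = payload_bytes / uplink_bytes_per_second
    return max(minimum_seconds, math.ceil(transfer_seconds * safety_factor))
```

For example, a 5,000,000-byte file on a 50,000 bytes-per-second uplink needs about 100 seconds to transfer, so this sketch would suggest a 300-second expiry.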
Using AWS IoT Core, you can attach the required permissions (IoT policies) to the certificate of each authorized device so that it can securely connect to and operate with AWS. Policy-based authorization follows the principle of least privilege: every device gets access only to the specific topics intended for that device. This helps ensure that the URL is not accessed by any unauthorized entity.
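As an illustration of per-device topic access, the following sketch builds an AWS IoT policy document that uses the `${iot:Connection.Thing.ThingName}` policy variable. The topic names are hypothetical, and the sketch assumes each device connects with a client ID equal to its thing name and that the thing is attached to its certificate in the registry.

```python
import json
from typing import Any, Dict


def device_policy(region: str, account_id: str) -> Dict[str, Any]:
    """Build a least-privilege IoT policy scoped to the connecting device's own topics."""
    arn = f"arn:aws:iot:{region}:{account_id}"
    thing = "${iot:Connection.Thing.ThingName}"  # resolved by AWS IoT at connection time
    request_topic = f"devices/{thing}/upload-url/request"  # hypothetical topic layout
    response_topic = f"devices/{thing}/upload-url"         # hypothetical topic layout
    return {
        "Version": "2012-10-17",
        "Statement": [
            # The device may connect only with its own thing name as client ID.
            {"Effect": "Allow", "Action": "iot:Connect", "Resource": f"{arn}:client/{thing}"},
            # The device may request a URL only on its own request topic.
            {"Effect": "Allow", "Action": "iot:Publish", "Resource": f"{arn}:topic/{request_topic}"},
            # The device may subscribe to and receive only its own response topic.
            {"Effect": "Allow", "Action": "iot:Subscribe", "Resource": f"{arn}:topicfilter/{response_topic}"},
            {"Effect": "Allow", "Action": "iot:Receive", "Resource": f"{arn}:topic/{response_topic}"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(device_policy("us-east-1", "123456789012"), indent=2))
```

With a policy of this shape, one device cannot subscribe to another device's response topic and so cannot obtain a URL issued to it.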
End-to-end encryption
In the solution described above, there are two communication channels. MQTT is used to request and receive the pre-signed URL, and HTTPS is used to upload the payload to S3.
By default, AWS IoT data is encrypted both at rest and in transit. The message broker encrypts all communication in transit by using TLS version 1.2. Data at rest is encrypted using AWS-owned keys.
For the upload to S3, HTTPS encrypts data in transit and helps prevent unauthorized users from eavesdropping on or manipulating network traffic. To encrypt S3 data at rest, you can use either server-side encryption (SSE) or client-side encryption. For more S3-related security best practices, please read Security Best Practices for Amazon S3.
In this post, I explained a pattern to securely ingest large payloads from IoT devices to the AWS Cloud. I also walked through the solution architecture and covered how security is implemented by design. The services used in the pattern (AWS IoT Core, S3, and Lambda) are managed and scalable, which makes them ideal building blocks for a highly scalable IoT platform. To learn more, check out the AWS IoT Core documentation.
About the author
Vishal Gupta is a Solution Architect at Amazon Internet Services Private Limited (AISPL), based in Delhi, India. Vishal works with AWS Digital Native Business (DNB) customers and enables them to design and architect innovative solutions on AWS. Outside work, he enjoys traveling to new destinations and spending time with his family.