AWS Partner Network (APN) Blog
Leveraging Confluent and AWS to Solve IoT Device and Data Management Challenges
By Braeden Quirante, Cloud Partner Solutions Engineer – Confluent
By Ahmed Zamzam, Sr. Cloud Partner Solutions Engineer – Confluent
By Weifan Liang, Sr. Partner Solutions Architect – AWS
Internet of Things (IoT) use cases typically involve an enormous number of connected devices, both in existence today and anticipated in the future, generating an even greater amount of data.
The increasing prevalence of microcomputing capabilities in nearly every type of device has led to a fully connected world. From voice assistants and smoke alarms to connected cars, windmills, industrial robots, and airplanes, individuals and businesses can leverage new and improved ways of monitoring, automating, and reacting to situations. However, this also introduces challenges related to scale, such as device management and data processing.
In this post, we will demonstrate how you can use Confluent and Amazon Web Services (AWS) to solve device management and data management challenges that may be encountered in IoT use cases.
Founded by the creators of Apache Kafka, Confluent is an AWS Partner and AWS Marketplace Seller that enables organizations to harness business value from stream data. Confluent manages the barrage of stream data and makes it available throughout an organization.
Challenges with Device Management
Amazon CEO Andy Jassy predicted that in the future, on-premises footprints for most companies would no longer consist of servers, but rather IoT edge devices that are increasingly being deployed in offices, factories, oil fields, agricultural fields, planes, cars, and other areas.
These billions of connected devices bring a whole new set of security concerns; managing the devices and keeping them secure will be critical to the success of IoT projects.
To address these challenges, AWS developed AWS IoT Core, a service that lets you connect billions of IoT devices and route trillions of messages to AWS services without managing infrastructure. It provides devices with access to cloud services, such as automated security audits, a simplified MQTT pub/sub framework, device fleet visualization, fleet management, and provisioning.
For example, firmware updates can be performed over-the-air, allowing for easier maintenance and security fixes. Device certificates are also handled by AWS IoT Core, ensuring that only authorized devices can access the network.
Built-in monitoring provides real-time device fleet visualization, allowing you to easily monitor the status of your devices. This feature enables you to quickly identify issues and take necessary actions to ensure your IoT devices remain operational.
Challenges with Real-Time Data Processing
As the number of connected devices and their corresponding messages continue to grow exponentially, real-time data processing becomes increasingly challenging. In many cases, batch processing is no longer a viable option, and high-volume throughput is critical to ensure smooth operation.
Confluent, an AWS Data and Analytics Competency Partner, makes it easy to capture and process Gbps-scale data and connect your applications, data systems, and entire organization with real-time data flows. It offers a resilient and scalable data streaming platform based on Apache Kafka, delivered as a fully managed service.
To complement AWS IoT Core’s suite of features, Confluent’s stream processing can deliver continuous, real-time data integration across IoT devices, unifying data that’s normally too large or complex for traditional data integration tools.
Solution Overview
Next, we will demonstrate how you can leverage the combined power of AWS IoT Core and Confluent to address the challenges of IoT use cases. As an example, we'll simulate readings that measure carbon dioxide (CO2) and nitrogen oxide (NOx) emissions in a plant's boiler system.
We will then showcase how you can automate the response by injecting ammonia if the NOx readings are high. By the end of this post, you’ll have a better understanding of how AWS IoT Core and Confluent can help manage and respond to real-time data from IoT devices.
Figure 1 – Real-time IoT device processing.
The data pipeline includes the following stages:
- Send CO2 and NOx readings to AWS
- Route data to Confluent for real-time processing
- Act on the data as it’s generated and detect anomalies in real-time
- Trigger an AWS Lambda function to automate response and update device shadow to adjust NOx and CO2 levels
- Update the thing (device) state
Routing Data to Confluent Cloud Using IoT Rules
Once the messages are sent from the device to AWS IoT Core, you can use the rules engine to process, filter, and route these messages to different destination services.
One of these destinations is Apache Kafka, and thus Confluent Cloud. The IoT rules can transform and route the incoming message data to the appropriate Kafka topics in Confluent based on the rules engine SQL statement.
Integrating AWS IoT Core with Confluent Cloud requires creating a virtual private cloud (VPC) destination by specifying your VPC and at least two subnets since Apache Kafka is usually deployed within VPCs. Once you have specified the subnets, the rules engine will create an elastic network interface (ENI) in each of them to enable seamless integration.
Therefore, if your Confluent Cloud cluster uses a private connectivity option like AWS PrivateLink, AWS Transit Gateway, or VPC peering, you need to ensure the VPC you specify allows ENIs in that VPC to access your Confluent Cloud cluster.
On the other hand, if you're using a public Confluent Cloud cluster, as in our solution, you should specify public subnets, since the ENIs in those subnets need internet access to communicate with your Confluent Cloud cluster.
After configuring the networking, you can create your Apache Kafka rule action. When configuring the rule action, you'll need to specify the following (a Boto3 sketch follows the list):
- VPC destination that was created earlier
- Connection parameters for the Confluent cluster:
  - bootstrap.servers
  - security.protocol – for Confluent, use SASL_SSL
  - sasl.mechanism – for Confluent, use PLAIN
  - Username and password – the cluster's API key and secret
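The following is a minimal Boto3 sketch of such a rule, not the post's exact configuration; the rule name, MQTT topic, Kafka topic, endpoint, and ARNs are all placeholder assumptions. The same rule can be created through the console or Terraform.

```python
# A sketch of creating the Apache Kafka rule action with Boto3; every name,
# ARN, and endpoint below is a placeholder assumption.
import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="route_emissions_to_confluent",
    topicRulePayload={
        # Rules engine SQL: select every reading published to the MQTT topic.
        "sql": "SELECT * FROM 'plant/boiler/emissions'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [
            {
                "kafka": {
                    # The VPC destination created earlier.
                    "destinationArn": "arn:aws:iot:us-east-1:123456789012:ruledestination/vpc/EXAMPLE-UUID",
                    "topic": "emissions_readings",  # target Kafka topic
                    "clientProperties": {
                        "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
                        "security.protocol": "SASL_SSL",
                        "sasl.mechanism": "PLAIN",
                        # In practice, pull the cluster API key and secret
                        # from AWS Secrets Manager rather than inlining them.
                        "sasl.plain.username": "<cluster-api-key>",
                        "sasl.plain.password": "<cluster-api-secret>",
                    },
                }
            }
        ],
    },
)
```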
Processing Readings in Real-Time
Once the data is ingested into Confluent, real-time data processing can occur using ksqlDB, a fully managed, purpose-built stream processing database that enables you to analyze and process data within Confluent by using SQL-like semantics.
In our solution, we utilize ksqlDB to perform stateful aggregation and calculate the average NOx level readings over the last one-minute window. We group the data by “DEVICE_ID” and only consider groups that have an average “nox_concentration” greater than three.
The ksqlDB query we execute takes the following general form (the exact query is provided in the demo repository; stream, table, and column names below are illustrative):
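```sql
-- A sketch consistent with the description above; stream, table, and
-- column names are assumptions, not necessarily the repository's exact names.
CREATE TABLE high_nox_devices WITH (KAFKA_TOPIC = 'high_nox_devices') AS
    SELECT
        device_id,
        AVG(nox_concentration) AS avg_nox_concentration
    FROM emissions_readings
    WINDOW TUMBLING (SIZE 1 MINUTE)
    GROUP BY device_id
    HAVING AVG(nox_concentration) > 3
    EMIT CHANGES;
```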
The above query is a persistent query that continuously processes incoming event data. In our specific use case, we’re generating a new materialized table view along with a corresponding Kafka sink topic in Confluent. The query results are streamed as a changelog into this topic. The output Kafka topic will only contain information related to devices that have an average NOx concentration greater than three and need to be adjusted.
Automating Response
There are more than 120 pre-built connectors (66 available as fully managed with Confluent on AWS) to sink the data into different systems, one of which is AWS Lambda. The output topic in Confluent will trigger a Lambda function to automate the response and adjust the NOx levels.
There are two ways to integrate Lambda with Confluent. The first is Lambda's native event source mapping: Lambda polls new messages from the input topic and synchronously invokes the target function. In this mode, Lambda starts by allocating one consumer to process all partitions in the input topic, then monitors the consumer offset lag and automatically scales the number of consumers based on the workload.
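As an illustration, a self-managed Apache Kafka event source mapping for Confluent Cloud could be created with Boto3 along these lines (the function name, topic, endpoint, and secret ARN are placeholder assumptions):

```python
# A sketch of a self-managed Apache Kafka event source mapping pointing at
# Confluent Cloud; all names, endpoints, and ARNs are placeholder assumptions.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    FunctionName="fix-nox-levels",
    Topics=["high_nox_devices"],
    SelfManagedEventSource={
        "Endpoints": {
            "KAFKA_BOOTSTRAP_SERVERS": [
                "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092"
            ]
        }
    },
    SourceAccessConfigurations=[
        {
            # Cluster API key and secret stored in AWS Secrets Manager.
            "Type": "BASIC_AUTH",
            "URI": "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluent-credentials",
        }
    ],
    StartingPosition="LATEST",
)
```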
The second option is using the fully managed AWS Lambda Sink Connector for Confluent Cloud. This allows you to invoke Lambda functions synchronously or asynchronously, which helps prevent stalled partitions.
In our solution, we have opted to use the fully managed Lambda Sink Connector in asynchronous mode. In this mode, the connector triggers Lambda functions without waiting for a response, resulting in an at-most-once delivery guarantee. Any errors can be caught in the function and sent to a dead letter queue (topic) in Confluent. While this trade-off prioritizes throughput over consistency, it's necessary for our example solution given the continuous stream of readings from potentially hundreds of thousands of sensors.
To configure the connector, we define the desired mode and other necessary parameters. Modifying the connector configuration in Confluent is a non-destructive process. Confluent offers multiple ways to modify the configuration, including through its user interface (UI), API, and Terraform.
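For illustration, a fully managed Lambda Sink Connector configuration might look like the following; the key names follow Confluent's connector documentation, and all values are placeholder assumptions:

```json
{
  "connector.class": "LambdaSink",
  "name": "fix-nox-lambda-sink",
  "kafka.auth.mode": "KAFKA_API_KEY",
  "kafka.api.key": "<cluster-api-key>",
  "kafka.api.secret": "<cluster-api-secret>",
  "topics": "high_nox_devices",
  "input.data.format": "JSON",
  "aws.access.key.id": "<aws-access-key-id>",
  "aws.secret.access.key": "<aws-secret-access-key>",
  "aws.lambda.function.name": "fix-nox-levels",
  "aws.lambda.invocation.type": "async",
  "tasks.max": "1"
}
```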
Updating Device Shadow
Ultimately, AWS Lambda will update the device shadow which will, in turn, update the device when it connects to the cloud and adjust the NOx levels.
The AWS IoT Device Shadow service adds shadows to AWS IoT thing objects. Shadows can make a device’s state available to apps and other services whether the device is connected to AWS IoT or not.
A thing’s shadow is a JSON document used to store and retrieve current-state information for a thing (created in AWS IoT Core). We can update/get the shadow document for a thing using API calls, HTTP requests, and publishing to reserved topics over MQTT.
Shadows provide a reliable data store to share data, and they enable devices, apps, and other cloud services to connect and disconnect without losing a device’s state. While devices, apps, and other cloud services are connected to AWS IoT, they can access and control the current state of a device through its shadows.
For example, an app can request a change in a device's state by updating a shadow. AWS IoT publishes a message that indicates the change to the device. The device receives this message, updates its state to match, and publishes a message with its updated state. The AWS IoT Device Shadow service reflects this updated state in the corresponding shadow.
From a Lambda function, your app can update a shadow's desired state by using the UpdateThingShadow API call via the Boto3 Python SDK. Updates affect only the fields specified in the request.
Below is a sketch of the fix Lambda function (the full code is available in the repository; the event payload shape and shadow field shown here are illustrative):
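```python
# A sketch of the fix Lambda function; the exact record shape produced by the
# Lambda Sink Connector and the shadow field names are assumptions.
import json
import boto3

iot_data = boto3.client("iot-data")

def lambda_handler(event, context):
    # The connector delivers a batch of Kafka records per invocation.
    for record in event:
        value = record["payload"]["value"]  # assumed record layout
        thing_name = value["DEVICE_ID"]

        # Ask the device to inject ammonia by updating the shadow's desired
        # state; only the fields included in the request are modified.
        iot_data.update_thing_shadow(
            thingName=thing_name,
            payload=json.dumps(
                {"state": {"desired": {"inject_ammonia": True}}}
            ).encode("utf-8"),
        )
```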
Deploy Solution
You can deploy the solution by following the instructions provided on GitHub. To do so, you’ll need to have Terraform installed on your computer.
Once the deployment is complete, you’ll need to create ksqlDB queries. The post-deployment steps in the repository will guide you through this process.
Finally, to begin simulating readings and creating anomalies, follow the instructions in the Run the demo section of the repository.
Cleanup
To clean up the solution, simply delete the connector and destroy the Terraform environment by running terraform destroy from your terraform directory.
Conclusion
In this post, we showcased a solution to monitor and respond to real-time signals from IoT devices. We leveraged the combined power of AWS IoT Core and Confluent to easily scale to hundreds or thousands of devices to ingest, process, transform, and manage your high-volume, high-velocity IoT data continuously without any disruption.
You can integrate this solution with your fleet of IoT devices and your own set of rules. This can be achieved by using and modifying the source code available on GitHub.
Confluent Cloud offers elastic scaling and pricing that charges only for what you stream. If you would like to learn more about Confluent Cloud, sign up for an account and receive $400 USD to spend during your first 30 days.
Learn more about Confluent in AWS Marketplace.
Confluent – AWS Partner Spotlight
Confluent is an AWS Data and Analytics Competency Partner that was founded by the creators of Apache Kafka and enables organizations to harness business value from stream data.