AWS Database Blog
Simplify Industrial IoT: Use InfluxDB edge replication for centralized time series analytics with Amazon Timestream
As industrial and manufacturing companies embark on their digital transformation journey, they are looking to capture and process large volumes of near real-time data for optimizing production, reducing downtime, and improving overall efficiency. As part of this, they want to store data locally on the plant floor or in an on-premises data center for real-time, low-latency reporting and monitoring, while using the power of the cloud for long-term trending and machine learning (ML) use cases.
In this post, we show how you can use InfluxDB 2.X on the edge and enable real-time data replication from edge devices to an Amazon Timestream for InfluxDB managed instance. We also demonstrate how you can scale this solution across multiple sites and replicate the data to the cloud in a seamless manner with a low-code approach. Additionally, we show how this solution helps make sure plant operations are not impacted in the event of a network connectivity issue.
The challenge
Industrial IoT (IIoT) applications face a unique set of challenges when it comes to collecting, processing, and analyzing data from dispersed devices, including the following:
- Data collection and transmission – Collecting data from edge devices, processing it, and transmitting it to a central location or cloud-based instance can be a complex and time-consuming process. Manual or self-built data collection and replication solutions can lead to errors, delays, and increased costs.
- Real-time data requirements – Many IIoT applications require real-time data to make timely decisions, detect anomalies, and predict maintenance needs. Delays in data transmission and processing can lead to missed opportunities, reduced efficiency, and increased downtime.
- Internet connectivity limitations – IIoT devices are often located in remote or hard-to-reach areas, where internet connectivity is limited or unreliable. This can make it difficult to transmit data in real time, leading to delays and disruptions in data analysis and decision-making.
- Data freshness and accuracy – IIoT applications require fresh and accurate data to make informed decisions. Stale or inaccurate data can lead to incorrect conclusions, reduced efficiency, and increased costs.
- Alarms and notifications – In IIoT applications, alarms and notifications are critical for detecting anomalies, predicting maintenance needs, and providing timely intervention. Delays in alarm and notification transmission can lead to missed opportunities, reduced efficiency, and increased downtime.
- Offline and edge computing – Many IIoT applications require edge computing and offline capabilities to provide continuous operation, even in the absence of internet connectivity. This requires devices to be able to collect, process, and analyze data locally, and then transmit it to a central location or cloud-based instance when connectivity is restored.
Benefits of Timestream for InfluxDB
InfluxDB 2.x and its edge replication capability address these challenges by enabling real-time data replication from edge devices to a Timestream for InfluxDB managed instance. This feature allows IIoT applications to do the following:
- Collect data close to the devices – Telegraf and InfluxDB can help collect real-time data on the plant floor, consuming MQTT data and storing it in InfluxDB. With AWS IoT Greengrass, an open source edge runtime, we host InfluxDB on premises (edge). Data ingestion can be set up using an EMQX broker, available as an AWS provided component on AWS IoT Greengrass.
- Process data in real time – The on-premises InfluxDB instance ingests data in real time, enabling faster insights and decision-making.
- Scale effortlessly – By replicating data to a cloud-based instance, IIoT applications can scale effortlessly without worrying about local storage or processing limitations. This also unlocks new use cases, extending the value of data over time.
- High availability – When you choose a Multi-AZ deployment for your Timestream for InfluxDB instance, we configure and maintain a standby replica that automatically becomes active if your primary instance fails.
Let’s consider a manufacturing plant that uses IIoT sensors to monitor equipment performance, temperature, and vibration. With InfluxDB 2.x edge replication, the plant can accomplish the following:
- Automate data collection – InfluxDB edge nodes collect data from sensors and replicate it to a cloud-based instance in real time.
- Process data in real time – The local instance processes data in real time, presenting monitoring dashboards to local teams, while the cloud instance aggregates data from multiple locations to discover trends, present insights, and compare performance. This enables the plant to detect anomalies, predict maintenance needs, and optimize production while combining local data with datasets generated at other locations.
- Scale effortlessly – As the plant expands, edge replication enables seamless scaling, without worrying about local storage or processing limitations.
- Fully managed – A fully managed Timestream for InfluxDB instance enables you to focus on building analytics and visualizations on the data without worrying about database management tasks.
Solution overview
In this solution, we host InfluxDB on AWS IoT Greengrass, an open source edge runtime and cloud service that helps you build, deploy, and manage intelligent device software. Edge replication is enabled on these InfluxDB instances; as soon as data is ingested on the edge instances, it is replicated into the Timestream for InfluxDB instance in the AWS Cloud. Keep in mind that you will incur charges for the AWS resources used in the solution.
Optionally, you can host this on any containerized solution or a virtual machine at the edge.
In a typical manufacturing or industrial plant, the sensors in the factory line send telemetry data using a broker like EMQX MQTT broker, which is then consumed by the data stores at the edge gateway.
This solution can be expanded to multiple edge locations, enabling you to build a seamless view of data across multiple sites. The following diagram illustrates a solution architecture that replicates the data into a single Amazon Timestream for InfluxDB instance.
The following diagram illustrates the data flow from the edge gateway to the cloud.
In our sample setup, we simulate the data by ingesting a CSV file that contains pressure and temperature data from the sensors into InfluxDB and replicate the data in near real time into a Timestream for InfluxDB instance.
In the next sections, we describe the step-by-step approach for this setup.
Prerequisites
You should have the following prerequisites:
- An AWS account.
- The required AWS Identity and Access Management (IAM) privileges to launch AWS IoT Greengrass and Amazon Timestream for InfluxDB.
- An Ubuntu EC2 instance that acts as the edge gateway (on-premises). For instructions, see Step 1: Create an Ubuntu Amazon EC2 instance.
Set up the edge gateway
Complete the following steps to set up your edge gateway and install the required software:
- Set up your AWS account and launch an Ubuntu EC2 instance.
We use us-east-1 as the AWS Region for our setup. You can use a t3.medium instance for this setup. For the purposes of this post, we use this EC2 instance as the edge gateway.
- Install AWS IoT Greengrass on the EC2 instance you provisioned. For instructions, refer to Install AWS IoT Greengrass Core software with manual resource provisioning.
InfluxDB is available as an AWS IoT Greengrass component.
- Follow the instructions on the readme.md to deploy the component.
- Confirm that version 2.7 is installed from Docker Hub.
- Provide your database credentials as part of the configuration. You use these as the user name and password to log in to InfluxDB at the edge. They are stored in AWS Secrets Manager.
Only the aws.greengrass.labs.database.InfluxDB component is required for the purposes of this post. You can ignore the remaining components included in the GitHub repo.
- Now that you have set up AWS IoT Greengrass with the required components, you can access the database using the Influx CLI:
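As a minimal sketch of this step, assuming the Greengrass component runs InfluxDB in a local Docker container, the wrapper below runs the Influx CLI inside that container. The container name `greengrass_InfluxDB` and the helper name `edge_influx` are assumptions for illustration; check `docker ps` for the actual container name on your gateway.

```shell
# Assumption: name of the InfluxDB container started by the Greengrass component.
# Verify it with `docker ps` and override INFLUX_CONTAINER if it differs.
INFLUX_CONTAINER="${INFLUX_CONTAINER:-greengrass_InfluxDB}"

# Hypothetical helper: run an influx CLI command inside the edge container.
edge_influx() {
  docker exec "$INFLUX_CONTAINER" influx "$@"
}
```

For example, `edge_influx bucket list` would list the buckets on the edge instance.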
The Influx CLI is already configured for you in Docker. To create additional API tokens for the instance, refer to InfluxDB Token creation.
- Add --skip-verify to these commands only if using self-signed certificates with HTTPS (the default configuration). For example:
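One possible form of such a command, wrapped in a small helper so it is easy to reuse, is sketched here. The helper name `list_edge_buckets` is illustrative only; `--skip-verify` is the Influx CLI flag that disables TLS certificate verification.

```shell
# Hypothetical helper: list buckets on the edge instance, skipping TLS
# certificate verification because the default setup uses a self-signed
# certificate.
list_edge_buckets() {
  influx bucket list --skip-verify
}
```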
With this step, you have now set up InfluxDB on the edge and you’re ready to ingest data. For this post, you deploy a sample simulator component that will write data into InfluxDB. In a production setup, this would be real data ingested from the sensors in your plant floor.
- Create a bucket named get-started.
- Ingest sample data for Line 1, for temperature and vibration:
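The following sketch shows one way to create the bucket and write two sample points with the Influx CLI. The measurement name `machine_metrics`, the tag `line=line1`, the field values, and the helper name `seed_edge_bucket` are assumptions made for illustration, not the post's actual sample data. It assumes the CLI is already configured for the edge instance.

```shell
# Hypothetical helper: create the edge bucket and write two sample points.
seed_edge_bucket() {
  # Current time in seconds; the points are stamped now and 10 seconds ago.
  now=$(date +%s)

  # Create the bucket in the organization of the active CLI configuration.
  influx bucket create --name get-started

  # Write line protocol: measurement,tags fields timestamp (seconds precision).
  influx write --bucket get-started --precision s \
    "machine_metrics,line=line1 temperature=71.5,vibration=0.12 $((now - 10))
machine_metrics,line=line1 temperature=72.1,vibration=0.15 $now"
}
```

If you connect over HTTPS with a self-signed certificate, you would add `--skip-verify` to both commands.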
You could also use the InfluxDB publisher component if you want to write data automatically. In a production scenario, the data flows from your sensor devices through an MQTT broker like EMQX and is ingested into the InfluxDB instance.
- Run a sample query on the Influx CLI to make sure data is being written to the bucket:
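A query along these lines would return recent points from the bucket. This is a sketch; the one-hour time range, the row limit, and the helper name `query_edge_bucket` are arbitrary choices for illustration.

```shell
# Hypothetical helper: show up to 10 rows per series written to the
# get-started bucket during the last hour.
query_edge_bucket() {
  influx query 'from(bucket: "get-started")
    |> range(start: -1h)
    |> limit(n: 10)'
}
```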
You will see results similar to the following screenshot:
With this step, you have set up an edge gateway and are ingesting simulated data from the devices. Now you can set up the Timestream for InfluxDB instance in your AWS account, which will act as the target for replication.
Set up the Timestream for InfluxDB instance
Complete the following steps to set up and configure the Timestream for InfluxDB instance:
- Launch a Timestream for InfluxDB instance. For instructions, refer to Step 2: Create an InfluxDB DB instance. For this proof of concept, create the instance with “Public Accessible” enabled.
For a production implementation, you could follow the instructions in Connecting to Timestream for InfluxDB through a VPC endpoint.
Launch the instance in the same Region where AWS IoT Greengrass was deployed. For this post, we set up the InfluxDB instance with a public endpoint.
- Note the values for the public endpoint, credentials, and organization created for the instance. You can access these on the Secrets Manager console.
- Define rules in your security group to make sure port 8086 is accessible from the EC2 instance set up as the edge gateway.
- Install the Influx CLI on your local workstation to interact with Timestream for InfluxDB.
Optionally, you can use the CLI on the edge gateway and configure it to access the Timestream for InfluxDB instance as well. See the following code (provide the endpoint, organization, and user name you noted earlier):
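As a sketch, a named CLI configuration for the cloud instance could be created as follows. The configuration name `cloud`, the helper name, and the three placeholder variables are assumptions; substitute the endpoint, organization, and user name you noted. When `--username-password` receives only a user name, the CLI prompts for the password.

```shell
# Placeholders: replace with the endpoint, organization, and user name
# you noted for your Timestream for InfluxDB instance.
CLOUD_ENDPOINT="${CLOUD_ENDPOINT:-<your-instance-endpoint>}"
CLOUD_ORG="${CLOUD_ORG:-<your-organization>}"
CLOUD_USER="${CLOUD_USER:-<your-user-name>}"

# Hypothetical helper: create a CLI configuration named "cloud".
# It is not marked active, so an existing edge configuration stays the default.
create_cloud_config() {
  influx config create \
    --config-name cloud \
    --host-url "https://${CLOUD_ENDPOINT}:8086" \
    --org "$CLOUD_ORG" \
    --username-password "$CLOUD_USER"
}
```

Later commands can then target the cloud instance with `--active-config cloud`.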
- Create another bucket in Amazon Timestream for InfluxDB for the data after it’s replicated from the edge gateway (use the organization you noted earlier):
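A minimal sketch of this step follows. The bucket name cloud-bucket matches the name used later in this post; the helper name and the CLI configuration name `cloud` are assumptions.

```shell
# Hypothetical helper: create the replication target bucket in the cloud.
# $1 = organization name, $2 = CLI configuration name (defaults to "cloud").
create_cloud_bucket() {
  org="${1:?usage: create_cloud_bucket <organization> [config-name]}"
  influx bucket create \
    --name cloud-bucket \
    --org "$org" \
    --active-config "${2:-cloud}"
}
```

The command output includes the ID of the new bucket.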
- Note the bucketid to use in later steps.
- Use the following command to get the API token for connecting to the Timestream for InfluxDB instance:
- Use the following code to get the organization name and ID.
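The two lookups above can be sketched with the Influx CLI as follows, assuming a CLI configuration for the cloud instance exists. The helper names and the configuration name `cloud` are illustrative.

```shell
# Hypothetical helper: list API tokens on the cloud instance.
# $1 = CLI configuration name (defaults to "cloud").
list_cloud_tokens() {
  influx auth list --active-config "${1:-cloud}"
}

# Hypothetical helper: list organization names and IDs on the cloud instance.
list_cloud_orgs() {
  influx org list --active-config "${1:-cloud}"
}
```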
You use the API token and organization ID in configuring the replication from the edge gateway.
With this step, the Timestream for InfluxDB instance is up and running, and you have the configuration details to set up replication from the edge gateway.
Set up replication from edge to cloud
In this section, you enable the replication task from the edge InfluxDB instance and validate the data in the cloud. Complete the steps in this section in the edge gateway:
- Launch the Influx CLI:
- Set up an Influx remote configuration (use the endpoint, API token, and organization ID you noted earlier):
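A sketch of the remote configuration is shown here. The remote name `cloud-remote` and the helper name are assumptions; the three arguments are the values you noted from the Timestream for InfluxDB instance.

```shell
# Hypothetical helper: register the cloud instance as a replication remote.
# $1 = remote URL (for example https://<endpoint>:8086)
# $2 = API token of the cloud instance
# $3 = organization ID of the cloud instance
create_remote() {
  url="${1:?remote URL required}"
  token="${2:?remote API token required}"
  org_id="${3:?remote organization ID required}"
  influx remote create \
    --name cloud-remote \
    --remote-url "$url" \
    --remote-api-token "$token" \
    --remote-org-id "$org_id"
}
```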
- Note the remoteID from the output to use in the next step.
- Set up replication with the following code. Provide the organization of the local InfluxDB instance, the remote ID, the API token for the edge instance, and both buckets you created earlier:
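The replication stream could be created along these lines. This is a sketch: the replication name `edge-to-cloud` and the helper name are assumptions, and all five arguments are values from your own setup.

```shell
# Hypothetical helper: create a replication stream from the edge bucket
# to the cloud bucket.
# $1 = local organization name   $2 = API token of the edge instance
# $3 = remote ID                 $4 = local (edge) bucket ID
# $5 = remote (cloud) bucket ID
create_replication() {
  [ "$#" -eq 5 ] || { echo "usage: create_replication <org> <token> <remote-id> <local-bucket-id> <remote-bucket-id>" >&2; return 1; }
  influx replication create \
    --name edge-to-cloud \
    --org "$1" \
    --token "$2" \
    --remote-id "$3" \
    --local-bucket-id "$4" \
    --remote-bucket-id "$5"
}
```

You can look up the local bucket ID with `influx bucket list --name get-started`.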
- Validate the setup with the following code:
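As a sketch, the replication streams and their current status can be listed with the Influx CLI. The helper name is illustrative.

```shell
# Hypothetical helper: list replication streams on the edge instance,
# including the latest response code and error message for each.
show_replications() {
  influx replication list
}
```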
Note the latestErrorMessage (blank) and latestResponseCode (204) in the output. This indicates a successful ongoing replication.
Validate replication
Let’s query the data in the edge and the cloud to confirm both are in sync:
- Load sample data into the InfluxDB instance on the edge. For this post, we ingest the sample data for Line 1, for temperature and vibration:
You could also use the InfluxDB publisher component if you want to write data automatically. In a production scenario, the data flows from your sensor devices through an MQTT broker like EMQX and is ingested into the InfluxDB instance.
- Run a sample query on the Influx CLI to make sure data is being written to the bucket:
You will see results similar to the following screenshot.
With this step, you have set up an edge gateway and ingested simulated data from the devices. The data that is ingested into the edge gateway InfluxDB instance is replicated into the Timestream for InfluxDB instance in near real time.
You also set up replication in a previous step, replicating from the get-started bucket on the edge to cloud-bucket.
- Run the same query on the Timestream for InfluxDB instance:
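A sketch of the cloud-side check follows, assuming a CLI configuration for the Timestream for InfluxDB instance exists. The configuration name `cloud`, the helper name, and the one-hour range are assumptions.

```shell
# Hypothetical helper: query the replicated data in cloud-bucket.
# $1 = CLI configuration name for the cloud instance (defaults to "cloud").
query_cloud_bucket() {
  influx query --active-config "${1:-cloud}" \
    'from(bucket: "cloud-bucket")
    |> range(start: -1h)
    |> limit(n: 10)'
}
```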
You should see the same result on the cloud side.
Simulate a temporary network failure
A key challenge for a hybrid cloud solution is identifying and addressing network connectivity issues. In this section, we mimic a network failure and demonstrate how the solution is self-healing.
- First, we disable port 8086 in the security group of the cloud instance to demonstrate a temporary network disconnect. To do so, delete the entry from the security group.
- Write a sample record into InfluxDB on the edge.
- In the edge instance, you will notice the following error.
This indicates that data is not replicated to the cloud Timestream for InfluxDB instance because it’s not reachable.
- Now, add the rule back to the security group:
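If you prefer to script the security group changes in this section instead of using the console, the AWS CLI calls could look like the following sketch. It assumes the inbound rule is defined by a CIDR range on TCP port 8086; the helper names are illustrative, and the security group ID and CIDR are values from your own setup.

```shell
# Hypothetical helper: remove the inbound rule to simulate a network failure.
# $1 = security group ID, $2 = CIDR of the edge gateway (for example 203.0.113.10/32)
block_edge_access() {
  aws ec2 revoke-security-group-ingress \
    --group-id "$1" --protocol tcp --port 8086 --cidr "$2"
}

# Hypothetical helper: add the inbound rule back to restore connectivity.
restore_edge_access() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 8086 --cidr "$2"
}
```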
- The error you observed earlier should have disappeared:
The Timestream for InfluxDB instance has caught up with the edge gateway and the data is in sync. The time taken to sync depends on the amount of data, network bandwidth, and the compute power of the edge and the Timestream for InfluxDB instances.
Scale out
In the preceding setup, we replicated one edge gateway instance of InfluxDB to a Timestream for InfluxDB instance. You could repeat the setup steps to host multiple edge gateways and replicate them to the same bucket in the Timestream for InfluxDB instance.
This is useful when you want to combine data from multiple sites to build corporate-level dashboards.
Clean up
To avoid recurring charges, delete the resources you created:
- Stop and terminate the EC2 instance running AWS IoT Greengrass.
- Stop and delete the Amazon Timestream for InfluxDB instance.
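The cleanup can also be scripted with the AWS CLI, as in this sketch. The helper name is illustrative, and the instance ID and DB instance identifier are values from your own account.

```shell
# Hypothetical helper: delete the resources created in this post.
# $1 = EC2 instance ID of the edge gateway
# $2 = identifier of the Timestream for InfluxDB instance
cleanup_resources() {
  aws ec2 terminate-instances --instance-ids "$1"
  aws timestream-influxdb delete-db-instance --identifier "$2"
}
```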
Conclusion
In this post, we demonstrated how you can use an InfluxDB 2.x instance on premises (edge) with edge replication in combination with a Timestream for InfluxDB instance to create a low-code solution for industrial IoT applications, enabling real-time data replication from edge devices to a cloud-based instance.
By automating data ingestion, processing, and transmission, IIoT applications can improve data freshness, reduce latency, and simplify data management. We also demonstrated how the solution is able to heal from a temporary network connectivity issue and continue the data replication process.
Additionally, you can scale this solution out to multiple sites and consolidate the data into a single Timestream for InfluxDB instance for analytics and reporting.
Try this solution out for yourself, and leave your feedback and questions in the comments. If you are already running a self-managed centralized version of InfluxDB open source and want to start using Timestream for InfluxDB, check out our migration guide.
About the Authors
Joyson Neville Lewis is a Sr. IoT Data Architect with AWS Professional Services. Joyson worked as a Software/Data engineer before diving into the Conversational AI and Industrial IoT space. He helps AWS customers materialize their AI visions using Voice Assistant/Chatbot and IoT solutions.
Victor Servin is a Senior Product Manager for the Amazon Timestream team at AWS, bringing over 18 years of experience leading product and engineering teams in the Telco vertical. With an additional 5 years of expertise in supporting startups with Product Led Growth strategies and scalable architecture, Victor’s data-driven approach is perfectly suited to drive the adoption of analytical products like Timestream. His extensive experience and commitment to customer success allow him to help customers efficiently achieve their goals.
Ashok Padmanabhan is a Sr. IoT Data Architect with AWS Professional Services. Ashok primarily works with Manufacturing and Automotive customers to design and build Industry 4.0 solutions.
Deepak Gopinath is an IoT Data Architect with AWS Professional Services. Deepak leverages his background in Data Analytics and IoT to collaborate with customers to design and develop unique solutions to their use cases.
Anish Kunduru is an IoT Data Architect with AWS Professional Services. Anish leverages his background in stream processing, R&D, and Industrial IoT to help AWS customers scale prototypes to production-ready software.