Extract, Transform, Load with AWS IoT Greengrass Solution Accelerator
Background
Data acquisition is a primary catalyst for generating outcome-oriented insights based on IoT solutions, whether it's telemetry data generated by a few devices in a smart home setting or sensor-based data streaming off thousands of industrial devices. IoT applications require flexible tools that can manage data effectively at or near the source, where it is collected and ingested. Data pipelines and IoT tools running at the edge also need to account for resource constraints and intermittent or low-bandwidth network connectivity.
The Extract, Transform, Load (ETL) with AWS IoT Greengrass solution accelerator enables developers to quickly build, test, and develop IoT solutions that include ETL functions.
Extract, transform, load using AWS IoT Greengrass
Version: 1.0
Last updated: 10/2019
Author: AWS
Overview
You can use this solution accelerator to quickly set up an edge device with AWS IoT Greengrass to perform extract, transform, and load functions on data gathered from local devices before it is sent to AWS. The solution accelerator breaks the three steps of ETL operations into discrete processes, decoupled using persistent data queues. Each AWS Lambda process places messages on and/or retrieves messages from the relevant data queues.
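The decoupling described above can be illustrated with a minimal sketch. It is not the accelerator's own code: the PersistentQueue class, the SQLite backing store, and the sample temperature messages are all assumptions chosen for illustration. An in-memory database keeps the example self-contained; a file path would be needed for messages to survive a restart.

```python
import json
import sqlite3


class PersistentQueue:
    """Hypothetical durable FIFO queue backed by a SQLite table."""

    def __init__(self, conn, name):
        self.conn = conn
        self.table = f"queue_{name}"
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.conn.commit()

    def put(self, message):
        self.conn.execute(
            f"INSERT INTO {self.table} (body) VALUES (?)", (json.dumps(message),)
        )
        self.conn.commit()

    def get(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        row = self.conn.execute(
            f"SELECT id, body FROM {self.table} ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute(f"DELETE FROM {self.table} WHERE id = ?", (row[0],))
        self.conn.commit()
        return json.loads(row[1])


def extract(raw_readings, extracted):
    """Extract step: place raw readings on the first queue."""
    for reading in raw_readings:
        extracted.put(reading)


def transform(extracted, transformed):
    """Transform step: convert Fahrenheit readings to Celsius."""
    while (msg := extracted.get()) is not None:
        celsius = round((msg["fahrenheit"] - 32) * 5 / 9, 1)
        transformed.put({"sensor": msg["sensor"], "celsius": celsius})


def load(transformed):
    """Load step: drain the queue; a real process would send these to AWS."""
    sent = []
    while (msg := transformed.get()) is not None:
        sent.append(msg)
    return sent


# Use a file path instead of ":memory:" for durability across restarts.
conn = sqlite3.connect(":memory:")
extracted_q = PersistentQueue(conn, "extracted")
transformed_q = PersistentQueue(conn, "transformed")

extract(
    [{"sensor": "s1", "fahrenheit": 212.0}, {"sensor": "s2", "fahrenheit": 32.0}],
    extracted_q,
)
transform(extracted_q, transformed_q)
result = load(transformed_q)
```

Because each step only touches its queues, any one of them can stop or restart without the others losing messages, which is the point of the decoupling.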
Use case
Common use cases for ETL at the edge include:
- Receiving data from different protocols such as LoRaWAN, OPC-UA, or bespoke systems, and converting it into standard data formats for use in cloud workloads
- Culling high-frequency data, performing averaging functions on large data sets, and then loading averaged or filtered values at a reduced rate
- Calculating values from disparate data sources on the local device, and sending filtered values to the cloud back end
- Cleansing, de-duplicating, or filling in missing time series data elements
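Two of these use cases, de-duplication and loading averaged values at a reduced rate, can be sketched in a few lines. The function names, the (timestamp, value) tuple layout, and the 5-second bucket width are illustrative assumptions, not part of the accelerator.

```python
from statistics import mean


def deduplicate(samples):
    """Drop repeated (timestamp, value) pairs while preserving order."""
    seen = set()
    unique = []
    for ts, value in samples:
        if (ts, value) not in seen:
            seen.add((ts, value))
            unique.append((ts, value))
    return unique


def downsample(samples, bucket_seconds):
    """Average values in fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - (ts % bucket_seconds)
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Timestamps are whole seconds; the reading at t=1 arrives twice.
readings = [(0, 10.0), (1, 12.0), (1, 12.0), (2, 14.0), (5, 20.0), (6, 22.0)]
reduced = downsample(deduplicate(readings), bucket_seconds=5)
# reduced == [(0, 12.0), (5, 21.0)]
```

Six raw readings become two averaged values, so the amount of data sent upstream shrinks with the bucket width.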
Architecture
To read the vehicle data, an OBD-II data logger is connected to the vehicle and queries the data ten times per second using IoT Greengrass. Each message read is converted to JSON and stored in a time series database. The conversion transforms the message content from the 8-byte OBD-II values into a readable JSON-formatted message.
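The transform from 8-byte values to JSON might look like the following sketch. The source does not describe the frame layout, so this assumes the common OBD-II-over-CAN response format for mode 01 (byte 0 is the payload length, byte 1 is 0x41, byte 2 is the PID, and bytes 3 onward are the data) and the standard scaling formulas for three PIDs. The decode_frame function and the JSON field names are hypothetical.

```python
import json

# Standard mode 01 PIDs and scaling formulas; a and b are data bytes A and B.
DECODERS = {
    0x05: ("coolant_temp_c", lambda a, b: a - 40),
    0x0C: ("engine_rpm", lambda a, b: (256 * a + b) / 4),
    0x0D: ("vehicle_speed_kmh", lambda a, b: a),
}


def decode_frame(frame, timestamp):
    """Convert one 8-byte OBD-II response frame into a JSON string."""
    if len(frame) != 8:
        raise ValueError("expected an 8-byte frame")
    if frame[1] != 0x41:
        raise ValueError("not a mode 01 response")
    pid = frame[2]
    if pid not in DECODERS:
        raise ValueError(f"unsupported PID 0x{pid:02X}")
    name, formula = DECODERS[pid]
    return json.dumps(
        {
            "timestamp": timestamp,
            "parameter": name,
            "value": formula(frame[3], frame[4]),
        }
    )


# Example frame: engine RPM with A=0x1A, B=0xF8, so (256*26 + 248) / 4 = 1726.0
rpm_frame = bytes([0x04, 0x41, 0x0C, 0x1A, 0xF8, 0x00, 0x00, 0x00])
rpm_json = decode_frame(rpm_frame, timestamp=1570000000)
```

A table of decoders keeps the per-PID formulas in one place, so supporting another parameter means adding one entry rather than another branch.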
Finally, individual records are stored as JSON records in Amazon S3 via Amazon Kinesis Data Firehose. Every thirty seconds, the minimum, average, and maximum values of parameters of interest are published to AWS IoT Core via MQTT.
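The thirty-second summary can be sketched as a small windowed aggregator. The WindowAggregator class, the payload field names, and the publish callable are assumptions: publish stands in for whatever MQTT client the deployment uses, and here it simply appends to a list so the example runs on its own.

```python
import json


class WindowAggregator:
    """Collect values per parameter and emit min/avg/max once per window."""

    def __init__(self, window_seconds, publish):
        self.window_seconds = window_seconds
        self.publish = publish  # hypothetical callable taking a JSON string
        self.window_start = None
        self.values = {}

    def add(self, timestamp, parameter, value):
        """Record one value, flushing first if the current window has elapsed."""
        if self.window_start is None:
            self.window_start = timestamp
        elif timestamp - self.window_start >= self.window_seconds:
            self.flush()
            self.window_start = timestamp
        self.values.setdefault(parameter, []).append(value)

    def flush(self):
        """Publish the summary for the current window and reset the values."""
        if not self.values:
            return
        summary = {
            name: {"min": min(v), "avg": sum(v) / len(v), "max": max(v)}
            for name, v in self.values.items()
        }
        self.publish(
            json.dumps({"window_start": self.window_start, "summary": summary})
        )
        self.values = {}


published = []
aggregator = WindowAggregator(window_seconds=30, publish=published.append)
aggregator.add(0, "engine_rpm", 1000.0)
aggregator.add(10, "engine_rpm", 2000.0)
aggregator.add(29, "engine_rpm", 3000.0)
aggregator.add(30, "engine_rpm", 4000.0)  # starts a new window, flushing the first
```

In this sketch a window is only flushed when a later reading arrives, so a timer-driven flush would be needed if the data source can go quiet.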
Features
Customizable
This solution accelerator is hardware agnostic and can be customized to deploy and run on almost any hardware that runs AWS IoT Greengrass. The AWS Partner Device Catalog provides a list of qualified devices that have been tested to run IoT Greengrass and interoperate with AWS. You can also use AWS IoT Device Tester to self-test whether your hardware can run AWS IoT Greengrass.