Extract, Transform, Load with AWS IoT Greengrass Solution Accelerator

Background

Data acquisition is the starting point for generating insights from IoT solutions, whether the data is telemetry from a few devices in a smart home or sensor readings streaming from thousands of industrial devices. IoT applications require flexible tools that can collect, ingest, and manage data at or near its source. Data pipelines and IoT tools running at the edge must also account for resource constraints and for intermittent or low-bandwidth network connectivity.

The Extract, Transform, Load (ETL) with AWS IoT Greengrass solution accelerator enables developers to quickly build, test, and develop IoT solutions that include ETL functions.

Get solution accelerator »

Extract, transform, load using AWS IoT Greengrass

Version: 1.0
Last updated: 10/2019
Author: AWS 

Overview

You can use this solution accelerator to quickly set up an edge device with AWS IoT Greengrass that performs extract, transform, and load functions on data gathered from local devices before the data is sent to AWS. The solution accelerator breaks the three steps of an ETL operation into discrete processes, decoupled by persistent data queues. Each process runs as an AWS Lambda function that places messages on, or retrieves messages from, the relevant data queues.
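The decoupling described above can be sketched as follows. This is a minimal illustration, not the accelerator's actual code: the `PersistentQueue` class, the SQLite backing store, the stage functions, and the message fields (`id`, `raw`, `sensor`, `celsius`) are all assumptions made for the example.

```python
import json
import sqlite3


class PersistentQueue:
    """Hypothetical FIFO queue backed by SQLite so messages survive restarts."""

    def __init__(self, path, name):
        self.conn = sqlite3.connect(path)
        self.table = f"queue_{name}"  # name is assumed to be a trusted identifier
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.conn.commit()

    def put(self, message):
        """Append a JSON-serializable message to the end of the queue."""
        self.conn.execute(
            f"INSERT INTO {self.table} (body) VALUES (?)", (json.dumps(message),)
        )
        self.conn.commit()

    def get(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        row = self.conn.execute(
            f"SELECT id, body FROM {self.table} ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute(f"DELETE FROM {self.table} WHERE id = ?", (row[0],))
        self.conn.commit()
        return json.loads(row[1])


def extract(raw_readings, extracted):
    """Extract stage: place each raw reading on the first queue."""
    for reading in raw_readings:
        extracted.put(reading)


def transform(extracted, transformed):
    """Transform stage: drain the first queue, convert, fill the second queue."""
    while (msg := extracted.get()) is not None:
        # Assumed conversion: raw value is tenths of a degree Celsius.
        transformed.put({"sensor": msg["id"], "celsius": msg["raw"] / 10})


def load(transformed, send):
    """Load stage: drain the second queue and hand each message to `send`."""
    while (msg := transformed.get()) is not None:
        send(msg)
```

Because each stage touches only its own input and output queue, any stage can stop, restart, or run at a different rate without the others losing messages. In a real deployment `path` would point at a file on the device rather than an in-memory database.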

Use case

Common use cases for ETL at the edge include:

  • Receiving data over different protocols such as LoRaWAN, OPC-UA, or bespoke systems, and converting it into standard data formats for use in cloud workloads
  • Culling high-frequency data, performing averaging functions on large data sets, and then loading averaged or filtered values at a reduced rate
  • Calculating values from disparate data sources on the local device, and sending filtered values to the cloud back end
  • Cleansing, de-duplication, or filling missing time series data elements
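Two of the transformations listed above, cleansing a time series and loading averaged values at a reduced rate, can be sketched in a few lines. The function names and the `(timestamp, value)` tuple layout are assumptions for illustration, and the sketch assumes integer timestamps aligned to a fixed sampling step.

```python
def clean_series(samples, step):
    """Drop duplicate timestamps and forward-fill gaps.

    samples: iterable of (timestamp, value) pairs, in any order.
    step: expected spacing between timestamps (same unit as the timestamps).
    """
    deduped = {}
    for ts, value in samples:
        deduped.setdefault(ts, value)  # keep the first value seen per timestamp
    if not deduped:
        return []
    ordered = sorted(deduped)
    cleaned = []
    last_value = None
    ts = ordered[0]
    while ts <= ordered[-1]:
        if ts in deduped:
            last_value = deduped[ts]
        # A missing timestamp repeats the most recent known value.
        cleaned.append((ts, last_value))
        ts += step
    return cleaned


def downsample_mean(samples, factor):
    """Average every `factor` consecutive samples into one output sample."""
    out = []
    for i in range(0, len(samples), factor):
        chunk = samples[i:i + factor]
        mean = sum(value for _, value in chunk) / len(chunk)
        out.append((chunk[0][0], mean))  # stamp with the first timestamp in the chunk
    return out
```

Forward-filling is only one possible gap strategy; interpolation or leaving gaps explicit may suit other data sets better.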
 
For example, automobiles can provide a variety of interesting data from CAN bus networks, including vehicle speed, coolant temperature, diagnostic trouble codes (“check engine” light), and vehicle-specific codes. This information is published hundreds of times per second, and can give insights into vehicle operations.
Imagine you want to collect data from a vehicle multiple times per second, transform and compute the data into a digestible format, and then store the messages in the cloud (when an Internet connection is available). As the vehicle travels into areas without Internet coverage, you want to ensure the local data collection operation continues and messages are preserved until they are stored in the cloud once coverage is restored.
 
Within the car, you can extract hundreds of parameter values per second using the native CAN bus format. With the extracted messages, you use AWS IoT Greengrass and Lambda to perform transformation functions that look for faults in near real time and create vehicle statistics for analytics operations performed in the cloud. Finally, you load these messages into different AWS services for varying applications, such as AWS IoT Core over MQTT for immediate alerts and periodic aggregations, or Amazon Kinesis Data Firehose for higher-velocity data. If connectivity is lost, the messages destined for the cloud are persisted locally, then flushed when connectivity is re-established.
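The store-and-forward behavior in the last sentence can be illustrated with a small buffer. This is a sketch under stated assumptions: `StoreAndForward` is a hypothetical class, the `send` callable stands in for whatever uplink is used and is assumed to raise `ConnectionError` while offline, and the in-memory deque stands in for storage that would be persisted to disk on a real device.

```python
from collections import deque


class StoreAndForward:
    """Hypothetical buffer that keeps messages until the uplink accepts them."""

    def __init__(self, send):
        self.send = send        # callable that raises ConnectionError when offline
        self.pending = deque()  # stand-in for an on-disk queue

    def publish(self, message):
        """Queue a message, then try to deliver everything that is pending."""
        self.pending.append(message)
        return self.flush()

    def flush(self):
        """Deliver pending messages oldest first; stop at the first failure."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return False    # still offline; the message stays queued
            self.pending.popleft()  # remove only after a successful send
        return True
```

Removing a message only after `send` returns means a failure mid-flush never drops data, at the cost of possibly re-sending a message if the failure happened after the cloud received it.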

Architecture

To read the vehicle data, an OBD-II data logger is connected to the vehicle and queries the data ten times per second using AWS IoT Greengrass. Each message read is transformed from its 8-byte OBD-II value into a readable JSON-formatted message and is also stored in a time series database.
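As an illustration of the transform step, the sketch below decodes an 8-byte OBD-II response frame into JSON. The three decoding formulas are the standard mode 01 definitions for coolant temperature, engine RPM, and vehicle speed; the function name and the JSON field names are assumptions, not the format the accelerator itself produces.

```python
import json

# Standard OBD-II mode 01 PIDs and their decoding formulas.
# The field names on the left are hypothetical.
PID_DECODERS = {
    0x05: ("coolant_temp_c", lambda d: d[0] - 40),
    0x0C: ("engine_rpm", lambda d: (256 * d[0] + d[1]) / 4),
    0x0D: ("vehicle_speed_kmh", lambda d: d[0]),
}


def decode_obd_frame(frame: bytes, timestamp: float) -> str:
    """Convert one 8-byte mode 01 response frame into a JSON string."""
    if len(frame) != 8:
        raise ValueError("expected an 8-byte frame")
    length, mode, pid = frame[0], frame[1], frame[2]
    if mode != 0x41:  # 0x41 is the response to a mode 01 request
        raise ValueError("not a mode 01 response")
    if pid not in PID_DECODERS:
        raise ValueError(f"unsupported PID 0x{pid:02X}")
    name, decode = PID_DECODERS[pid]
    # The length byte counts the mode byte, the PID byte, and the data bytes.
    data = frame[3:1 + length]
    return json.dumps(
        {"timestamp": timestamp, "parameter": name, "value": decode(data)}
    )
```

For example, `decode_obd_frame(bytes.fromhex("04410C1AF8000000"), 1.5)` decodes an engine RPM response whose two data bytes are `0x1A` and `0xF8`.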

Finally, individual records are stored as JSON records in Amazon S3 via Amazon Kinesis Data Firehose. Every thirty seconds, the minimum, average, and maximum values of parameters of interest are published to AWS IoT Core via MQTT.
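The thirty-second summary can be sketched as a windowed aggregation. The function name, the `(timestamp, parameter, value)` input layout, and the output field names are assumptions for illustration; publishing the result over MQTT is left out.

```python
from collections import defaultdict


def aggregate_windows(readings, window_seconds=30):
    """Compute min, average, and max per parameter for each time window.

    readings: iterable of (timestamp_seconds, parameter_name, value).
    """
    buckets = defaultdict(list)
    for ts, name, value in readings:
        # Align each reading to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        buckets[(window_start, name)].append(value)
    return [
        {
            "window_start": start,
            "parameter": name,
            "min": min(values),
            "avg": sum(values) / len(values),
            "max": max(values),
            "count": len(values),
        }
        for (start, name), values in sorted(buckets.items())
    ]
```

At ten readings per second, each summary condenses roughly 300 values per parameter into one small message, which is why the aggregates suit MQTT while the individual records go through the higher-throughput path.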

Features

Customizable

This solution accelerator is hardware agnostic and can be customized to deploy and run on almost any hardware running AWS IoT Greengrass. The AWS Partner Device Catalog provides a list of qualified devices that have been tested to run AWS IoT Greengrass and interoperate with AWS. You can self-test your hardware to run AWS IoT Greengrass with AWS IoT Device Tester.