AWS Big Data Blog

Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more.

Ben Snively is a Solutions Architect with AWS

Speed and agility are essential with today’s analytics tools. The quicker you can get from idea to first results, the more you can experiment and innovate with your data, perform ad-hoc analysis, and drive answers to new business questions.

Serverless architectures help in this respect by taking care of the non-differentiated heavy lifting in your environment―tasks such as managing servers, clusters, and device endpoints – allowing you to focus on assembling your IoT system, analyzing data, and building meaningful reports quickly and efficiently.

In this post, I show how you can build a business intelligence capability for streaming IoT device data using AWS serverless and managed services. You can be up and running in minutes―starting small, but able to easily grow to millions of devices and billions of messages.

AWS serverless services

AWS services offer a quicker time to intelligence. The following is a high-level diagram showing how the services in this post are configured:

o_IoT_Minutes_1

AWS IoT easily and securely connects devices through the MQTT and HTTPS protocols. The IoT Rules Engine continuously processes incoming messages, enabling your devices to interact with other AWS services.

Amazon S3 is object storage that provides you a highly reliable, secure, and scalable storage for all your data, big or small. It is designed to deliver 99.999999999% durability, and scale past trillions of objects.

Amazon Kinesis Firehose allows you to capture and automatically load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon OpenSearch Service, enabling near-real-time business intelligence and the use of dashboards.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using SQL. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL, with most results delivered in seconds.

Amazon QuickSight is a fast, cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. You can easily run SQL queries using Athena on data stored in S3, and build business dashboards within QuickSight.

Walkthrough: Heartrate sensors

In this walkthrough, you run a script to mimic multiple sensors publishing messages on an IoT MQTT topic, with one message published every second. The events get sent to AWS IoT, where an IoT rule is configured. The IoT rule captures all messages and sends them to Firehose. From there, Firehose writes the messages in batches to objects stored in S3.  In S3, you set up a table in Athena and use QuickSight to analyze the IoT data.

Configuring Firehose to write to S3

Create a Firehose delivery stream using the following field values.

Name IoT-to-BI-Example
S3 Bucket <Create one>
S3 Prefix iot_to_bi_example/

Configuring the AWS IoT rule

Create a new AWS IoT rule with the following field values.

Name Iot_to_bi_demoRule
Attribute *
Topic Filter /health/#
Add Action Send messages to an Amazon Kinesis Firehose stream (from previous section).
Select Separator “\n (newline)”

Generating sample data

Running the heartrate.py python script generates fictitious IoT messages from multiple userid values. The IoT rule sends the message to Firehose, which writes the data out to S3.

The script assumes that it has access to AWS CLI credentials and that boto3 is installed on the machine running the script.

Download and run the following Python script:

https://github.com/awslabs/aws-big-data-blog/blob/master/aws-blog-iot-athena-quicksight-bi/heartrate.py

The script generates random data that looks like the following:

{"heartRate": 67, "userId": "Brady", "rateType": "NORMAL", "dateTime": "2016-12-18 14:01:39"}
{"heartRate": 78, "userId": "Bailey", "rateType": "NORMAL", "dateTime": "2016-12-18 14:01:41"}
{"heartRate": 94, "userId": "Beatrice", "rateType": "NORMAL", "dateTime": "2016-12-18 14:01:42"}
…
{"heartRate": 73, "userId": "Ben", "rateType": "NORMAL", "dateTime": "2016-12-18 14:01:54"}

Run the script for a few hours and then end the task or process.

Configuring Athena

Log into the Athena console. In the Query Editor, create a table using the following query:

CREATE EXTERNAL TABLE heartrate_iot_data (
    heartRate int,
    userId string,
    rateType string,
    dateTime timestamp)
ROW FORMAT  serde 'org.apache.hive.hcatalog.data.JsonSerDe'
with serdeproperties( 'ignore.malformed.json' = 'true' )
LOCATION 's3://<CREATED-BUCKET>/iot_to_bi_example/'

After the query completes, you see the new table in the left-hand pane:

o_IoT_Minutes_2

Run the following command to see the different userid values that were generated by the test script:

SELECT userid, COUNT(userid) FROM heartrate_iot_data GROUP BY userid

Your counts depend on how long you left the script to run, but you should have the following names listed:

o_IoT_Minutes_3

Analyzing the data

Now, analyze this data using Athena via QuickSight.

Set up a data source

Log into QuickSight and choose Manage data, New data set. Choose Athena as a new data source.

o_IoT_Minutes_4

Name your data source “AthenaDataSource”. Select the default schema and the heartrate_iot_data table.

o_IoT_Minutes_5

On the left, choose Visualize.

NOTE: The Edit/Preview data button allows you to prepare the dataset differently prior to visualizing it. Examples include being able to format and transform your data, create alias data fields, change data types, and filter or join your data.

Build an analysis

On the left, choose userid; you’ll see the same names and counts that you saw when you ran the Athena SQL query earlier.

o_IoT_Minutes_6

Choose ratetype to query over the IoT data for the number of userids that have had HIGH versus NORMAL heartrates.

o_IoT_Minutes_7

Focus on “HIGH” heart rates by interacting with the graph. Open one of the bars under the HIGH section and choose Focus only on HIGH.

o_IoT_Minutes_8

This configures a filter to include only “HIGH” records.

o_IoT_Minutes_9

On the left, choose Visualize. Clear ratetype (as you are now filtering by it), choose heartrate, and switch to AVERAGE instead of SUM.

IoT_Minutes_10

Selecting the datetime X axis title allows you to change the time units easily. Change it to an hour.

You can easily change the parameters and see the users who haven’t had any HIGH heartrate data (in this dataset: Bonny, Brady, and Branden). 

o_IoT_Minutes_11

Conclusion

With these few easy steps, you’ve assembled a fully managed set of services for collecting sensor data from devices, batching the data into S3, and building intelligence from that data using QuickSight and Athena.

This was all done without launching a single server or cluster; throughout, you’ve been able to focus on the data and analytics, rather than the management of the infrastructure.

In this example, you kept the data in JSON. Converting it to Parquet would have been much more performant. Setting up partitions for the data would also restrict the amount of data scanned.

If you have questions or suggestions, please comment below.


About the Author

ben_snively_90Ben Snively is a Public Sector Specialist Solutions Architect. He works with government, non-profit and education customers on big data and analytical projects, helping them build solutions using AWS. In his spare time he adds IoT sensors throughout his house and runs analytics on it.


Related

Integrating IoT Events into Your Analytic Platform

iot_1_smlal