How CPGs and Logistics Vendors Can Optimize Fleet Management with an AWS Data Lake
In part one of this series of blog posts, Why CPGs and Logistics Vendors Need a Fleet Management Data Lake, we explained why CPGs and logistics vendors need a data lake for transportation and fleet management. To quickly recap, a data lake is an ideal way to store, manage, and analyze transportation and fleet data to optimize things like vehicle maintenance, driver safety, delivery routes, and retailer/customer satisfaction, and to reduce costs and emissions.
In this follow-up post, we’ll discuss the architecture of an AWS data lake that’s scalable across different data types and formats to help you easily analyze data and optimize your transportation fleet.
Steps to Create, Manage, and Analyze Data in an AWS Data Lake
Data lake architectures:
- Ingest data from multiple sources, including historical data from company systems, information from siloed databases, real-time streaming data, mobile app data, and data from other sources. Automatically extract, transform, and load raw data with AWS Lake Formation, a service that makes it easy to set up a secure data lake.
- Clean and organize data. To extract meaningful insights, the collected data must be cleaned and organized. This involves deduplication of records, matching data attributes from various sources for joining, and partitioning for cost and performance optimization.
- AWS Lake Formation helps to automate this otherwise manual effort. It helps to deduplicate and find matching records for analysis through machine learning transforms.
- AWS Lake Formation can also help automate transformations on your data through transformation templates. It can schedule jobs to prepare your data for analysis through AWS Glue. For better performance, it helps to convert data to columnar formats, such as Parquet and ORC. When data is organized into columns rather than rows, a query reads only the columns it needs, so less data is scanned for analysis.
- Catalog and store data in a data lake. You must catalog the ingested data to make it queryable. The AWS Glue Data Catalog is an index of the location, schema, and runtime metrics of your data. An AWS Glue crawler can crawl multiple data stores in a single run and populate the catalog with tables for querying.
- Turn data into actionable knowledge for analytics. You can provide users with secure, self-service access to data through analytics services like Amazon QuickSight, Amazon Redshift, and Amazon Athena. With Amazon QuickSight, you can combine data from multiple data sources, like Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon RDS), on-premises data sources, and AWS IoT Analytics, to produce dashboards. This makes it easy for users to gain insights to better manage transportation and vehicles. With Amazon Redshift, you can query exabytes of structured and semi-structured data using standard SQL. This scalability is very helpful when you’re dealing with huge amounts of streaming data from different vehicles and third-party sources.
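To make the clean-and-organize step above more concrete, here is a minimal Python sketch of deduplication and date-based partitioning. It is not how AWS Lake Formation implements these steps: its machine learning transforms can also match records that are not exact duplicates, whereas this sketch only drops exact key repeats. The field names (`vehicle_id`, `recorded_at`) and the `telemetry` prefix are assumptions made for illustration.

```python
from datetime import datetime, timezone


def deduplicate(records, key_fields=("vehicle_id", "recorded_at")):
    """Keep the first record seen for each key; later repeats of the same key are dropped."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def partition_prefix(record, base="telemetry"):
    """Build a Hive-style prefix (year=/month=/day=) from a timezone-aware ISO 8601 timestamp."""
    ts = datetime.fromisoformat(record["recorded_at"]).astimezone(timezone.utc)
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


# Hypothetical sample data: the second record repeats the first.
sample = [
    {"vehicle_id": "truck-17", "recorded_at": "2021-03-01T08:15:00+00:00", "speed_kmh": 82},
    {"vehicle_id": "truck-17", "recorded_at": "2021-03-01T08:15:00+00:00", "speed_kmh": 82},
    {"vehicle_id": "truck-17", "recorded_at": "2021-03-01T08:16:00+00:00", "speed_kmh": 85},
]
clean = deduplicate(sample)
prefixes = sorted({partition_prefix(r) for r in clean})
```

Writing objects under prefixes like these lets a query engine skip whole days of data when a query filters on date, which is where the cost and performance benefit of partitioning comes from.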
The following image shows the flow of fleet management information into a data lake:
This more detailed reference architecture is designed for commercial vehicles that transport goods. We’ll break down this architecture in the following use cases to explain how you can use the data lake to gain transportation and fleet management insights.
Real-Time Weather and Traffic Data
Amazon Kinesis Data Streams (Amazon KDS) is a massively scalable and durable real-time data streaming service. It can continuously capture gigabytes of data per second from hundreds of thousands of sources, such as feeds that report traffic patterns and weather conditions for a city, region, country, or the entire globe.
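As a rough illustration of the producer side, the sketch below shapes weather or traffic readings into the entries that the Kinesis `PutRecords` API expects (a `Data` blob plus a `PartitionKey`) and splits them into batches of at most 500, the per-request record limit. The `source_id` field and the stream name in the comment are assumptions, not part of the reference architecture.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_kinesis_entries(readings):
    """Serialize each reading to JSON bytes and key it by its (hypothetical) source_id."""
    return [
        {"Data": json.dumps(reading).encode("utf-8"), "PartitionKey": reading["source_id"]}
        for reading in readings
    ]


def batches(entries, size=MAX_RECORDS_PER_CALL):
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


# With boto3 available, each batch could be sent like this (stream name is hypothetical):
#   kinesis = boto3.client("kinesis")
#   for batch in batches(entries):
#       kinesis.put_records(StreamName="fleet-weather-traffic", Records=batch)
```

Using the source identifier as the partition key keeps readings from one source in order on one shard. A production producer would also need to retry any records that `PutRecords` reports as failed.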
Amazon Kinesis Data Analytics allows you to query streaming data in real time with SQL. There are no servers to manage. It scales automatically to match the volume and throughput of your incoming data. As weather conditions change (for example, if heavy rains are approaching), you can use Amazon Simple Notification Service (Amazon SNS) to send SMS alerts to warn drivers to change routes to avoid bad weather.
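The alerting step might look something like the following sketch. It assumes the streaming query has already produced per-region rain forecasts, and it builds the arguments for the Amazon SNS `Publish` call (`PhoneNumber` and `Message`) for each driver in an affected region. The 10 mm/hour threshold, the field names, and the message wording are all hypothetical.

```python
def build_alerts(forecasts, drivers, threshold_mm_per_hour=10.0):
    """Return SNS publish arguments for drivers whose region expects heavy rain."""
    wet_regions = {
        f["region"] for f in forecasts if f["rain_mm_per_hour"] >= threshold_mm_per_hour
    }
    return [
        {
            "PhoneNumber": driver["phone"],
            "Message": f"Heavy rain expected in {driver['region']}. Consider an alternate route.",
        }
        for driver in drivers
        if driver["region"] in wet_regions
    ]


def send_alerts(alerts, publish):
    """Send each alert through `publish`, for example boto3.client("sns").publish."""
    for alert in alerts:
        publish(**alert)
    return len(alerts)
```

Passing the `publish` callable in, rather than creating an SNS client inside the function, keeps the alert logic easy to test without AWS credentials.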
Vehicle Telemetry Data
Vehicle telemetry provides information about a vehicle’s or driver’s performance by collecting data from in-vehicle sensors and systems, such as the Global Positioning System (GPS) and advanced driver-assistance systems (ADAS), covering speed, braking, lane changes, and road conditions. AWS IoT Greengrass is an open-source edge runtime solution that helps you program your vehicle devices to act locally on the data they generate, execute predictions, filter and aggregate device data, and transmit the filtered information to the fleet management data lake. This information allows you to proactively manage vehicle maintenance.
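To illustrate the kind of local filtering and aggregation an edge component could perform before transmitting data, here is a small Python sketch that reduces a window of raw readings to one summary record. It does not use the AWS IoT Greengrass SDK. The field names and the hard-braking threshold of -3.5 m/s² are assumptions chosen for the example.

```python
def summarize_window(readings, hard_brake_threshold_ms2=-3.5):
    """Collapse a non-empty, time-ordered window of readings into one summary record."""
    speeds = [r["speed_kmh"] for r in readings]
    hard_brakes = [r for r in readings if r["accel_ms2"] <= hard_brake_threshold_ms2]
    return {
        "vehicle_id": readings[0]["vehicle_id"],
        "window_start": readings[0]["ts"],
        "window_end": readings[-1]["ts"],
        "avg_speed_kmh": round(sum(speeds) / len(speeds), 1),
        "max_speed_kmh": max(speeds),
        "hard_brake_events": len(hard_brakes),
    }


# Hypothetical one-minute window sampled every 30 seconds.
window = [
    {"vehicle_id": "truck-17", "ts": "2021-03-01T08:15:00+00:00", "speed_kmh": 80, "accel_ms2": -1.0},
    {"vehicle_id": "truck-17", "ts": "2021-03-01T08:15:30+00:00", "speed_kmh": 90, "accel_ms2": -4.0},
    {"vehicle_id": "truck-17", "ts": "2021-03-01T08:16:00+00:00", "speed_kmh": 70, "accel_ms2": 0.5},
]
summary = summarize_window(window)
```

Sending one summary per window instead of every raw reading reduces the volume of data transmitted from the vehicle while preserving the signals, such as hard-braking counts, that matter for maintenance planning.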
Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into a data lake. In this case, we use AWS IoT Core as the data source. AWS IoT Core connects the IoT devices to the AWS Cloud and it can connect devices to each other, too. This inter-device connectivity can be helpful if a vehicle breaks down and you need to alert other drivers to help deliver the goods on the disabled vehicle. In the fleet management data lake architecture, AWS IoT Core connects to Amazon S3 through Amazon Kinesis Data Firehose.
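One way to wire AWS IoT Core to Amazon Kinesis Data Firehose is an AWS IoT topic rule with a Firehose action. The sketch below builds a rule payload of the shape accepted by the `CreateTopicRule` API. The topic filter, delivery stream name, and role ARN are placeholders, and the topic layout (`fleet/<vehicle_id>/telemetry`) is an assumption about how devices publish.

```python
def firehose_rule_payload(delivery_stream, role_arn, topic_filter="fleet/+/telemetry"):
    """Build a topic rule payload that forwards matching messages to a Firehose delivery stream."""
    return {
        # topic(2) extracts the second topic segment, assumed here to be the vehicle ID.
        "sql": f"SELECT *, topic(2) AS vehicle_id FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "firehose": {
                    "roleArn": role_arn,
                    "deliveryStreamName": delivery_stream,
                    # Newline-separate records so each JSON document lands on its own line in S3.
                    "separator": "\n",
                }
            }
        ],
    }


# With boto3 available, the rule could be created like this (all names are hypothetical):
#   boto3.client("iot").create_topic_rule(
#       ruleName="fleet_telemetry_to_firehose",
#       topicRulePayload=firehose_rule_payload("fleet-telemetry", role_arn),
#   )
```

The newline separator matters for downstream analysis: without it, consecutive JSON messages are concatenated in the delivered objects, which makes them harder to query.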
Driver Safety Data
In addition to the aggregation of vehicle telemetry, it’s important to capture driving patterns to measure drivers’ performance based on safety.
Amazon DynamoDB is a fast, flexible NoSQL key-value and document database service for low-latency data access at any scale. In the following image, AWS IoT Core rules trigger AWS Lambda actions, which insert records in Amazon DynamoDB tables.
For example, a rule can dictate that a speeding driver will trigger an IoT action to AWS Lambda, which inserts the record in Amazon DynamoDB tables along with supporting data like the driver’s name, location, and vehicle. You can use Amazon SNS to send an alert to the driver’s manager.
This architecture uses the AWS IoT rules engine and its seamless integration with AWS Lambda to create a scalable IoT backend with Amazon DynamoDB.
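A minimal sketch of the AWS Lambda logic for the speeding example could look like this. It assumes an AWS IoT rule has already selected only speeding events and that the event carries the fields shown, all of which are hypothetical names. Numeric values are converted to `Decimal` because the boto3 DynamoDB resource API does not accept Python floats.

```python
from decimal import Decimal


def to_item(event):
    """Map a (hypothetical) speeding event to a DynamoDB item."""
    return {
        "driver_id": event["driver_id"],
        "event_time": event["ts"],
        "driver_name": event["driver_name"],
        "vehicle_id": event["vehicle_id"],
        "latitude": Decimal(str(event["lat"])),
        "longitude": Decimal(str(event["lon"])),
        "speed_kmh": Decimal(str(event["speed_kmh"])),
    }


def handle_speeding_event(event, put_item, publish, topic_arn):
    """Store the event, then notify the manager's SNS topic.

    `put_item` could be boto3.resource("dynamodb").Table(...).put_item and
    `publish` could be boto3.client("sns").publish.
    """
    item = to_item(event)
    put_item(Item=item)
    publish(
        TopicArn=topic_arn,
        Subject="Speeding alert",
        Message=f"{item['driver_name']} drove {item['speed_kmh']} km/h in vehicle {item['vehicle_id']}.",
    )
    return item
```

In an actual Lambda function, a thin `lambda_handler(event, context)` wrapper would create the table and SNS client once at module load and pass their methods into `handle_speeding_event`.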
Personnel and Vehicle Data
Drivers’ employee data, such as personal details and license information, and vehicle-specific data, such as make, model, warranty, and maintenance schedule, fit the relational format and are persisted in a relational database (Amazon RDS in this architecture).
Athena Federated Query in Amazon Athena comes in handy when you need to query relational, nonrelational, and object (Amazon S3) data across multiple datastores. Amazon Athena is a serverless interactive query service that allows you to easily analyze data in Amazon S3 using standard SQL.
In this architecture, the driver data is split across three different datastores:
- Personal data in Amazon RDS.
- Driving anomalies in Amazon DynamoDB.
- Weather data in Amazon S3.
Let’s say, for example, that you need a list of drivers who’ve missed the expected time of arrival (ETA) delivery window in the past month. Here, we’re joining the three data sources with connectors to Amazon DynamoDB and Amazon RDS tables under Amazon Athena.
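A federated query for this example might be assembled as in the sketch below. Every catalog, database, table, and column name here is hypothetical, including the `missed_eta` flag, because the post does not specify the schemas. The function only builds the SQL text and the arguments for the Amazon Athena `StartQueryExecution` call.

```python
def missed_eta_query(rds_catalog="rds_fleet", ddb_catalog="ddb_fleet"):
    """Join drivers (RDS), driving anomalies (DynamoDB), and weather (S3) for the past month."""
    return f"""
SELECT d.driver_name,
       a.event_date,
       a.speed_kmh,
       w.weather_condition
FROM "{rds_catalog}"."fleet"."drivers" AS d
JOIN "{ddb_catalog}"."default"."driving_anomalies" AS a
  ON a.driver_id = d.driver_id
LEFT JOIN "awsdatacatalog"."fleet_lake"."weather" AS w
  ON w.region = a.region
 AND w.observation_date = CAST(a.event_date AS date)
WHERE a.missed_eta = true
  AND CAST(a.event_date AS date) >= date_add('month', -1, current_date)
ORDER BY d.driver_name
""".strip()


def start_query_kwargs(sql, output_location):
    """Arguments for boto3.client("athena").start_query_execution."""
    return {
        "QueryString": sql,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


kwargs = start_query_kwargs(missed_eta_query(), "s3://example-athena-results/fleet/")
```

The two non-default catalogs stand for data sources registered in Athena through its Amazon RDS and Amazon DynamoDB connectors, while `awsdatacatalog` refers to tables over Amazon S3 that are registered in the AWS Glue Data Catalog.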
The following image shows how Athena Federated Query can cross datastores:
From the results, we see that Bern, David, and Marshall missed their ETAs due to weather conditions, but Sarah’s data needs some investigation. Win is clearly not a safe driver, because not only did he exceed the speed limit, he did so during bad weather and traffic conditions.
In a future post, we’ll explore other considerations for building a robust, scalable fleet management data lake.
If you’re interested in learning about a use case for AWS ML models, check out the How to Predict Shipments’ Time of Delivery with Cloud-based Machine Learning Models blog post. And if you’re ready to move forward with a fleet management data lake, contact your AWS account team today to get started.