AWS Storage Blog

Autonomous vehicle data collection with AWS Snowcone and AWS IoT Greengrass

Self-driving and self-flying vehicles — autonomous cars, airplanes, and drones — require vast amounts of data to fulfill their promise of a safe mode of transportation for goods and people. Connected vehicles and the Internet of Things (IoT) strongly influence the way we collect and process low-bandwidth telemetry data, in addition to the high-bandwidth sensor data needed for autonomy. Telemetry data must be ingested, analyzed, and acted upon in near-real time, within seconds of an event happening. In contrast, cameras, lidars, and radars together can produce tens to hundreds of terabytes of data per hour, typically used for offline processing and machine learning.

Common system architecture patterns for vehicles, airborne and ground-based alike, differentiate between safety-critical and non-safety-critical workloads. Autopilot, engine control, and fuel management are common safety-critical tasks, which have hard real-time requirements. Collecting and storing high-bandwidth sensor data can be classified as non-critical when other systems do not depend on the availability of these data streams during operation.

In this blog post, I describe an autonomous vehicle, such as a quadcopter or helicopter, with a split-responsibility design. The first system, the avionics system, is in charge of safety-critical tasks and validates the correct operation of the vehicle within its design envelope. The second system (outside of the critical path) sends telemetry for real-time tracking and performs bulk data collection for research and development purposes. I demonstrate using AWS Snowcone in place of this second system to tap into a vehicle’s sensor stream, transform real-time information into telemetry events, and persist all data for later ingestion into Amazon S3. Together with AWS IoT and AWS IoT Greengrass, this provides an edge computing environment with over-the-air update capabilities. Offloading these non-safety-critical tasks to a dedicated AWS Snowcone device allows the vehicle to operate securely and safely, while also providing advanced data collection and processing functionality for real-time fleet management and offline analysis of vast amounts of high-bandwidth sensor data.

Walkthrough

First, I show how to configure a Snowcone to start and bootstrap a local Amazon EC2 instance, and how to provision this instance to run AWS IoT Greengrass 2.0 for edge computing tasks. Together with AWS IoT Core and a custom AWS IoT Greengrass component, I demonstrate how to send real-time vehicle location and health tracking telemetry events to the cloud, and how to ingest high-bandwidth sensor data into a Snowcone for offline transfer to an S3 bucket.

The following architecture diagram depicts the individual parts of the solution:

[Architecture diagram: a Snowcone-hosted EC2 instance runs AWS IoT Greengrass and the AvionicsDataCollection component, connects to the avionics system over the vehicle network, sends telemetry to AWS IoT Core, and stores sensor data on the Snowcone NFS share for later ingestion into Amazon S3]

This blog post includes a demo application packaged as an AWS IoT Greengrass component: AvionicsDataCollection, a Python-based application that reads avionics state information and generates telemetry events with live location data and engine metrics. These events are periodically sent to AWS IoT Core. Camera data, or any other high-bandwidth sensor data stream, is queried periodically and persisted to the internal Snowcone storage attached via NFS. This data is later ingested into Amazon S3 when the device is shipped back to AWS. Your vehicle, sensors, and avionics system will have dedicated interfaces for extracting this information in a non-safety-critical environment.
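
The exact event schema is defined by your avionics interface; as an illustration only (all field names here are hypothetical), a telemetry event might look like the following Python dictionary:

# Illustrative telemetry event only -- the actual schema is defined by
# your avionics interface and the AvionicsDataCollection component.
telemetry_event = {
    "thing_name": "FlyThing_001",
    "timestamp_utc": "2021-07-01T12:34:56Z",
    "location": {"lat": 52.5200, "lon": 13.4050, "alt_m": 120.5},
    "engine": {"rpm": 5400, "temperature_c": 74.2},
}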

I walk through the following main steps of this solution:

  1. Configure IAM, security, and permissions for AWS IoT resources in your AWS account.
  2. Configure the avionics system before first flight.
  3. Automate provisioning of a local EC2 instance on a Snowcone device.
  4. Install and provision AWS IoT Greengrass inside a local EC2 instance.
  5. Deploy a new AWS IoT Greengrass component with a demo application over the air.

Once these steps are done, you will have a Snowcone device with AWS IoT Greengrass running inside an EC2 instance. The AWS IoT Greengrass core device automatically deploys the AvionicsDataCollection application, which sends periodic telemetry events to AWS IoT Core and writes high-bandwidth data to the Snowcone for later ingestion into Amazon S3. All steps are fully automated and require no human interaction after the initial configuration.

Prerequisites and requirements

For this demonstration, you need the following prerequisites:

The following tools are needed on the avionics system to bootstrap and provision the full solution:

All referenced scripts are available in the GitHub repository for this blog post. For an in-depth explanation of the individual steps for using an EC2 instance on AWS Snowcone, refer to this blog post series.

Configuring AWS IoT Core and AWS IoT Greengrass

Security is an important aspect of every IT infrastructure — for cloud infrastructure and edge devices alike. With AWS IoT services, you have full control over devices and their permissions. In this blog post, we make use of the provisioning step included in the AWS IoT Greengrass installer. In a production environment, we recommend using other mechanisms, such as fleet provisioning or just-in-time provisioning, to fully secure the device enrollment process. When using the AWS IoT Greengrass installer, we perform the following steps:

  1. Create a new IAM user: “GreengrassV2Provisioner”.
  2. Attach a new policy with limited access (principle of least privilege).
  3. Create access keys for this user, which you will use during provisioning in the next section.

This allows the AWS IoT Greengrass installer to create a new thing, attach a certificate, and enable the thing to communicate with AWS IoT Core. Once the thing can communicate with the cloud, we use an IAM role with an IoT role alias to interact with other AWS services. AWS IoT Greengrass automatically assumes this role to request valid credentials. For this blog post, I create the recommended GreengrassV2TokenExchangeRole role and attach the default GreengrassV2TokenExchangeRoleAccess policy. Since we store the AvionicsDataCollection application in an S3 bucket, we also attach the AWS managed AmazonS3ReadOnlyAccess policy to allow the download of AWS IoT Greengrass component artifacts. In a production environment, this can be further restricted to a single S3 bucket or prefix. You only need to complete these steps once for your fleet of autonomous drones.
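
A minimal boto3 sketch of the provisioner setup might look like the following; the inline policy is deliberately abbreviated for illustration, so refer to the bootstrap-greengrass-provisioner.sh script for the exact permissions used in this post:

import json
import boto3

iam = boto3.client("iam")

# Create the dedicated provisioning user.
iam.create_user(UserName="GreengrassV2Provisioner")

# Attach an inline policy. The actions below are an abbreviated
# illustration; narrow them to least privilege as described above.
iam.put_user_policy(
    UserName="GreengrassV2Provisioner",
    PolicyName="GreengrassV2ProvisionerAccess",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["iot:*", "greengrass:*", "iam:PassRole"],
            "Resource": "*",
        }],
    }),
)

# Create the access keys used by the Greengrass installer later.
# Store these securely; they are shown only once.
keys = iam.create_access_key(UserName="GreengrassV2Provisioner")["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])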

Lastly, we create a thing group with AWS IoT in our AWS account to group our devices. This allows us to deploy application updates and monitor all vehicles from a single source of truth (see the last section of this blog post). A thing group is defined simply by a name:

aws iot create-thing-group --thing-group-name FlyThings

Refer to the bootstrap-greengrass-provisioner.sh script for a full set of AWS CLI instructions.

We have now finished the security foundation with IAM users, roles, and policies. We will use it in the next steps to deploy our AWS IoT Greengrass resources.

Configuring the avionics system to provision the AWS Snowcone

This step covers the configuration of the avionics system, in addition to the tasks necessary during the boot-up sequence of your vehicle with a Snowcone device installed. We are aiming for a highly automated solution. Therefore, the system architecture assumes that the avionics system can run the provisioning commands during vehicle start-up, while it is in a safe mode of operation (parked, on the ground). After provisioning, there is no need for further command execution on the avionics system. It can then switch into its safety-critical mode (ready to fly) and hand off all data collection and processing to the Snowcone. A human operator must still plug the Snowcone into the vehicle and connect it to the on-board network.

We define the following configuration parameters to customize all components to your AWS environment and physical network layout, and set them as environment variables on the avionics system:

General parameters

AWS_REGION="eu-central-1"
SNOW_AMI_NAME="MyAMIName" # customized AMI selected during the Snowcone Job creation
SNOW_EC2_IP="192.168.1.221" # available IP from vehicle network
SNOW_NFS_IP="192.168.1.222" # available IP from vehicle network
SNOW_NFS_ALLOWED="192.168.1.0/24" # vehicle network CIDR
SNOW_NETMASK="255.255.255.0" # vehicle network netmask
THING_GROUP="FlyThings" # used for Greengrass deployments

To demonstrate the Greengrass installer, we will also include the access keys created for the “GreengrassV2Provisioner” IAM user in an earlier step.

Vehicle and Snowcone-unique parameters

THING_NAME="FlyThing_001" # derive from avionics system + Snowcone Job ID
SNOW_IP="192.168.1.<snowcone-ip>" # get from vehicle network
SNOW_MANIFEST="/path/to/manifest.bin" # download from the AWS Console
SNOW_UNLOCK_CODE="<uuid-unlock-code>" # get from the AWS Console

Provisioning AWS Snowcone for local compute with EC2 and NFS

Before physically installing and powering on a Snow device, we recommend checking it for damage and obvious tampering. Since avionics systems are typically headless compute units without a graphical user interface, we use the Snowball Edge client to interact with our device. For more details on the command walkthrough, please see this previously published blog post (section: “EC2 instance on AWS Snowcone”).

We perform the following steps with the corresponding Snowball Edge client and AWS CLI commands:

  1. Unlock the Snowcone device and wait for it to be unlocked and ready for use (unlock-device and describe-device commands).
  2. Configure access to the local EC2 endpoint on the Snowcone.
    • Get the local EC2 endpoint access keys (list-access-keys and get-secret-access-key commands).
    • Store these access keys in a new AWS CLI profile (AWS CLI: aws --profile snow configure).
  3. Configure NFS service on the Snowcone.
    • Create or reuse a Virtual Network Interface for NFS (create-virtual-network-interface and describe-virtual-network-interfaces commands).
    • Start the NFS service with the created VNI (start-service command).
  4. Configure an EC2 local compute instance on the Snowcone.
    • Check whether an instance already exists from a previous boot-up and start it (AWS CLI: aws ec2 describe-instances and start-instances).
    • Run a new EC2 instance using your customized AMI (AWS CLI: aws ec2 run-instances).
    • Create or reuse a Virtual Network Interface for this instance (create-virtual-network-interface and describe-virtual-network-interfaces commands).
    • Wait for the EC2 instance to boot up and transition into “running” state.
    • (optional) Make the instance accessible from the vehicle network (AWS CLI: aws ec2 associate-address).

After starting the EC2 instance on your Snowcone, the AMI is configured to run bootstrapping commands from the initialization script (supplied as user data). I review the script in the next section.

See the bootstrap-snowcone.sh script for a full set of Snowball Edge Client and AWS CLI instructions.
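
To make step 4 of the list above concrete, here is a minimal boto3 sketch against the Snowcone’s local EC2-compatible endpoint. The endpoint address and port, the local AMI ID, and the snc1.medium instance type are assumptions; adjust them to your device:

import boto3

# Use the access keys stored in the "snow" AWS CLI profile (step 2) and
# point boto3 at the Snowcone's local EC2-compatible endpoint (HTTP on
# port 8008 here; an HTTPS endpoint is also available).
session = boto3.Session(profile_name="snow", region_name="snow")
ec2 = session.client("ec2", endpoint_url="http://192.168.1.220:8008")

# Launch a new instance from the customized AMI selected during Snowcone
# job creation, passing the bootstrap script as user data.
with open("bootstrap-flything.sh") as f:
    user_data = f.read()

response = ec2.run_instances(
    ImageId="s.ami-0123456789abcdef0",  # placeholder local AMI ID
    InstanceType="snc1.medium",         # assumed Snowcone instance type
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
)
instance_id = response["Instances"][0]["InstanceId"]

# Wait for the EC2 instance to transition into the "running" state.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])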

Installing AWS IoT Greengrass inside an EC2 instance

During the provisioning of the EC2 instance, we included bootstrapping commands after the instance starts.

We use an Ubuntu 16.04-based AMI and install the packages needed to connect to the NFS share and to launch the Java-based AWS IoT Greengrass nucleus. Include this step when preparing the AMI before creating the Snowcone job:

apt-get --quiet update
apt-get --yes install --no-install-recommends nfs-common default-jdk-headless

Next, configure and mount the NFS share:

mkdir -p /snowcone_nfs
echo "${SNOW_NFS_IP}:/buckets /snowcone_nfs nfs defaults 0 0" | tee -a /etc/fstab > /dev/null
mount --all

Finally, we download and run the Greengrass installer. We use the included Greengrass provisioning feature for an easy and quick installation. This uses the previously generated “GreengrassV2Provisioner” user and access keys to create the AWS IoT thing, a new certificate and private key, a policy, and the necessary configuration for the Greengrass core to start up automatically and listen for over-the-air updates and commands. For large-scale and production environments, we recommend using other mechanisms, such as fleet provisioning or just-in-time provisioning, to fully secure the device enrollment process.

We invoke the AWS IoT Greengrass installer with our previously defined parameters:

# Download AWS IoT Greengrass Core v2 and install dependencies
mkdir -p /greengrass
curl -s https://d2s8p88vqu9w66.cloudfront.net/releases/greengrass-nucleus-latest.zip > /greengrass/greengrass-nucleus-latest.zip
unzip -o /greengrass/greengrass-nucleus-latest.zip -d /greengrass/GreengrassCore
rm -f /greengrass/greengrass-nucleus-latest.zip

# Install AWS IoT Greengrass Core v2, provision thing, and enable auto-start
java \
  -Dlog.store=FILE \
  -Droot="/greengrass/v2" \
  -jar /greengrass/GreengrassCore/lib/Greengrass.jar \
  --aws-region "${AWS_REGION}" \
  --thing-name "${THING_NAME}" \
  --thing-group-name "${THING_GROUP}" \
  --tes-role-name GreengrassV2TokenExchangeRole \
  --tes-role-alias-name GreengrassV2TokenExchangeRoleAlias \
  --component-default-user ggc_user:ggc_group \
  --provision true \
  --setup-system-service true

You can download the Greengrass package while creating your base AMI, and then use the over-the-air update capabilities to deploy the latest version.

The --provision flag creates a new thing, joins it to the specified thing group, and enables the token exchange service (TES) to establish secure interactions with AWS services. The installer creates a local system user and group ggc_user:ggc_group for running components with minimal permissions. The --setup-system-service flag installs, enables, and starts a new systemd service unit for AWS IoT Greengrass.

Refer to the bootstrap-flything.sh script for a full set of commands. This script is used directly by the bootstrap-snowcone.sh script as user data when creating a fresh EC2 instance running on a Snowcone.

Now we have a fully working AWS IoT Greengrass device that listens for remote deployments of new components and over-the-air updates.

Deploying custom AWS IoT Greengrass components

AWS IoT Greengrass 2.0 introduced components: self-contained software modules that you can configure and deploy to your Greengrass core devices. We use the AvionicsDataCollection sample component introduced earlier.

The avionics system is available on the local network and can be queried via an HTTP endpoint. While this works for most embedded compute systems, camera and radar sensors typically provide only streaming data, often as multicast groups on the local network. With the recently launched direct network interfaces for Snowcone devices, you can configure layer 2 network access and tap into such multicast streams from within your EC2 instances.
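
As a sketch of how a component could tap such a stream, the following joins a hypothetical multicast group and appends the raw packets to the Snowcone’s NFS mount; the group address, port, and file layout are assumptions, not part of the demo application:

import socket
import struct

MCAST_GROUP = "239.255.0.1"   # hypothetical sensor multicast group
MCAST_PORT = 5004             # hypothetical sensor stream port
NFS_PATH = "/snowcone_nfs/sensor_stream.bin"

# Open a UDP socket and join the multicast group on the vehicle network.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", MCAST_PORT))
mreq = struct.pack("4sl", socket.inet_aton(MCAST_GROUP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Append every received datagram to a file on the NFS share for later
# offline ingestion into Amazon S3.
with open(NFS_PATH, "ab") as stream_file:
    while True:
        packet, _ = sock.recvfrom(65536)
        stream_file.write(packet)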

The Python-based application uses the AWS IoT Device SDK for interprocess communication (IPC). This allows us to write application code without needing to open another secure communication channel to AWS IoT Core. AWS IoT Greengrass automatically forwards our MQTT messages based on the access control definition of the component recipe:

"ComponentConfiguration": {
  "DefaultConfiguration": {
    "accessControl": {
      "aws.greengrass.ipc.mqttproxy": {
        "AvionicsDataCollection:mqttproxy:1": {
          "policyDescription": "Allows publishing to the avionics telemetry topic.",
          "operations": [
            "aws.greengrass#PublishToIoTCore"
          ],
          "resources": [
            "flythings/+/avionics/telemetry"
          ]
        }
      }
    }
  }
}
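
Within the component, a publish through IPC looks roughly like the following sketch; the payload is illustrative, and the topic must match a resource allowed in the recipe above:

import json
import os

import awsiot.greengrasscoreipc
from awsiot.greengrasscoreipc.model import QOS, PublishToIoTCoreRequest

# Greengrass sets AWS_IOT_THING_NAME in every component's environment.
thing_name = os.environ["AWS_IOT_THING_NAME"]

# Connect to the local Greengrass IPC socket; the nucleus handles
# authentication and forwards messages according to the recipe's
# accessControl section.
ipc_client = awsiot.greengrasscoreipc.connect()

# Illustrative payload only; the real schema comes from the avionics system.
payload = {"location": {"lat": 52.52, "lon": 13.405}, "engine": {"rpm": 5400}}

request = PublishToIoTCoreRequest(
    topic_name=f"flythings/{thing_name}/avionics/telemetry",
    qos=QOS.AT_LEAST_ONCE,
    payload=json.dumps(payload).encode(),
)
operation = ipc_client.new_publish_to_iot_core()
operation.activate(request)
operation.get_response().result(timeout=10.0)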

The component files (Python source code and installation scripts) are stored in a private S3 bucket. The Greengrass core device is allowed to access them via the GreengrassV2TokenExchangeRole IAM role.

To support a fleet of devices, these files should be versioned, and each deployed version uploaded to a dedicated folder, for example, with a prefix s3://EXAMPLE-AVIONICS-DATA-COLLECTION/AvionicsDataCollection/v1.0.0/:

cd path/to/source/code/
aws s3 sync . s3://EXAMPLE-AVIONICS-DATA-COLLECTION/AvionicsDataCollection/v1.0.0/

Each component is defined by a YAML- or JSON-based component recipe. The recipe contains instructions for the component lifecycle steps, the default configuration, and the artifacts (source code files in S3). Refer to the component-AvionicsDataCollection.json recipe file for the full content.

After uploading, create a new component version and a new deployment with that component:

aws greengrassv2 create-component-version \
  --inline-recipe fileb://component-AvionicsDataCollection.json

aws greengrassv2 create-deployment \
  --target-arn "arn:aws:iot:${AWS_REGION}:${AWS_ACCOUNT_ID}:thinggroup/FlyThings" \
  --deployment-name "FlyThingsDeployment" \
  --components AvionicsDataCollection={componentVersion=$(jq -r '.ComponentVersion' component-AvionicsDataCollection.json)}

This triggers an over-the-air update to all Greengrass core devices that are part of the FlyThings thing group. You can quickly roll back to a previous component version by simply creating a new deployment with the appropriate component version number.
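
For example, such a rollback can be scripted with boto3; the account ID is a placeholder, and the version number assumes v1.0.0 was previously published:

import boto3

greengrass = boto3.client("greengrassv2")

# Re-deploying a previous component version triggers an over-the-air
# rollback on every core device in the thing group.
greengrass.create_deployment(
    targetArn="arn:aws:iot:eu-central-1:111122223333:thinggroup/FlyThings",
    deploymentName="FlyThingsDeployment",
    components={"AvionicsDataCollection": {"componentVersion": "1.0.0"}},
)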

Please see the bootstrap-greengrass-deployment.sh script for a full set of AWS CLI instructions.

You have deployed your first AWS IoT Greengrass component into the Greengrass core device. The deployed AvionicsDataCollection application is processing incoming data and sending telemetry events to AWS IoT Core. You can subscribe to the flythings/+/avionics/telemetry topic to monitor these messages from all things (represented by the + topic wildcard) in your AWS Management Console, under the AWS IoT – Test section. The high-bandwidth data is stored on the Snowcone’s NFS share and ingested into Amazon S3 after shipping the device back to AWS.

Cleaning up

To avoid incurring future charges, delete the AWS IoT things, the AWS IoT Greengrass core device, and the associated IoT certificate and policy. To keep your environment secure, delete the IAM user and policy used for Greengrass provisioning. You can follow the shipping instructions to return the Snowcone device to an AWS facility.

Conclusion

In this blog post, I demonstrated data processing and data collection from autonomous vehicles, using the example of an autonomous drone. I introduced a split-responsibility architecture with a safety-critical avionics system and an AWS Snowcone as a non-safety-critical edge computing and storage platform. I showed how to bootstrap and provision a Snowcone device during the vehicle boot-up sequence, creating a local EC2 instance with AWS IoT Greengrass running inside to host the demo AvionicsDataCollection application as an AWS IoT Greengrass component.

I also showed how to configure and deploy AWS IoT Greengrass and the component to make use of the secure IPC feature to publish MQTT messages into AWS IoT Core. This architecture uses the NFS service on Snowcone to store high-bandwidth data streams from the vehicle’s sensors, and ingests them offline into Amazon S3 for machine learning and data analytics use cases. This enables customers to decouple telemetry data being sent over-the-air for live location and health tracking, from the bulk data collection of cameras, radar, and lidar sensors.

AWS Snowcone is an ideal edge processing device for autonomous vehicles, serving as a drop-in solution for non-safety-critical compute tasks such as telemetry transmission and data collection. The ruggedized and secure device is a perfect fit for portable and modular vehicle designs. Specifically, it provides flexible compute capabilities and can host AWS IoT Greengrass to integrate with vehicle systems without being on the critical path of the avionics system. With custom AWS IoT Greengrass components, customers can deploy and manage the lifecycle of their applications at the edge with over-the-air updates, leading to faster development cycles and full visibility into fleet status and health.

Thanks for reading this blog post on using AWS Snowcone and AWS IoT Greengrass for autonomous vehicle data collection. Please don’t hesitate to leave comments or questions in the comments section, or create new issues and pull requests in the GitHub repository.

You can try out the AvionicsDataCollection application yourself: order an AWS Snowcone device and start collecting data!

To learn more about how you can use the services used in this blog post in automotive and connected vehicle use cases, refer to these pages: