Analyzing the Run-Time Behavior of an AWS IoT Device

By Richard Elberger, Partner Solutions Architect at AWS
By Johannes Lask, Software Developer and Product Manager at SEGGER

SEGGER Microcontroller is a full-range supplier of software, hardware, and development tools for embedded systems. They offer support throughout the development process with high quality, flexible, and easy-to-use tools and components.

SEGGER is an AWS Partner Network (APN) Standard Technology Partner and provides solutions for secure communication as well as data and product security, meeting the needs of the rapidly evolving Internet of Things (IoT) market.

In this post, you will see what your IoT devices really do when they connect to and use AWS IoT. We demonstrate how to visualize and analyze what happens in your firmware—from a button press to posting a status update to the Message Broker. With SEGGER’s analysis tools, we show how to track down and solve hard-to-find, timing-related issues.

Overview

IoT projects often work with external stimuli, such as sensor data or user input, and communicate with other “things.” The key to a successful project is to ensure and verify the system correctly handles all input and sends the right messages.

During planning and design, you form an idea of how your system operates and what it should do, and then develop it accordingly. You’ve likely thought about what kind of input (sensor data, user button, network messages) the system could get and how each should be handled. This design defines the contexts, tasks, interrupts, and resources of your system firmware.

Once your first firmware is ready, you can test it on real hardware. These tests might show behavior that puzzles you. The system might not react to a button press, might send wrong sensor data, or might even crash, although none of this should happen according to your design. This is where analysis of the run-time behavior of your IoT device can help.

Project Design

The project we used for this example is a light on-off toggle button that publishes a message to a Message Queuing Telemetry Transport (MQTT) topic each time the button is pressed. Lights may subscribe to the topic to react to messages and turn on or off.

The software design uses a:FreeRTOS and is described by these parts:

Startup (Main): Initialize the hardware, configure the button, create the Main Task, connect to Wi-Fi, and start the RTOS Scheduler.
Main Task: In an endless loop, wait in blocked state (so not consuming any CPU time) for an event from the button press, then publish a message.
MQTT System Task: Handle connection and communication with the MQTT Message Broker, publish messages received from tasks, and manage subscriptions.
Button Interrupt Handler: On a button press, send an event to the Main Task.

Download a complete reference project for the STM32L4 Discovery IoT Node >>

System Analysis

After we set up our project and developed the application code, we wanted to test and run the software on our hardware to verify it behaved according to our design idea. The simplest way to see what the device does and to get run-time information is printf debugging. To do this, add status messages to your application at any point that may be important to record, and then print them to a terminal via serial port, for example.

For this project, we wanted a more feasible approach to record system activity, so we used an application-independent software instrumentation solution.

Software Instrumented System Analysis

Software instrumentation for system analysis is similar to common printf debugging, as code must be added at certain points of interest. For our analysis, we want to use application-independent software instrumentation. Application-independent means that only common, reusable software modules, such as the RTOS and middleware modules, need to be instrumented.

We do not have to add additional analysis code to our system’s application sources. This enables reusability in multiple projects without additional effort and changes in the application code. The instrumentation might be done by the RTOS/middleware module developers, or by the project developers.

Application-independent instrumentation solutions are usually designed to be lightweight and to add as little overhead to the system as possible. Instead of printing long, formatted debug messages, they record only the necessary information, which is then analyzed on the development host.

Figure 1 – Application-independent software instrumentation.
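
To make the idea of recording only the necessary information concrete, here is a minimal sketch in plain C of a compact event record written into a ring buffer. It is not SystemView's actual format or API. The event IDs, the record layout, and the names trace_record and trace_get are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event IDs; a real instrumentation defines its own set. */
enum { EVT_TASK_START = 1, EVT_ISR_ENTER = 2, EVT_ISR_EXIT = 3 };

/* One compact record: a few bytes instead of a formatted string. */
typedef struct {
    uint32_t timestamp;  /* timer ticks when the event occurred */
    uint16_t id;         /* which event happened */
    uint16_t param;      /* e.g. a task or interrupt number */
} trace_event_t;

#define TRACE_CAPACITY 64u  /* power of two, so the index wraps with a mask */

static trace_event_t trace_buf[TRACE_CAPACITY];
static uint32_t trace_count;  /* total number of events recorded so far */

/* Record an event; the oldest entries are overwritten once the buffer is full. */
static void trace_record(uint32_t timestamp, uint16_t id, uint16_t param)
{
    trace_event_t *e = &trace_buf[trace_count & (TRACE_CAPACITY - 1u)];
    e->timestamp = timestamp;
    e->id = id;
    e->param = param;
    trace_count++;
}

/* Return the i-th oldest record that is still held in the buffer. */
static const trace_event_t *trace_get(uint32_t i)
{
    uint32_t held = trace_count < TRACE_CAPACITY ? trace_count : TRACE_CAPACITY;
    assert(i < held);
    return &trace_buf[(trace_count - held + i) & (TRACE_CAPACITY - 1u)];
}
```

In this sketch the target only stores numbers. Turning them into readable text, such as an interrupt name or a task name, is left to the host, which is what keeps the cost on the device low.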

SystemView

In our project, we use the free software instrumentation SystemView from SEGGER. SystemView instruments the RTOS and interrupts and records events to visualize the behavior of a system. SystemView is RTOS independent, and the instrumentation code is already available for a:FreeRTOS.

The events that are recorded by SystemView can be read and transferred with a J-Link debug probe to the development host while the target device is running. SystemView and J-Link work together to transfer event data over the existing debug interface (JTAG or SWD) without additional hardware or software, and without requiring large, dedicated buffers on the device.

Project Setup

Our light toggle button project is based on the predefined a:FreeRTOS Configuration “Connect to AWS IoT – ST,” but instead of the System Workbench for STM32 we use Embedded Studio as our Integrated Development Environment (IDE).

Here’s how to get your source code project organized for the tracing and profiling we cover in this post:

Log into the AWS IoT Console and select Software from the left menu
Click “Configure Download” in the Amazon FreeRTOS Device Software box
Download the “Connect to AWS IoT – ST” configuration
Extract the sources on your disk
Open Embedded Studio and create a new project for the STM32L4
Add the source files to your new project
In the project options, set the include directories and preprocessor defines

You can download the ready-to-run project for Embedded Studio to get started immediately.

Follow the a:FreeRTOS Getting Started Guide to configure your device and project to connect to AWS IoT.

Adding SystemView to the Project

SystemView consists of two parts—the host application we use to analyze the system, and a small software module. The sources of the module need to be added to your project and it must be enabled by our application.

Here’s how to add SystemView to your project:

Download and install SystemView
Browse to the installation directory (e.g. C:\Program Files (x86)\SEGGER\SystemView_V252a)
Copy the target sources from “Src” into your project directory
Add the sources from “SEGGER” and “Sample/FreeRTOSV10” to your project
Add the include directories in your project configuration.

The a:FreeRTOS instrumentation requires some modifications that are available as an easy-to-apply patch file. Download the FreeRTOS v10 kernel patch for SystemView, and navigate to “Sample/FreeRTOSV10/Patch” on the filesystem in the SystemView installation directory. Apply the FreeRTOS_V10_Core.patch file to the FreeRTOS sources in your project.

You can also use the modified code from the reference project in your project to skip the modification steps.

To record system events with SystemView, we need to configure the trace macros in a:FreeRTOS to call the matching SystemView functions. The macro definitions are already available in the target sources. Just add #include "SEGGER_SYSVIEW_FreeRTOS.h" at the end of your FreeRTOSConfig.h.

We also need to enable recording in our application source. First, include “SEGGER_SYSVIEW.h” in your main application file, and then call SEGGER_SYSVIEW_Conf() in main() after all the hardware initialization is done.

We can now rebuild, flash, and run our application on the device. Select Build and then Rebuild aws_demos. Finally, select Target, and then Download aws_demos.

Running the Device Test

Now that our software is instrumented, we can start our tests on the device hardware. We connect it to the SystemView host application and record what the system is doing. To record the whole system from the start, we added a debug switch to the code that makes the application wait for SystemView to connect when the button is held down during reset.

This is how you prepare your hardware for analysis with SystemView:

Make sure the device is running the project application
Connect the device via J-Link to the host computer
Hold down the user button and press reset on the device; the green USER LED will toggle to indicate it is waiting for SystemView to connect
Open the SystemView host application
Select Target > Start Recording
Configure the target connection
- Check USB as Connection to J-Link
- Enter STM32L475VG as Target Device
- Select SWD, 4000 kHz as Target Interface and Speed
- Check Auto Detection as RTT Control Block Detection

When SystemView is connected to the device, we see the LED turn off and the system starts running. We want to verify that the device behaves as designed and publishes a status message to the MQTT Message Broker on each button press. We press the button multiple times while SystemView is recording.

SystemView records and visualizes what happens on the device while it’s running. We can see the system tasks and interrupts, function calls, and log messages.

Figure 2 – SystemView recording execution of target.

Analyzing the System

To further analyze the behavior and verify this matches our design, we stop recording and save the events to a file. This enables us to re-do the analysis at a later point and store the record file for our documentation.

To stop and save a recording:

Select Target > Stop Recording
Select File > Save Data
Select a directory and filename for the record file
Enter some information about the record

Now, we navigate to the first event of the record in the events list and start our analysis there.

Figure 3 – Recorded system information and first events.

In Figure 3, you can see the first events we recorded on the target. We get some information about the system: the CPU frequency, the timestamp frequency used for event timestamps, the application name, the target device, the OS name, and the system time. All of this is generated by the system instrumentation, which saves us from creating a more complex project in the analyzer tool. This information is also displayed, along with some more recording statistics, in the System Information Window.
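
The timestamp frequency is what lets a tool turn raw event timestamps into durations. As a small worked example of that arithmetic, the sketch below converts a tick difference into microseconds. The helper name ticks_to_us and the 80 MHz frequency used in the usage note are assumptions for illustration, not values taken from the recording.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a difference of two timestamps (in ticks) to microseconds.
 * timestamp_hz is the timestamp frequency reported by the instrumentation. */
static uint64_t ticks_to_us(uint64_t ticks, uint32_t timestamp_hz)
{
    assert(timestamp_hz != 0u);
    /* Multiply first to keep precision; the result is truncated toward zero. */
    return (ticks * 1000000u) / timestamp_hz;
}
```

With an assumed timestamp frequency of 80 MHz, a difference of 96,000,000 ticks between two events corresponds to ticks_to_us(96000000, 80000000), which is 1,200,000 microseconds, or 1.2 seconds.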

Figure 4 – Events and log messages of system initialization phase.

Next, as shown in Figure 4, we see the initialization phase of the system. Our key provisioning runs to write certificates and keys, and the system MQTT Task that handles communication with the broker is created. The Wi-Fi module is then initialized and connects to our network. We used some debug log messages in addition to the instrumentation.

These log messages are recorded by SystemView and shown in the host application with the timestamp and the context (i.e. the task) they were printed from.

Figure 5 – Events and timeline on task creation; information about system tasks and interrupts.

After initialization, our Main Task is created and starts running. In Figure 5, we can see in the timeline when a task is created, becomes ready for execution, and starts execution. More information about priority and stack is available for each task, along with run-time statistics.

In the Main Task, we can see that it sends a command to the MQTT Task to connect to the Message Broker and waits to receive a status from it. Once the MQTT Task has connected to the broker, it sends the status and the Main Task returns from the receive function. It then waits for an event, and the system goes idle.

Figure 6 – System idle time with tick interrupts; system wake-up by external interrupt.

When the system is idle, we only see the periodic SysTick interrupts that keep our system running (see the top of Figure 6). At some point, we also see another interrupt, ISR #56, happening (bottom of Figure 6). That must be the interrupt triggered by our button press.

In the button interrupt, the event is set, which makes the Main Task ready to run. When the interrupt handler exits, the Main Task starts running and sends a command to the MQTT Task to publish our new status to the Message Broker. After the Main Task has received the status from the MQTT Task, it waits for the next event.

This shows that our system behaves as expected and our design seems to be working.

Finding Issues in the System

We do some more tests with our device and find that sometimes button presses get lost and the connected lights do not turn on or off. This can be forced and reproduced when the button is pressed three times, once a second. It should turn a light on, off, and on again (or vice versa). Instead, the light turns on and off, but not on again. In the MQTT Console, we also see only two status messages published.

Issue Analysis

We connect SystemView to our device once again and let it run. We record the system while forcing the described issue by pressing the button three times with one-second delays. Then we stop recording and start our analysis.

First, we look for the button interrupts. In the Contexts Window, we see that it ran three times as expected. In the timeline, we note the one-second delay between interrupts.

Then we check what happens on a button interrupt. The timeline shows the first interrupt set the event that activated our Main Task, which then sent a publish command to the MQTT Task. The MQTT Task ran and published the message to the MQTT Message Broker. While the Main Task was deactivated, waiting for the status from the MQTT Task, the next two button interrupts happened. Both interrupts set the event for the Main Task as expected.

The MQTT Task published the message after about 1.2 seconds and then activated the Main Task. We ran the test multiple times, and publishing sometimes took even longer, depending on Wi-Fi and network quality.

The Main Task ran and checked for the next task event, which had already been set by the button interrupt. This is where things went wrong.

Our device should publish one status message immediately for each button press, but publishing the message could block the Main Task longer than expected, especially when the network connection wasn’t stable. During that time, multiple button interrupts might happen. In that case, only the first button press set the event, and subsequent interrupts had no effect because the event had already been set.

When the Main Task ran again, it got only one event instead of multiple ones and published only one status message.
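
The lost button presses can be reproduced with a small host-side model in plain C. This is a simplified simulation of the behavior described above, not the project's firmware and not FreeRTOS code. All names (flag_model_t, button_isr, main_task_step, three_presses_with_flag) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host-side model of a single event bit shared by the ISR and the Main Task. */
typedef struct {
    bool event_set;  /* the one bit the button interrupt sets */
    int published;   /* number of messages the Main Task has published */
} flag_model_t;

/* Button interrupt: setting a bit that is already set changes nothing. */
static void button_isr(flag_model_t *m)
{
    assert(m != NULL);
    m->event_set = true;
}

/* One Main Task wake-up: consume the bit and publish at most one message. */
static void main_task_step(flag_model_t *m)
{
    assert(m != NULL);
    if (m->event_set) {
        m->event_set = false;
        m->published++;
    }
}

/* Scenario from the text: presses 2 and 3 arrive while the task is blocked. */
static int three_presses_with_flag(void)
{
    flag_model_t m = { false, 0 };
    button_isr(&m);      /* press 1 */
    main_task_step(&m);  /* task wakes, consumes the bit, publishes, blocks */
    button_isr(&m);      /* press 2 while the task is blocked */
    button_isr(&m);      /* press 3 while blocked: the bit is already set */
    main_task_step(&m);  /* task resumes and sees only one event */
    main_task_step(&m);  /* nothing left to handle */
    return m.published;
}
```

In this model, three presses lead to two published messages, which matches what we observed in the MQTT Console.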

Such timing issues can be hard to find with a debugger, or they require debug-specific changes, such as additional counters or messages, that further affect the system. With software instrumentation and visualization of the run-time behavior, we can find and identify those issues easily.

Solution

Instead of setting a single bit in the event group, we need to change our design to count the number of events since last check.

We can replace the event with a counting semaphore that is given in the button interrupt and taken by the Main Task. That way, the Main Task can take the semaphore once for each time the button has been pressed. For your design, you might also find more efficient ways of doing this, such as direct-to-task communication or message buffers.
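
The effect of counting instead of flagging can be sketched with a host-side model in plain C. This is a simulation of the idea only. On the target, the FreeRTOS counterparts would be xSemaphoreCreateCounting(), xSemaphoreGiveFromISR(), and xSemaphoreTake(). The type counting_sem_t, the function names, and the maximum count of 10 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host-side model of a counting semaphore. */
typedef struct {
    unsigned count;      /* button presses not yet handled */
    unsigned max_count;  /* upper bound, as a counting semaphore has one */
} counting_sem_t;

/* "Give" from the button interrupt; returns false if the count is saturated. */
static bool sem_give(counting_sem_t *s)
{
    assert(s != NULL);
    if (s->count >= s->max_count) {
        return false;
    }
    s->count++;
    return true;
}

/* Non-blocking "take" by the Main Task; returns true if a press was pending. */
static bool sem_take(counting_sem_t *s)
{
    assert(s != NULL);
    if (s->count == 0u) {
        return false;
    }
    s->count--;
    return true;
}

/* Same scenario as before: presses 2 and 3 arrive while the task is blocked. */
static int three_presses_with_semaphore(void)
{
    counting_sem_t s = { 0u, 10u };
    int published = 0;
    sem_give(&s);                          /* press 1 */
    if (sem_take(&s)) { published++; }     /* task publishes, then blocks */
    sem_give(&s);                          /* press 2 while blocked */
    sem_give(&s);                          /* press 3 while blocked */
    while (sem_take(&s)) { published++; }  /* task resumes, handles both */
    return published;
}
```

In this model, each of the three presses results in its own published message, because the count remembers how many presses occurred while the task was blocked.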

The downside of that approach is that status messages waiting to be published will queue up when the button keeps getting pressed. We can further improve our design by extending the status message with the number of status updates, so that all pending button presses are handled with one published message.
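
One way to picture this refinement is the following sketch in plain C. It drains all pending presses at once and builds a single status payload that carries the count. The payload layout, the field names "on" and "presses", and the rule that an odd number of presses toggles the light are assumptions made for illustration. The reference project may format its messages differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Take all pending presses at once and return how many there were. */
static unsigned drain_presses(unsigned *pending)
{
    assert(pending != NULL);
    unsigned n = *pending;
    *pending = 0u;
    return n;
}

/* Assumed rule: an odd number of presses flips the light, an even number
 * leaves it in the state it already had. */
static int apply_presses(int light_on, unsigned presses)
{
    return (presses % 2u) ? !light_on : light_on;
}

/* Build one status payload; the JSON field names are hypothetical.
 * Returns the number of characters written, as snprintf does. */
static int format_status(char *buf, size_t len, int light_on, unsigned presses)
{
    assert(buf != NULL && len > 0u);
    return snprintf(buf, len, "{\"on\":%s,\"presses\":%u}",
                    light_on ? "true" : "false", presses);
}
```

With this approach the Main Task publishes once per wake-up regardless of how many presses accumulated, so the backlog no longer grows with every press during a slow publish.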

Conclusion

An application-independent software instrumentation can reveal the run-time behavior of an IoT device. It requires almost no changes to the application source, which enables easy use of one instrumentation for multiple projects.

Software instrumentation provides a much deeper insight into a system than simple debug messages. It shows the communication and interaction between tasks and interrupts while adding little overhead to the system itself. Precise timestamps allow identification of overall system load and duration of specific parts. Inefficiencies and timing-related issues can be identified easily.

If you have problems in your projects that cannot be found with standard debugging techniques, give software instrumented analysis a try with SystemView.

If you want to dive even deeper into your system to see each executed instruction without modification of your application, have a look at instruction tracing with J-Trace PRO.

SEGGER Microcontroller – APN Partner Spotlight

SEGGER is an APN Standard Technology Partner. They offer support throughout the development process with high quality, flexible, and easy-to-use tools and components. SEGGER provides solutions for secure communication as well as data and product security, meeting the needs of the rapidly evolving IoT market.
