AWS Startups Blog

IoT Primer: Reliable Commands

By Brett Francis, Principal Solutions Architect, AWS


Simplicity is prerequisite for reliability. — Edsger W. Dijkstra

Welcome to the fourth post in the series of primers designed to help you build great solutions while bringing together a collection of “small things” within the Internet of Things (IoT) domain.

In the first post of this series, I introduced the four layers of the Pragma Architecture: Small Things, Intermittent Layer, Speed Layer, and Serving Layer. In the second and third posts, I began to dig deep into the Speed Layer. In this post, I finish my discussion of the Speed Layer and begin to explore the Serving Layer, with a focus on the challenge of sending commands to devices that operate in unreliable environments.

Commands — The Ability to Ask a Device to Do Something

The challenges that customers typically run into when they create a command solution are often determined by how they answer the following questions:

  • Should commands be delivered reliably or can they be delivered with a best effort?
  • Does the solution need bi-directional commands between a small thing and some other paired entity?
  • Because we are asking a device to do something that potentially could be powerful or even dangerous, how can we deliver and execute commands in a secure or even human-authorized way?

Let’s address these challenges by laying a simple philosophical foundation for reliable commands.

Reliable Commands, Unreliable Environments

First, let’s revisit one of the defining characteristics of a small thing: Just like a glove takes on the physical shape of the hand that wears it, a small thing takes on the logical shape of its core purpose and the resource constraints with which it is confronted.

Beyond the small thing itself, environmentally driven resource constraints are also a significant design consideration when you introduce commands into a solution. To deliver the best customer experience for your users, your command execution must be as reliable as you can afford even in the face of flaky networks and unreliable power or compute resources. If your answer to the question, “Should commands be delivered reliably?” is a resounding “Yes,” then you have the best starting posture for your users as well as your DevOps team.

You might have your attention drawn to transport and protocol choices or discussions of Wi-Fi vs. mesh vs. near-field communication vs. mobile piggybacking. But if you take a broader perspective, you’ll find you can add a large amount of reliability to a command solution by embracing a simple concept: Nothing is successful unless it is acknowledged as successful.

Acknowledgement is Power

Let’s add the capability to send a “power-on” command within the regional, multiprotocol, stream-processing solution that we laid out in the previous post about Telemetry.

Imagine that we send this “power-on” command to a device that may be a home automation plug, a heater, or even a device with greater impact like a solar power inverter on a rooftop. In all of these cases, we won’t know if power has been turned on unless we receive an acknowledgement from the device. Without reliable information about the state of the device, we could have a dangerous situation.

At this point, I want to introduce a new and fundamental component into the architecture diagram from the earlier post about telemetry. This component is a command database that can start small but grow to consistently handle the heat of hundreds of thousands, if not millions, of simultaneous writes or reads that happen when commanding a fleet of small things. This is where a NoSQL database (such as DynamoDB) truly shines.

DynamoDB architecture with HTTPS

On the left of the preceding diagram, the HTTPS small things (such as solar power inverters that are sending telemetry to our regional stream) also check for their specific commands directly from the command database. When each small thing successfully completes a given command, it writes an acknowledgement directly back to the command database. This places the retry logic in or near the small thing itself, which usually makes sense because the small thing is in the best position to know something about its current operating environment.
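The poll, execute, and acknowledge loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `InMemoryCommandDB`, `Command`, and `poll_and_execute` are hypothetical names, the in-memory store stands in for the command database rather than calling any real DynamoDB API, and the retry count and backoff are arbitrary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Command:
    command_id: str
    device_id: str
    action: str                              # for example "power-on"
    acknowledged_at: Optional[float] = None  # set only once the device confirms


class InMemoryCommandDB:
    """Hypothetical stand-in for the command database; not a DynamoDB client."""

    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def put(self, command: Command) -> None:
        self._commands[command.command_id] = command

    def pending_for(self, device_id: str) -> List[Command]:
        """Commands for this device that have not been acknowledged yet."""
        return [c for c in self._commands.values()
                if c.device_id == device_id and c.acknowledged_at is None]

    def acknowledge(self, command_id: str) -> None:
        self._commands[command_id].acknowledged_at = time.time()


def poll_and_execute(db: InMemoryCommandDB, device_id: str,
                     execute: Callable[[Command], bool],
                     max_attempts: int = 3,
                     backoff_seconds: float = 0.0) -> List[str]:
    """Runs on the small thing: retry locally, acknowledge only on success."""
    completed: List[str] = []
    for command in db.pending_for(device_id):
        for attempt in range(1, max_attempts + 1):
            if execute(command):
                db.acknowledge(command.command_id)
                completed.append(command.command_id)
                break
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    return completed


# Usage: the first attempt fails, the second succeeds and is acknowledged.
db = InMemoryCommandDB()
db.put(Command("cmd-1", "inverter-42", "power-on"))
attempts: List[str] = []


def flaky_execute(command: Command) -> bool:
    attempts.append(command.command_id)
    return len(attempts) >= 2


completed = poll_and_execute(db, "inverter-42", flaky_execute)
```

A command that never succeeds is simply left unacknowledged, so it stays visible as pending to anything that reads the command database.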

The M2M Gateway, also shown in the diagram, has been extended to act as a bridge between the M2M small thing protocol and the command database. Every command in the command database is placed onto a device-specific M2M command topic. When the M2M small thing, which might be a connected relay near the solar inverter, receives and successfully completes the given task, it writes an acknowledgement back to the device’s specific M2M command topic. The M2M Gateway in the middle of the diagram then writes the acknowledgement directly to the command database. Many M2M protocols (such as MQTT) have already incorporated retry logic for the delivery of messages between the client of a topic and the topic itself.
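To make the gateway's bridging role concrete, here is a rough sketch of the round trip. Everything in it is an assumption made for illustration: `InMemoryBroker` is a toy stand-in rather than a real MQTT client, the `devices/<id>/commands` topic naming is invented, the command database is a plain dictionary, and the `kind` and `status` fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def command_topic(device_id: str) -> str:
    # Hypothetical naming scheme: one command topic per device.
    return f"devices/{device_id}/commands"


class InMemoryBroker:
    """Hypothetical stand-in for an M2M broker; not a real MQTT client."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[dict]] = defaultdict(list)

    def publish(self, topic: str, message: dict) -> None:
        self._topics[topic].append(message)

    def take(self, topic: str, kind: str) -> List[dict]:
        """Remove and return every message of one kind from a topic."""
        matching = [m for m in self._topics[topic] if m["kind"] == kind]
        self._topics[topic] = [m for m in self._topics[topic] if m["kind"] != kind]
        return matching


class Gateway:
    """Bridges the command database (a dict here) and the device topics."""

    def __init__(self, broker: InMemoryBroker, command_db: Dict[str, dict]) -> None:
        self.broker = broker
        self.command_db = command_db

    def forward_commands(self) -> None:
        """Place every new command onto its device-specific topic."""
        for command_id, record in self.command_db.items():
            if record["status"] == "new":
                self.broker.publish(
                    command_topic(record["device_id"]),
                    {"kind": "command", "command_id": command_id,
                     "action": record["action"]})
                record["status"] = "sent"

    def collect_acks(self, device_id: str) -> None:
        """Copy acknowledgements from the topic into the command database."""
        for message in self.broker.take(command_topic(device_id), "ack"):
            self.command_db[message["command_id"]]["status"] = "acknowledged"


def device_step(broker: InMemoryBroker, device_id: str) -> List[str]:
    """Runs on the small thing: do the work, then acknowledge on the same topic."""
    performed = []
    for message in broker.take(command_topic(device_id), "command"):
        performed.append(message["action"])  # a real device would act here
        broker.publish(command_topic(device_id),
                       {"kind": "ack", "command_id": message["command_id"]})
    return performed


# Usage: one command travels database -> topic -> device -> topic -> database.
command_db = {"cmd-7": {"device_id": "relay-9", "action": "power-on", "status": "new"}}
broker = InMemoryBroker()
gateway = Gateway(broker, command_db)
gateway.forward_commands()
status_before_ack = command_db["cmd-7"]["status"]
performed = device_step(broker, "relay-9")
gateway.collect_acks("relay-9")
```

The sketch leaves message-delivery retries out entirely, on the assumption that the M2M protocol handles them between client and topic.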

By adding a command database into the architecture, we leverage our past investments in telemetry, add a single point of control for all commands, and gain visibility into those commands. In addition, the Pragma Architecture’s Serving Layer can reliably use the single point of command interaction to offer up a user interface that conveys the last commanded state of the devices under control of the solution. Most important, when your users ask a device to “power-on,” they can be more confident that the request was successful because the device itself will acknowledge the success.

But what if we want to offer our users a much more interactive, even bi-directional experience, while retaining the regional visibility and control we just introduced? Now the simple approach we’ve adopted really starts to show its strength and extensibility.

Small Things and Mobile Devices

Users typically expect that they can control a consumer-oriented small thing from a smartphone application. To achieve the level of interactivity expected in such a mobile application from anywhere on the planet, we need reliable bi-directional interaction between the small thing and the mobile phone.

The following diagram adds a device registry to the Pragma Architecture Serving Layer. A device registry is necessary to accomplish pairing, regardless of whether the HTTPS or M2M protocol is used. A device registry should know about all the devices in your solution. It should know the mechanism for sending commands to the devices, and it should know with which gateway each M2M device is currently communicating. A device registry also knows about your users’ mobile devices, the mobile-device-to-stream metadata, and the small things with which each user is allowed to communicate.

Pragma Architecture Serving Layer with device registry

As shown in the preceding diagram, the mobile device first interacts with the device registry to determine the list of small things with which the mobile user is allowed to interact. Once that list is determined, the mobile device is given focused privileges (possibly using the fine-grained access controls of DynamoDB with Amazon Cognito) to write commands directly to the command database for the allowed list of small things. Additionally, the mobile device can retrieve acknowledgements directly from the command database and display those to the mobile user.
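The pairing check and the focused write privileges can be sketched as follows. This is a simplified model, not the DynamoDB or Amazon Cognito mechanism itself: `DeviceRegistry`, `ScopedWriter`, and the record fields are hypothetical names, and the permission check that a real deployment would enforce in the access-control layer is done here in plain application code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


class DeviceRegistry:
    """Hypothetical registry that records which user is paired with which device."""

    def __init__(self) -> None:
        self._pairings: Dict[str, Set[str]] = {}

    def pair(self, user_id: str, device_id: str) -> None:
        self._pairings.setdefault(user_id, set()).add(device_id)

    def allowed_devices(self, user_id: str) -> FrozenSet[str]:
        return frozenset(self._pairings.get(user_id, set()))


@dataclass(frozen=True)
class ScopedWriter:
    """A grant limited to the devices the registry returned for one user."""
    user_id: str
    allowed: FrozenSet[str]

    def write_command(self, command_db: List[dict],
                      device_id: str, action: str) -> dict:
        if device_id not in self.allowed:
            raise PermissionError(f"{self.user_id} may not command {device_id}")
        record = {"device_id": device_id, "action": action,
                  "issued_by": self.user_id, "acknowledged": False}
        command_db.append(record)
        return record


# Usage: the mobile app asks the registry first, then writes within its grant.
registry = DeviceRegistry()
registry.pair("alice", "plug-1")
command_db: List[dict] = []
writer = ScopedWriter("alice", registry.allowed_devices("alice"))
writer.write_command(command_db, "plug-1", "power-on")
```

An attempt such as `writer.write_command(command_db, "inverter-42", "power-on")` raises `PermissionError` in this sketch, because that device is not in the user's allowed list.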

Now we have a solution that can support commands from mobile devices. If we select a durable and scalable NoSQL database, the solution can also deliver pairing experiences for mobile users while adding an audit log into the solution. Of course, if auditability is not necessary or maybe undesirable, the commands database can simply be flushed of all records older than some specified period, such as 24 hours or 7 days.
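If the audit trail is not wanted, the flush mentioned above amounts to dropping records older than a cutoff. A minimal sketch, assuming each record carries a timezone-aware `created_at` timestamp (a hypothetical field name) and that the records fit in a simple list:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def flush_older_than(records: List[dict], retention: timedelta,
                     now: Optional[datetime] = None) -> List[dict]:
    """Return only the records created within the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [r for r in records if r["created_at"] >= cutoff]


# Usage: keep 24 hours of commands, measured from a fixed reference time.
reference = datetime(2024, 1, 8, tzinfo=timezone.utc)
records = [
    {"command_id": "recent", "created_at": reference - timedelta(hours=1)},
    {"command_id": "stale", "created_at": reference - timedelta(days=2)},
]
kept = flush_older_than(records, timedelta(hours=24), now=reference)
```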

Now let’s say that we are not happy simply propagating and tracking all commands sent to every small thing in our fleet. In that case, we might want to batch together certain commands. Or maybe we want human approval for some of the commands or for some devices. This means we need to introduce logic between the source of the command and the actual command database. In the most complex case, we want the option to send a command to a workflow before it arrives in the device command database. This means we need to disconnect the mobile device from direct interaction with the command database. Now we need our first real API.

Sending Approved Commands

Diving again into the Serving Layer, let’s introduce some changes. As shown in the following diagram, we offer up an API to our mobile device that fulfills the needs of interacting with the device registry, and also offers up the ability for the mobile device to send commands to a specific small thing. By disconnecting the mobile device from the command database, we can now choose certain commands that should be processed differently from others.

Pragma Architecture Serving Layer with device registry and API

In the preceding diagram, the mobile device leverages an API to interact with the device registry to determine the list of small things with which the mobile user is allowed to interact. Next, subsequent commands are sent through the API. Those commands that should be processed with a workflow (possibly using the Amazon Simple Workflow Service) are queued and sent to a human who is the authorizing party for that specific command, a particular device, or a fleet of devices. Once the command is approved, the workflow places the command into the command database; the small thing executes the command and acknowledges it as before. The device API then retrieves the acknowledgement and makes it available for the mobile device to display to the user.
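The routing decision inside such an API can be sketched briefly. This is an assumption-laden illustration: `CommandApi`, its status strings, and the rule "approval is decided by action name" are all hypothetical, and an in-process queue stands in for a real workflow service.

```python
from collections import deque
from typing import Deque, List, Set


class CommandApi:
    """Hypothetical API layer that sits between mobile devices and the command database."""

    def __init__(self, needs_approval: Set[str]) -> None:
        self.needs_approval = needs_approval
        self.approval_queue: Deque[dict] = deque()  # stand-in for a workflow
        self.command_db: List[dict] = []            # stand-in for the database

    def submit(self, device_id: str, action: str) -> dict:
        """Route a command either to the approval queue or straight to the database."""
        command = {"device_id": device_id, "action": action, "status": "pending"}
        if action in self.needs_approval:
            command["status"] = "awaiting-approval"
            self.approval_queue.append(command)
        else:
            self.command_db.append(command)
        return command

    def review_next(self, approved: bool) -> dict:
        """Called on behalf of the human authorizer for the oldest queued command."""
        command = self.approval_queue.popleft()
        if approved:
            command["status"] = "pending"
            self.command_db.append(command)  # only now can the device see it
        else:
            command["status"] = "rejected"
        return command


# Usage: a status query goes straight through; a power-on waits for a human.
api = CommandApi(needs_approval={"power-on"})
api.submit("inverter-42", "report-status")
queued = api.submit("inverter-42", "power-on")
visible_before_approval = len(api.command_db)
api.review_next(approved=True)
```

Because the small thing only ever reads the command database, a command that is still queued or has been rejected never reaches it in this sketch.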

Looking Ahead: Managing DevOps for Small Things

In this post, I explained how to add a centralized NoSQL command database to the Speed Layer. I also discussed the practice that every entity that processes a command, whether it is a small thing or mobile device, should acknowledge the command it processed. With those two simple choices, you can add command capabilities to your solution that do not directly force a choice of the communication protocol used by any particular small thing.

By embracing the simple notion that nothing is successful unless it is acknowledged as successful, and making it a core tenet of the solution, we have met the prerequisite for reliability.

Now with the ability to receive telemetry from our fleet of well-behaved small things and the ability to reliably ask our fleet to do something regardless of protocol, we can start to explore some of the more complex scenarios of managing a fleet of small things. In the next post, I will explore how to meet the challenges of device DevOps.

May your success be swift and your solution scalable.