Next Generation CPE Command and Control Architectures on AWS

Communication Service Providers (CSPs) manage millions of Customer Premises Equipment (CPE) devices such as broadband routers and Wi-Fi gateways. Traditionally, most of these devices have been managed through standards such as TR-069 (CPE WAN Management Protocol, or CWMP) from the Broadband Forum. Although TR-069 has been effective for basic management and configuration, it has drawbacks in feedback speed and scalability. Given these limitations and the rising demand for real-time device management, many CSPs are migrating to the Broadband Forum’s User Services Platform (USP), also known as TR-369. As a more modern standard, USP offers faster feedback loops, greater scalability to accommodate the growing number of connected devices, improved security mechanisms, and the flexibility to support next-generation services and applications through its multi-controller support. As the CPE ecosystem expands and customer demands evolve, adopting standards like USP becomes essential for CSPs to deliver reliable service and a good customer experience.

CSPs have traditionally incorporated TR-069 agents on their CPEs, and the migration path to the newer TR-369 (USP) isn’t uniform across the board. Although certain providers are keen on making a full transition to the TR-369 stack, others see value in keeping their TR-069 agents on CPEs for specific command and control use cases, typically tasks like firmware upgrades and speed tests. One primary reason for this dual approach is the proven reliability of TR-069 for these particular functions. Another significant consideration is the deep integration of the TR-069 ACS (Auto Configuration Server) with existing Operational Support Systems (OSS) and Business Support Systems (BSS), which can make a migration time-consuming and costly. By retaining specific TR-069 functionalities, CSPs can smooth the transition and keep tried-and-tested methods while adopting the capabilities of TR-369. Although these standards lay the foundation for CPE management, cloud technologies also offer a fresh perspective on it.

In this post, I dive deep into several command and control architectures that use AWS IoT Core over MQTT to manage CPE devices. I explore various configurations, ranging from CPEs where TR-069 and TR-369 agents coexist to setups featuring full-fledged USP CPEs. Additionally, I touch upon advanced architectures that use edge processing through AWS IoT Greengrass, which positions controllers both on the edge device and in the AWS Region and illustrates the flexibility of contemporary device management.

Architectures

In this section, I list some possible CPE command and control architectures and discuss their design principles and how they interact with the mentioned tools and standards.

Architecture 1: Custom Agent on the CPE

Figure 1 – Custom Agent on the CPE

In this architecture, shown in the preceding figure, applications residing in the AWS Cloud use the AWS IoT SDK to communicate with the AWS IoT Device Shadow service. A shadow serves as a virtual representation of a device, capturing both its desired and reported states. Applications modify the desired state in the shadow, while the actual device (or its agent) synchronizes with this state and updates the shadow’s reported state accordingly. This framework not only provides a centralized command and monitoring mechanism for dispersed CPEs, but also removes the need for applications to wait for CPE responses about their current state. Especially with slower CPEs, waiting for these responses can hurt the application’s user experience, and reading from the shadows offers a faster alternative. Although shadows provide only eventual consistency, this is acceptable for scenarios that can tolerate a degree of data staleness, and it results in more responsive interactions for many use cases.

The complexity of CPE data models calls for a segmented representation. Therefore, the device’s details aren’t all captured in a single shadow document. Instead, data is divided into different named shadows, such as IP, Ethernet, and Wi-Fi. These shadows parallel specific sections of the TR-181 data model, such as the “WiFi Data Model” or “Ethernet Data Model.” Each of these represents a specialized functional area, which keeps the approach standardized and aligned with the Broadband Forum’s guidelines.
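As a rough sketch of how an application might address one of these named shadows, the following snippet builds the reserved MQTT topic for a named shadow and a desired-state update document. The thing name, the shadow name, and the parameter names are illustrative assumptions rather than values from a real deployment, and the snippet only builds strings; publishing them is left to whichever SDK the application uses.

```python
import json

# Reserved topic layout for named shadows in AWS IoT Core.
SHADOW_TOPIC = "$aws/things/{thing}/shadow/name/{shadow}/{action}"


def shadow_topic(thing_name: str, shadow_name: str, action: str) -> str:
    """Return the reserved topic for a named-shadow action,
    for example "update", "update/delta", or "get"."""
    return SHADOW_TOPIC.format(thing=thing_name, shadow=shadow_name, action=action)


def desired_update(params: dict) -> str:
    """Wrap parameters in a shadow document as the desired state."""
    return json.dumps({"state": {"desired": params}})


# Hypothetical CPE serial number used as the thing name.
thing = "SN0123456789"
topic = shadow_topic(thing, "WiFi", "update")
payload = desired_update({"SSID": "home-network", "Channel": 36})
```

Keeping one shadow per functional area means an application that only changes Wi-Fi settings never has to read or write the Ethernet or IP documents.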

CPE metadata management

Beyond shadows, each CPE device is abstracted as a “thing” object within AWS IoT Core. This object is linked to the device’s X.509 certificate and its corresponding shadows, and it stores metadata about the CPE in the form of a thing type and attributes. The preceding figure shows an example configuration where the CPE serial number was chosen as the thing name.

On-device architecture and data flow

Within the CPE device, a custom agent works in tandem with the traditional CWMP agent. This agent, by leveraging the AWS IoT Device SDKs and the certificates provided by the AWS IoT Core for its associated “thing”, acts as a bridge between the device and the cloud. Its primary role is to make sure that the device aligns with the desired states specified in its shadows.

Behind the scenes, the agent subscribes to a reserved shadow topic to stay informed of delta events. The AWS IoT Device Shadow service continuously compares the desired state of the shadow against its reported state. Upon detecting a difference, the service calculates a delta containing the desired values that the device has not yet reported, and publishes it to a reserved topic to which device agents are subscribed. The device, upon receiving this delta, applies the changes either through direct low-level communication or through middleware such as RDK-B. After this configuration update, the device publishes its reported state to another reserved topic using MQTT.
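The delta flow described here can be sketched in a few lines. The first function is a simplified illustration of what the delta contains, not the Device Shadow service’s actual implementation. The second shows how a custom agent might turn a delta message into a reported-state document; `apply_setting` is a hypothetical callback standing in for the low-level or RDK-B call that changes the device configuration.

```python
import json


def compute_delta(desired: dict, reported: dict) -> dict:
    """Return the desired values that differ from, or are missing in, reported."""
    delta = {}
    for key, want in desired.items():
        have = reported.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            nested = compute_delta(want, have)
            if nested:
                delta[key] = nested
        elif want != have:
            delta[key] = want
    return delta


def handle_delta(delta_message: str, apply_setting) -> str:
    """Apply each setting in a delta message and build the reported-state update.

    apply_setting(name, value) is a hypothetical callback that returns True
    when the device accepted the change.
    """
    state = json.loads(delta_message)["state"]
    applied = {}
    for name, value in state.items():
        if apply_setting(name, value):
            applied[name] = value
    return json.dumps({"state": {"reported": applied}})
```

Only settings that were actually applied are reported back, so a rejected value stays in the delta and remains visible to the application as an outstanding difference.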

It’s important to note that while the AWS IoT Device SDK assists in managing connections and offers methods for shadow interactions, the custom agent must make sure that the shadow document conforms to the structure mandated by AWS IoT.

Architecture 2: USP Agent on the CPE

Figure 2 – USP Agent on the CPE

Building on the foundation established by the “Custom Agent on the CPE” architecture detailed previously, this architecture introduces a distinct on-device approach while retaining the interaction of applications with AWS IoT Shadows.

This model employs a native USP agent on the CPE device. Unlike the prior setup, this arrangement does not have any direct AWS component on the device, except for the device certificate, if it was generated by AWS IoT Core. Although it may not be evident in the architectural diagram shown in the preceding figure, this USP agent can still coexist with the legacy TR-069 agent. There are two primary ways to develop the USP agent: building it from scratch in line with the USP standard, or using the Broadband Forum’s Open Broadband-User Services Platform-Agent (OB-USP-Agent) as a foundation and building additional features on top of it.

On-device architecture and data flow

Similar to the first architecture, applications within the AWS Cloud interact with the AWS IoT Device Shadow service, which maintains both the desired and reported states of a device.

Upon a change or update in the shadow, AWS IoT calculates the delta, which essentially delineates the discrepancies between the desired and reported states. This computed delta gets dispatched to a reserved delta topic. An AWS IoT rule, continuously monitoring this topic, intercepts the delta information and invokes a designated AWS Lambda function, supplying it with the shadow delta data.

Then, the Lambda function is responsible for extracting the payload, converting it into a USP Set message, and encoding it using USP’s Protocol Buffers (Protobuf) schema. Once the message is encoded, the Lambda function publishes the USP payload to a command topic to which the CPE subscribes.
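A minimal sketch of the conversion step might look like the following. Several things here are assumptions: the event shape (a rule that passes the thing name, shadow name, and delta state), the mapping from shadow names to TR-181 object paths, and the command topic. The field names follow my reading of the USP message schema and should be checked against the published usp-msg proto definition. The Protobuf encoding and the MQTT publish are intentionally left out, because they depend on classes generated from that schema.

```python
import uuid

# Hypothetical mapping from named shadow to TR-181 object path.
SHADOW_TO_OBJ_PATH = {
    "WiFi": "Device.WiFi.Radio.1.",
    "Ethernet": "Device.Ethernet.Interface.1.",
}


def to_usp_value(value) -> str:
    """USP carries parameter values as strings; booleans become true/false."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_usp_set(shadow_name: str, delta_state: dict, msg_id: str = None) -> dict:
    """Build a dict mirroring a USP Set request for one object path."""
    settings = [
        {"param": name, "value": to_usp_value(value), "required": True}
        for name, value in delta_state.items()
    ]
    return {
        "header": {"msg_id": msg_id or str(uuid.uuid4()), "msg_type": "SET"},
        "body": {"request": {"set": {
            "allow_partial": False,
            "update_objs": [{
                "obj_path": SHADOW_TO_OBJ_PATH[shadow_name],
                "param_settings": settings,
            }],
        }}},
    }


def lambda_handler(event, context):
    """Assumed event: {"thing": ..., "shadow": ..., "state": {...}}."""
    message = build_usp_set(event["shadow"], event["state"])
    topic = f"usp/agent/{event['thing']}/command"  # hypothetical command topic
    # Protobuf encoding and publishing to the topic would happen here.
    return {"topic": topic, "message": message}
```

Building the message as a plain dict first keeps the mapping logic testable without the generated Protobuf classes.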

Upon receipt of the command, the CPE carries out the configuration changes using the previously discussed mechanisms and responds with a SetResp message to a designated response topic. This SetResp message is promptly intercepted by another AWS IoT rule that channels it to an associated Lambda function.

This architecture also uses a new feature of the AWS IoT Core rules engine. Instead of leaving the Protobuf decoding to the Lambda function, this feature lets the rules engine decode the USP Protobuf itself. Consequently, the Lambda function only receives a JSON payload containing the SetResp message, which is simpler to process. The Lambda function’s primary role becomes converting this JSON message into the required shadow document format and updating the associated shadow. This reduces the work done in the function and simplifies the overall device management pipeline.
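The conversion from a decoded SetResp to a shadow update could be sketched as follows. The JSON shape is an assumption based on the SetResp structure in the USP message schema; the exact key names and casing depend on how the rules engine renders the decoded Protobuf, so treat the keys below as placeholders to verify against a real decoded message.

```python
def set_resp_to_reported(set_resp: dict) -> dict:
    """Collect successfully updated parameters into a shadow reported state.

    Assumed input shape (to be verified against a decoded message):
    {"updated_obj_results": [{"requested_path": ...,
       "oper_status": {"oper_success": {"updated_inst_results": [
          {"affected_path": ..., "updated_params": {...}}]}}}]}
    """
    reported = {}
    for result in set_resp.get("updated_obj_results", []):
        success = result.get("oper_status", {}).get("oper_success")
        if not success:
            # Failed objects are skipped so the shadow keeps showing a delta.
            continue
        for instance in success.get("updated_inst_results", []):
            reported.update(instance.get("updated_params", {}))
    return {"state": {"reported": reported}}
```

The returned document has the structure a shadow update expects, so the Lambda function could pass it to the shadow update call of its SDK after serializing it to JSON.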

Architecture 3: Real-time RPC (Remote Procedure Call)

For operations demanding real-time responses – such as when operational personnel diagnose connectivity issues, when an urgent security patch requires immediate application, or for other specific application use cases – the eventual consistency offered by shadow mechanisms may fall short. In these situations, the Real-time RPC Architecture steps in, providing a more immediate and direct way of interaction with CPE devices.

In the Real-time RPC Architecture, AWS IoT Core functions as the central broker. The AWS IoT Core rules engine decodes the incoming Protobuf messages using the feature mentioned earlier, converting them to USP’s JSON format before publishing them to the application topics. As a result, the operations team, or any application interacting with the CPE, receives data in a familiar and easily interpretable format.

In the other direction, when an instruction must be sent to the CPE device, the applications only have to formulate the USP Get or Set payload. Then, a designated Lambda function converts it to the Protobuf format expected by the device. The applications listen for the GetResp and SetResp responses, which completes the two-way communication.

This direct architecture can be enhanced with additional functionality, such as “allowed operations” checks. Before relaying the command to the device, a verification can be carried out on the cloud side to confirm that the controlling application has the necessary permissions for that specific action. Additional criteria, like time-of-day restrictions or application user ID verifications, can be enforced without burdening the device, which keeps operations secure.
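Such a check could be as small as the following sketch, which a Lambda function might run before encoding and relaying a command. The application IDs, permitted operations, and time windows are invented for illustration; a real deployment would load them from a policy store.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Policy:
    allowed_ops: frozenset  # USP operations the application may send
    window_start: time      # earliest time of day commands are accepted
    window_end: time        # latest time of day commands are accepted


# Hypothetical per-application policies.
POLICIES = {
    "support-console": Policy(frozenset({"Get"}), time(0, 0), time(23, 59, 59)),
    "firmware-manager": Policy(frozenset({"Get", "Set"}), time(1, 0), time(5, 0)),
}


def is_allowed(app_id: str, operation: str, now: time) -> bool:
    """Return True if the application may send this operation at this time."""
    policy = POLICIES.get(app_id)
    if policy is None:
        return False  # unknown applications are denied by default
    if operation not in policy.allowed_ops:
        return False
    return policy.window_start <= now <= policy.window_end
```

For example, `is_allowed("firmware-manager", "Set", time(2, 30))` passes, while the same request at noon is rejected before it ever reaches the device.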

Architecture 4: USP Controllers at the Edge and AWS Region

Until now, I have focused on regional architectures, with controllers (and associated applications) located within the AWS Region. Although this model addresses most use cases and can be cost-efficient, certain scenarios call for edge processing. The following situations illustrate the need for edge processing:

1. Disconnected Device Scenarios: In the event of internet connectivity loss, operations reliant on a controller’s business logic in the AWS Region come to a standstill. This poses challenges for critical use-cases like home security or emergency medical alerts, where instantaneous communication is paramount.

2. High Data Volume and Low Latency Scenarios: Consider use-cases like Wi-Fi optimization. To optimize a local network, vast amounts of telemetry data are gathered at short intervals, and network parameters must be adjusted quickly to maintain a good user experience. Sending commands over the internet might add too much latency for this kind of command and control, and collecting granular telemetry data in the Region might be financially impractical.

3. Low-Speed Device Scenarios: In scenarios where bandwidth is constrained at the CPE side, edge processing can alleviate bandwidth utilization resulting from non-user traffic.

To address these challenges, AWS IoT Greengrass can be strategically deployed on edge devices.

AWS IoT Greengrass is software that extends cloud capabilities to local devices. This enables devices to collect and analyze data closer to the source of information, react autonomously to local events, and communicate securely with each other on local networks.

Within this architectural design, AWS IoT Greengrass is installed on the CPE device, with its MQTT Broker and MQTT Bridge components. The edge MQTT broker facilitates both device-to-device and device-to-edge application communications. This means edge controllers can communicate with any device linked to the MQTT broker. Importantly, the USP Agent is among the devices connected to this local broker, enabling the creation of streamlined applications at the edge. These applications can then bypass intricate, low-level device communications or middleware engagements, like those with RDK-B, and instead communicate via the USP agent over MQTT.

Leveraging the AWS IoT Greengrass platform, controllers, whether designed as containers or Lambda functions, can be deployed directly at the edge. Moreover, their entire lifecycle management can be orchestrated from the Region.

With the local MQTT communication in place, these edge controllers can collect telemetry data locally from the USP agent through the “Bulk Data Collection” mechanism defined by the USP standard. This removes the need to transfer vast amounts of data off-device, which makes processing both faster and more cost-effective.
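To illustrate why local collection reduces off-device traffic, the sketch below shows an edge controller condensing a batch of telemetry samples into a single summary that could be forwarded to the Region. The sample format and the `rssi` field are assumptions for illustration and do not reflect an actual Bulk Data Collection report layout.

```python
from statistics import mean


def summarize(samples: list) -> dict:
    """Reduce a batch of telemetry samples to one compact summary record."""
    values = [sample["rssi"] for sample in samples]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "rssi_min": min(values),
        "rssi_max": max(values),
        "rssi_avg": round(mean(values), 1),
    }


# Hypothetical batch collected locally from the USP agent.
batch = [{"rssi": -60}, {"rssi": -70}, {"rssi": -65}]
summary = summarize(batch)
```

The controller can act on the raw samples locally and send only the summary upstream, so the volume of data leaving the device no longer grows with the sampling rate.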

Thanks to the broker’s connection with the AWS Cloud via the MQTT Bridge component, regional controller applications retain the ability to communicate with edge controllers (or directly with the USP agent) via MQTT. This duality provides application developers with a wealth of flexibility, enabling them to place their applications based on specific use-cases, as shown in the following figure.

Figure 3 – USP Controllers at the Edge and AWS Region

Although this architecture doesn’t inherently rely on shadow components, it can be extended to incorporate shadow functionality, should specific use cases require it.

Note that the AWS IoT Greengrass platform brings additional system requirements, which can be found in the AWS documentation. You should check the available system resources on the CPE before rolling it out fleet-wide.

Conclusion

Throughout this post, I showed various CPE command and control architectures that can be seamlessly implemented on AWS. Additionally, I explored how TR-069 and TR-369 architectures can coexist in a hybrid setting, accommodating diverse agent deployment options.

AWS enables you to create your own command and control architectures tailored to your CPE fleet’s unique requirements. Additionally, the flexibility offered by AWS IoT Core allows you to design and deploy tailored architectures that might not have been explicitly discussed in this post. Beyond this, AWS streamlines the process of constructing robust telemetry data collection architectures. For a deep dive into how you can transform TR-069 bulk data into actionable insights using AWS, you can read the post: “Turning TR-069 Bulk Data into Insights with AWS IoT Core and Analytics Services on AWS”.

You can also visit AWS for Telecom to learn about how CSPs reinvent communication with AWS.