AWS for Industries

Achieving robust closed-loop control in remote locations with Kelvin’s edge-cloud communication

In today’s digital landscape, optimizing the performance of distributed assets in remote locations poses unique challenges. Achieving closed-loop control, where real-time monitoring and adjustments are made based on feedback, becomes particularly difficult when reliable and consistent connectivity is not guaranteed. However, with the advent of distributed edge computing, companies like Kelvin are revolutionizing the way we approach closed-loop control in remote areas. In this blog post, we will delve into Kelvin’s innovative edge-cloud communication mechanism and explore how it enables robust closed-loop control of distributed, networked assets in remote locations.

Kelvin, a leading next-gen industrial automation software company, provides artificial intelligence (AI)-powered asset performance optimization software that focuses on the industries of energy (for example, well construction and completions), upstream oil and gas production, midstream oil and gas operations, process manufacturing (for example, chemicals, food and beverages, and pulp and paper), mining and metals, and renewable energy. Multiple global enterprises that operate thousands of assets (for example, BP, Halliburton, and Santos) have used Kelvin solutions built on Amazon Web Services (AWS) to connect, create, and scale advanced closed-loop supervisory-control applications across their operations without needing to rip and replace any of their existing infrastructure.

Closed-loop control, also known as feedback control, is a fundamental technique employed in system control engineering. This mechanism involves continuously monitoring a system’s output, comparing it to a desired reference value, and using the feedback obtained to make necessary adjustments or corrections to the system. The objective is to bring the system’s output closer to the desired value, promoting better accuracy and performance. In the era of machine learning (ML) and AI, control mechanisms have become increasingly intricate. As a result, the scope of control targets has expanded beyond simple monitoring and dynamic control of single control variables. Modern algorithms now aim to achieve higher-level objectives, such as minimizing energy consumption while maximizing production for a given system.
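As a rough illustration of the feedback principle described above, the following Python sketch runs a proportional-integral (PI) controller against a toy first-order process. It is a minimal sketch only: the `PIController` class, the `simulate` function, the gains, and the plant model are all assumptions chosen for illustration and are not taken from Kelvin's software.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Proportional-integral controller (gains are illustrative, not tuned)."""
    kp: float
    ki: float
    integral: float = 0.0

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement   # compare the output to the reference
        self.integral += error * dt      # accumulate past error
        return self.kp * error + self.ki * self.integral


def simulate(setpoint: float = 50.0, steps: int = 200, dt: float = 0.1) -> float:
    """Run the loop against a toy first-order process; return the final output."""
    controller = PIController(kp=2.0, ki=1.0)
    output = 0.0                         # measured process value
    for _ in range(steps):
        command = controller.update(setpoint, output, dt)
        # Hypothetical plant: the output drifts toward the command over time.
        output += (command - output) * dt
    return output


if __name__ == "__main__":
    print(f"final output: {simulate():.2f}")
```

Each iteration measures the output, compares it to the setpoint, and feeds the correction back into the process, which is the loop the paragraph above describes. An AI-based controller would replace the fixed PI law with a model that pursues a higher-level objective, but the measure, compare, and adjust cycle stays the same.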

While traditional closed-loop control systems rely on programmable logic controllers (PLCs) as their main infrastructure for local data processing, AI-based closed-loop control systems require a new form of infrastructure, known as edge computing, which can access and process a broader set of global data. Edge computing enables higher-level software to run, helping engineers to create and deploy applications that incorporate AI/ML, model-based control, advanced mathematics, system-level optimization, and the inclusion of disparate data sources into automated decision-making.

Reasons for edge computing

To optimize distributed assets, particularly in closed-loop control scenarios, a distributed edge computing system is essential. The following are the primary reasons for implementing a distributed edge:

  • Security: Securely sharing asset data and controls over the internet has been a challenge in most operational technology (OT) networks. OT hasn’t traditionally been networked technology—meaning it hasn’t been connected to a larger network over the internet. Controllable devices have generally used closed proprietary protocols and relied on air gapping for security. However, air gaps can’t provide adequate security for network communication and OT data. Deploying edge computing into these networks enables a secure gateway to collect data, communicate with a centralized system, and interact directly with physical assets.
  • Autonomy: Adding edge computing to OT devices enables real-time data processing closer to the source. Rather than sending data over a network to a centralized location—for example, Kelvin Cloud on AWS—Kelvin enables comprehensive processing on the edge that can perform real-time analysis of broad-scale process data, providing virtually immediate insights to optimize the asset. This approach enhances system resilience and facilitates autonomous asset operation.
  • Latency: In many cases, asset control requires gaining immediate insights from real-time feedback. A round trip to a central cloud environment, over multiple routing hops, and into centralized processing systems introduces undesired delays between gaining insights and taking action. By keeping this control loop at the edge, these delays can be minimized. While data is still ingested into the cloud for systematic monitoring and comprehensive big data analysis, the latency requirements are less stringent compared to the closed loop at the edge.

These advantages can be attained through the use of conventional edge technologies such as PLCs or remote terminal units (RTUs), which do not require connectivity. However, it is only with a connected edge that local processing, centralized insights, and centralized application management can be enabled simultaneously. Therefore, it is essential to establish dependable communication between the edge and a centralized system, accounting for potential connection instabilities.

Communication challenges with edge

The architecture of a distributed edge is fundamentally different from that of a traditional web-based application, where most compute resources are centralized. Much of the IT effort of the past decades has focused on making centralized systems in data centers or clouds fault tolerant. Most mechanisms, like cluster computing, load balancing, or resource replication, work well in a centralized environment but are not always directly applicable to a distributed edge. Organizations mitigate this challenge by selecting reliable edge hardware in the form of industrial computers. Optimization workloads are designed and deployed to these devices on the basis that a lower level of safety-critical or process-critical control is still running on conventional PLCs.

In addition, a distributed edge system may encounter intermittent communication between the edge and the cloud due to the nature of remote control. For instance, in remote oil fields or on equipment vans, it is hard to maintain consistent connectivity. Outages are frequent and can be caused by something as trivial as a truck blocking a satellite dish in service. To provide reliable and robust control, Kelvin’s solutions implement a closed-loop control mechanism that accounts for temporary network outages and intermittent data transmission.

Kelvin’s solution

Below is the architectural design for tackling the communication challenge.

Figure 1. Kelvin AI on AWS architectural diagram. Kelvin AI on AWS allows engineers to build and deploy applications that automatically optimize industrial production systems.

The Kelvin solution is a single-tenant software-as-a-service (SaaS) product deployed into a dedicated virtual private cloud (VPC). The solution consists of two major parts: Kelvin Cloud and Kelvin Nodes. Kelvin Cloud, implemented on AWS, is the central system that not only provides centralized management of the assets in their digital form but also generates essential insights to optimize asset performance through advanced data-centric analytics. Kelvin Manager is the interface for customers to manage the Kelvin platform, including any Kelvin Clusters, either on-premises or in the cloud. The user does not directly interact with AWS services in the Kelvin account. Kelvin Nodes are edge runtimes that can run on a Linux server or gateway and turn it into an edge device. The user can achieve high availability by adding multiple nodes. AI algorithms embedded in containerized applications that optimize assets can be centrally computed in Kelvin Cloud, and the results are subsequently distributed back to, and run locally on, edge devices. Kelvin Nodes are deployed in the customer’s cloud or edge environments. Custom applications running on these Kelvin Nodes can consume and act upon telemetry data streaming from local and remote sources, including data originating from industrial control systems and IT systems.

This architecture enables three levels of edge communication:

  • On-edge communication: Every Kelvin Node comes with a local broker that allows all components on the node to communicate with each other. A service called Kelvin Bridge enables bidirectional communication with assets through various communication protocols such as OPC Unified Architecture (OPC UA). The broker makes data collected by Kelvin Bridge available for all services to consume. For instance, an edge application can feed collected data to an AI model, run inference, and return the results as optimization recommendations or as control changes. When making control changes, the Kelvin Control Manager verifies that the control changes are made in a secure, authenticated, and reliable manner and keeps a comprehensive ledger that records each control change, the responsible entity, and the precise timing. On-edge communication operates independently of internet connectivity, functioning autonomously with minimal latency.
  • Edge-to-cloud communication: A significant advantage of a connected edge is the ability to achieve centralized insights and systematic manageability, which Kelvin has realized with its edge sync system. On each Kelvin Node, a dedicated service collects and buffers all communication from the edge and syncs it opportunistically in batches with Kelvin Cloud. This approach keeps Kelvin Cloud supplied with the most up-to-date data without clogging the uplink with too many individual messages. The data in the cloud is available for other applications and services to consume while also being archived for future use. For example, the data can be used for business reporting, financial accounting, environmental, social, and governance (ESG) reporting, and more. At the edge, the data is buffered, providing resilience in the event of an internet outage. The buffer retains all communication, so edge operations remain unaffected during such periods. Once connectivity is restored, the system catches up on all communications and synchronizes the data, which maintains solution uptime and prevents data loss. The buffer size depends on the available edge storage; however, the default retention policy is 1 month.
  • Edge-to-edge communication: In certain scenarios, it is important for an edge application to have awareness of information from other edge locations. For instance, a system of assets could all feed into the same supersystem or draw resources from the same subsystem. When all assets experience similar conditions, such as a supply shortage in their subsystem, this information can be used to make adjustments, such as a controlled reduction of production, across the assets. Kelvin addresses this requirement by enabling edge-to-edge information sharing, allowing Kelvin Nodes to subscribe to the information of others. This direct edge-to-edge communication facilitates timely updates among the assets, removing the need for an edge node to poll the cloud for that information.
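The buffering and opportunistic batch sync described in the edge-to-cloud bullet can be sketched as a store-and-forward queue. The Python below is a hedged illustration, not Kelvin's implementation: the `EdgeSyncBuffer` class, the `send_batch` callback, the batch size, and the 30-day approximation of the 1-month retention policy are all hypothetical names and values introduced for this example.

```python
import itertools
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Message = Tuple[float, dict]  # (timestamp in seconds, payload)


class EdgeSyncBuffer:
    """Hypothetical store-and-forward buffer; not Kelvin's actual code."""

    def __init__(self, send_batch: Callable[[List[Message]], bool],
                 batch_size: int = 100,
                 retention_s: float = 30 * 24 * 3600):  # assumed ~1 month
        self._send_batch = send_batch  # returns True once the cloud acknowledges
        self._batch_size = batch_size
        self._retention_s = retention_s
        self._queue: Deque[Message] = deque()

    def record(self, payload: dict, now: Optional[float] = None) -> None:
        """Buffer a message locally; never waits on the network."""
        self._queue.append((time.time() if now is None else now, payload))

    def sync(self, now: Optional[float] = None) -> int:
        """Drop expired messages, then upload batches until empty or the link fails."""
        now = time.time() if now is None else now
        while self._queue and now - self._queue[0][0] > self._retention_s:
            self._queue.popleft()
        sent = 0
        while self._queue:
            batch = list(itertools.islice(self._queue, self._batch_size))
            if not self._send_batch(batch):
                break  # link is down: keep the data and retry on the next call
            for _ in batch:
                self._queue.popleft()  # remove only after acknowledgement
            sent += len(batch)
        return sent

    def pending(self) -> int:
        """Number of messages still waiting to be uploaded."""
        return len(self._queue)
```

In this sketch, edge applications call `record` regardless of connectivity, and a background task calls `sync` periodically. Messages leave the queue only after the uplink callback reports success, so an outage delays delivery rather than losing data, up to the retention limit.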

Customer impact

The implemented solution has recently been deployed to optimize a fleet of over 250 unconventional gas wells for a customer. This customer worked with Kelvin to improve field-wide surveillance and asset optimization. Over the course of 6 months, the deployment operated seamlessly with minimal data loss that had negligible impact on operations. As a result, the closed-loop control, powered by AI in the cloud and run on the edge, achieved a remarkable 10-times return on investment (ROI), primarily from production gains. Encouraged by this success, the customer is now in the process of deploying the closed-loop control application on Kelvin to an additional 1,000 wells. Beyond this expansion, the customer is further connecting its entire onshore asset base of more than 4,000 wells to Kelvin for deploying additional AI-powered closed-loop automation applications to achieve autonomous operations.

Enabling autonomous control in remote areas presents significant challenges, such as unreliable connections between the remote units and the central framework, which are susceptible to interruptions caused by factors like weather conditions, human errors, and unforeseen disruptions. Designing and implementing a reliable system that addresses this unpredictability has been a major opportunity for innovation. Kelvin has successfully tackled this issue in its closed-loop supervisory control applications through an architecture that employs data buffering, intelligent data transmission that uses available bandwidth, and the ability to pause and resume data transmission as needed. The implemented solution has been deployed to optimize distributed assets and has proven to be robust, reliable, and highly successful. Kelvin helps customers to build powerful asset optimization and control systems that employ advanced analytics and AI capabilities, deployable from the cloud to the edge.

To learn more about how Kelvin can improve your own industrial operations, please reach out by email or visit the Kelvin website.

Tim Le Souef

Tim Le Souef is a Specialist Solutions Architect for AWS Energy. Tim has a background in advanced control system engineering and has worked in subsea oil and gas, aerospace and automotive technology roles over his 22-year engineering career. He works with AWS’s global energy customers to advance their industrial control system functionality and cybersecurity, leveraging AWS-native and Partner technology.

Tim Crommelin

Tim Crommelin is the Vice President of Asia Pacific for Kelvin, focused on building the company’s regional presence through strategic business development, successful collaborations, and delivering transformative results. Tim has served in various roles in his 25-year career in industrial technology, including management consultant, product manager, entrepreneur, solutions management, and sales leadership, delivering innovative software solutions for the resources industry.