AWS Architecture Blog

Ingesting PI Historian data to AWS Cloud using AWS IoT Greengrass and PI Web Services

In process manufacturing, it’s important to fetch real-time data from data historians to support decisions-based analytics. Most manufacturing use cases require real-time data for early identification and mitigation of manufacturing issues. A limited set of commercial off-the-shelf (COTS) tools integrate with OSIsoft’s PI Historian for real-time data. However, each integration requires months of development effort, can lack full data integrity, and often doesn’t address data loss issues. In addition, these tools may not provide native connectivity to the Amazon Web Services (AWS) Cloud. Leveraging legacy COTS applications can limit your agility, both in initial setup and ongoing updates. This can impact time to value (TTV) for critical analytics.

In this blog post, we’ll illustrate how you can integrate your on-premises PI Historian with AWS services for your real-time manufacturing use cases. We will highlight the key connector features and a common deployment architecture for your multiple manufacturing use cases.

Scope of OSIsoft PI data historian use

OSIsoft’s PI System is a plant process historian. It collects machine data from various sensors and operational technology (OT) systems during the manufacturing process. PI Historian is the most widely used data historian in process industries such as Healthcare & Life Sciences (HCLS), Chemicals, and Food & Beverage. Large HCLS companies use the PI system extensively in their manufacturing plants.

The PI System usually contains years of historical data ranging from terabytes to petabytes. The data from the PI system can be used in preventive maintenance, bioreactor yield improvement, golden batch analysis, and other machine learning (ML) use cases. It can be a powerful tool when paired with AWS compute, storage, and AI/ML services.

Analyzing real-time and historical data can garner many business benefits. For example, your batch yield could improve by optimizing inputs or you could reduce downtime by proactive intervention and maintenance. You could improve overall equipment effectiveness (OEE) by improving productivity and reducing waste. This could give you the ability to conduct key analysis and deliver products to your end customers in a timely manner.

PI integration options

The data from the PI System can be ingested to AWS services in a variety of ways:

PI Connector for AWS IoT Greengrass

The PI Connector was developed by the AWS ProServe team as an extended AWS IoT Greengrass connector. The connector collects real-time and historical data from the PI system using PI Web Services. It publishes the data to various AWS services such as local ML models running at the edge, AWS IoT Core / AWS IoT SiteWise, and Amazon S3.

Connector requirements and design considerations

Specific requirements and design considerations were gathered in collaboration with various customers. These are essential for the most effective integration:

  • The connector should support reliable connectivity to the PI system for fetching real-time and historical data from the PI.
  • The connector should support subscription to various PI data modes like real-time, compressed/recorded, and interpolated, to support various use cases.
  • The initial setup and incremental updates to the PI tag configuration should be seamless without requiring any additional development effort.
  • The connector should support data contextualization in terms of asset/equipment hierarchy and process batch runs.
  • The connector should ensure full data integrity, reliable real-time data access, and support re-usability.
  • The connector should have support for handling data loss prevention scenarios for connectivity loss and/or maintenance/configuration updates.
  • The setup, deployment, and incremental updates should be fully automated.

Deployment architecture for PI Connector

The connector has been developed as part of AWS IoT Greengrass Connectivity Framework and can be deployed remotely on an edge machine. This can be running on-premises or in the AWS Cloud with access to the on-premises PI system. This machine can be run on a virtual machine (VM), a physical server, or a smaller device like a Raspberry Pi.

The connector incorporates a configuration file. You can specify connector functions such as authentication type, data access modes (polling or subscription), batch contextualization and validation on the data, or historical data access timeframe. It integrates with the PI Web APIs for subscription to real-time data for defined PI tags using secure WebSockets (wss). It can also invoke WebAPI calls for polling data with configured interval time.

The connector can be deployed as an AWS IoT Greengrass V1 AWS Lambda function or a Greengrass V2 component.

Figure 1. PI Connector architecture

Figure 1. PI Connector architecture

Connector features and benefits

  • The connector supports subscription to real-time and recorded data to track tag value changes in streaming mode. This is useful in situations where process parameter changes must be closely monitored for decision support, actions, and notifications. The connector supports data subscription for individual PI event tags, PI Asset Framework (AF), and PI Event Frames (EF).
  • The connector supports fetching recorded/compressed or interpolated data based on recorded timestamps or defined intervals, to sample all tags associated with an asset at those intervals.
  • The connector helps define asset hierarchy and batch tags as part of configuration, and contextualizes all asset data with hierarchy and batch context at the event level. This offloads heavy data post-processing for real-time use cases.
  • The connector initiates event processing at the edge and provides configurable options to push data to the Cloud. This occurs only when a valid batch is running and/or when a reported tag data quality attribute is good.
  • The connector ensures availability and data integrity by doing graceful reconnects in case of session closures from the PI side. It fetches, contextualizes, and pushes any missed data due to disconnections, maintenance, or update scenarios.
  • The connector accelerates the TTV for business by providing a reusable no-code, configuration-only PI integration capability.


The PI Connector developed by AWS Proserve makes your real-time, data ingestion from PI historian into AWS services fast, secure, scalable, and reliable. The connector can be configured and deployed into your edge network quickly.

With this connector, you can ingest data into many AWS services such as Amazon S3, AWS IoT Core, AWS IoT SiteWise, Amazon Timestream, and more. Try the PI Connector for your manufacturing use cases, and realize the full potential of OSI PI Historian data.

Further reading:

Piyush Batwal

Piyush Batwal

Piyush works as an industry specialist in Healthcare & Life Sciences in AWS Proserve. He has over 20 years of business and technology experience in Smart Factories, Manufacturing Execution Systems (MES), Shop-floor Automation Systems and Industrial IoT / Industrie 4.0. Piyush works with large AWS customers towards adoption of AWS technologies for their global manufacturing footprints. He is a trusted advisor helping them define how AWS technologies can transform their manufacturing IT/OT landscape, and improve manufacturing operations. Piyush holds a B.S. in Mechanical Engineering.

Nitin Kaul

Nitin Kaul

Nitin works as Associate Director of IT Architecture at Merck & Co. Inc., and is focused on strategy, architecture, and execution aspects of various technology innovation initiatives. Nitin's areas of focus includes the Internet of Things - Architecture & Frameworks, Scalable Distributed/Micro-Services Architecture, Scale-out computing frameworks, Systems/Reliability Engineering, DevOps, and Data Science & Big Data Analytics. Nitin holds a B.S. in Electronics & System Engineering and an M.S. in Computer Science.