AWS for Industries

Digitize downstream operations with AWS energy solutions: how CEPSA’s data democratization journey inspired Refinery Monitoring and Surveillance


The oil and gas industry is divided into three main sectors: upstream, midstream, and downstream. The downstream business sector includes refining and petrochemical facilities that process petroleum crude oil and raw natural gas into finished products, plus the marketing and distribution of these products. These industries include a complex network of processes, infrastructure, and equipment (for example, pumps, reactors, towers, tanks, compressors, furnaces, and heat exchangers) to manufacture hydrocarbons into finished products for market (for example, motor gasoline, kerosene, diesel oil, fuel oil, lubricants, asphalt, chemicals feedstock, and natural gas).

Volatile and fluctuating market prices create a margin-sensitive business for the downstream industry, directly impacting company cost environments and modes of operation. As a result, downstream businesses prioritize efforts on reducing operating costs, improving feed flexibility, reducing unplanned equipment downtimes, and increasing overall efficiencies to maintain market competitiveness. Because of the scale and complexity of refining and petrochemical operations, data is largely dispersed across legacy documents and systems (for example, Historian instances, distributed control systems, enterprise resource planning (ERP) systems, field images, field diagrams, and work orders). As a result, real-time access to relevant business data is challenging and inefficient.

Nowadays, cloud technologies applied to the energy industry provide the capability to collect, store, analyze, and learn from large volumes of data at scale and to have a near real-time and forward-looking view on different high-value activities across the industry portfolio. In an oil refining and gas industry survey published by Accenture, 80 percent of refiners reported that digital technologies were adding up to $50 million in value to their businesses, with potential for more in the future. Areas of most impact include maintenance and reliability, production planning and scheduling, and production execution.

Cepsa’s production data lake

Cepsa is a global energy and chemical company operating end-to-end in every stage of the oil and gas value chain and manufacturing products from raw materials of plant origin. It is also active in the renewable energy sector. Cepsa has 90 years of experience and a team of over 10,000 employees, combining technical excellence and adaptability with operations on five continents. The company started its digital transformation journey back in 2018 and soon became one of the first refining companies to tackle many of the industry challenges through cloud-native solutions. They selected AWS as the strategic cloud provider of choice, leveraging the breadth and depth of AWS services to innovate on new analytics and IoT solutions.

Cepsa was looking for a means to automate processes, derive new insights from data, and increase the agility and efficiency of its global operations. They started their digital transformation journey by building a production data lake on AWS to gather and centralize real-time data from sensors across their refineries.

Cepsa’s solution architecture

Cepsa implemented their production data lake using AWS managed services such as AWS IoT Core, AWS Database Migration Service (AWS DMS), and Amazon Kinesis Data Firehose for data ingestion from sensors and on-premises databases; Amazon S3 and Amazon DynamoDB as the data stores; AWS Glue and AWS Lambda to implement data transformations and move data between stages; and Amazon Athena and Amazon API Gateway for data consumption.

By collecting, storing, centralizing, and democratizing access to the 170+ million data points produced daily by 300,000+ sensors, Cepsa built advanced visualization and analytical and machine learning use cases. Some examples are tracking operational trends, improving supply forecasting, enabling equipment predictive maintenance, and quickly identifying inefficiencies to reduce waste and energy use while increasing the output of refined products.
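To give a feel for the scale of these figures, the short calculation below takes the two numbers at face value (both are stated as lower bounds, so the results are only rough averages) and derives the implied per-sensor and per-second rates. The variable names are illustrative.

```python
# Rough averages derived from the figures quoted above (both are lower bounds).
points_per_day = 170_000_000
sensors = 300_000
seconds_per_day = 24 * 60 * 60

points_per_sensor_per_day = points_per_day / sensors
seconds_between_points = seconds_per_day / points_per_sensor_per_day
points_per_second = points_per_day / seconds_per_day

print(f"{points_per_sensor_per_day:.1f} points per sensor per day")
print(f"one point per sensor roughly every {seconds_between_points:.1f} s")
print(f"{points_per_second:.1f} points per second across all sensors")
```

On average this works out to a few hundred readings per sensor per day and roughly two thousand readings per second overall. Individual sensors will sample faster or slower than that average.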

The downstream data lake and associated solutions enabled Cepsa to achieve the following key performance indicators:

  • 2 percent increase in energy consumption savings
  • 2.5 percent increased throughput
  • Predict malfunctions on complex equipment up to 45 days in advance

AWS Refinery Monitoring & Surveillance solution (RMS)

The AWS Refinery Monitoring & Surveillance solution (RMS) is inspired by Cepsa’s downstream data transformation journey. The RMS solution enables refining and petrochemical customers to modernize and augment operational, engineering, and business data acquisition, supervision, and visualization by combining AWS qualified hardware, edge connectivity software, a cloud-native batch and near real-time data store, and a visualization platform for various personas to consume. RMS creates the central foundation necessary for engineering and analytical applications to readily access datasets and drive high-value, business-critical workloads for market competitiveness.

RMS architecture

RMS is a modular solution that lets the customer choose among native and partner components. This design provides implementation choices for data ingestion, operational dashboards, data visualizations, and analysis or machine learning models. The first critical choice is the industrial connectivity component. The option provided for vendor-agnostic environments is a pattern established by AWS that uses a partner edge application and an AWS IoT SiteWise Edge gateway. ISV partner solutions such as AspenTech Cloud Connect and Emerson PlantWeb also enable refinery device connectivity for data ingestion as part of the RMS solution.

Edge connectivity and asset modeling

AWS IoT SiteWise Edge Gateway and partner edge connectivity applications enable customers to get data from their valuable assets into AWS. This asset data is represented in a simple, structured format so customers can more quickly realize the business value that is derived from that data.

The reference architecture includes the capability to convert customers’ existing asset hierarchy definitions stored in partner edge applications, such as Inductive Automation’s Ignition Server, PTC’s KEPServerEX, or Embassy of Things TwinTalk, to the equivalent asset hierarchy within AWS IoT SiteWise. This capability is enabled by the Asset Model Converter (AMC), a component of the edge connectivity reference architecture. The AMC is a serverless, module-based framework. It automatically converts definitions of a customer’s asset hierarchy (refinery, process units, and so on) from the partner edge application to equivalent definitions in AWS IoT SiteWise models and assets.

The AMC ingests asset-hierarchy-definition files or messages generated by the partner edge application and converts them into a schema compatible with AWS IoT SiteWise. It then automatically provisions a matching asset hierarchy in AWS IoT SiteWise. With this automatic mapping, application builders have immediate access to the customer’s asset hierarchy within a managed service (AWS IoT SiteWise) in the AWS Cloud.
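The conversion idea can be pictured with a small sketch. The Python below is not the AMC itself. It assumes a hypothetical export of slash-separated tag paths from an edge application and builds a nested tree of the kind that would then be provisioned as an asset hierarchy. The function names, the input format, and the output shape are all illustrative assumptions.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def build_hierarchy(tag_paths: List[str]) -> Dict[str, dict]:
    """Turn slash-separated tag paths into a nested asset tree.

    Hypothetical input shape: "Refinery/ProcessUnit/Equipment/measurement".
    The last segment is treated as a measurement on its parent asset.
    """
    root: Dict[str, dict] = {}
    for path in tag_paths:
        parts = [p for p in path.split("/") if p]
        if len(parts) < 2:
            continue  # a measurement needs at least one parent asset
        *assets, measurement = parts
        level = root
        node = None
        for name in assets:
            node = level.setdefault(name, {"children": {}, "measurements": []})
            level = node["children"]
        if measurement not in node["measurements"]:
            node["measurements"].append(measurement)
    return root


def flatten(
    tree: Dict[str, dict], parent: Optional[str] = None
) -> Iterator[Tuple[Optional[str], str, List[str]]]:
    """Yield (parent, asset, measurements) rows, parents before children."""
    for name, node in tree.items():
        yield parent, name, list(node["measurements"])
        yield from flatten(node["children"], name)


if __name__ == "__main__":
    paths = [
        "RefineryA/CDU1/PumpP101/discharge_pressure",
        "RefineryA/CDU1/PumpP101/vibration",
        "RefineryA/CDU1/FurnaceF1/outlet_temp",
    ]
    for row in flatten(build_hierarchy(paths)):
        print(row)
```

Emitting parents before children matters in a sketch like this because a child asset can only be associated with a parent that already exists.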

After the hierarchy is defined in AWS IoT SiteWise, the partner edge application continuously ingests the asset data and transmits it to the AWS Cloud through an AWS IoT SiteWise connector within AWS IoT Greengrass. AWS IoT SiteWise serves as the hot-storage tier for both time-series data and metadata. All this data, including the metadata, is accessible to applications that can generate business value.
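As an illustration of what a single reading looks like when it reaches AWS IoT SiteWise, the sketch below builds one entry in the shape accepted by the SiteWise BatchPutAssetPropertyValue API. It only constructs the request body and does not call AWS. The helper name, the property alias, and the sample values are hypothetical, and in RMS the connector handles this step on the customer's behalf.

```python
import json
from typing import Dict


def to_sitewise_entry(
    entry_id: str, alias: str, value: float, epoch_seconds: float
) -> Dict:
    """Build one BatchPutAssetPropertyValue entry for a numeric reading."""
    seconds = int(epoch_seconds)
    # Keep the sub-second part as nanoseconds, clamped to the valid range.
    nanos = min(int(round((epoch_seconds - seconds) * 1e9)), 999_999_999)
    return {
        "entryId": entry_id,
        "propertyAlias": alias,  # hypothetical alias taken from a tag path
        "propertyValues": [
            {
                "value": {"doubleValue": float(value)},
                "timestamp": {"timeInSeconds": seconds, "offsetInNanos": nanos},
                "quality": "GOOD",
            }
        ],
    }


if __name__ == "__main__":
    entry = to_sitewise_entry(
        "1", "/RefineryA/CDU1/PumpP101/vibration", 4.2, 1700000000.5
    )
    print(json.dumps({"entries": [entry]}, indent=2))
```

Addressing the property by alias rather than by asset and property IDs keeps the edge side independent of identifiers that are generated in the cloud.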

The edge connectivity framework includes:

  • Compatibility with leading industrial SCADA systems, PLCs, and Historians via a partner edge connectivity application
  • Connectivity to collect, store, and process operational and technical data
  • Creation of virtual assets (for example, refinery, process unit, process equipment, and sensors)
  • Visualization (near real-time) with AWS IoT SiteWise Monitor, Amazon Managed Grafana, or partner solutions
  • AWS Qualified Hardware
  • Publication of data over various protocols such as HTTPS, MQTT (Message Queuing Telemetry Transport), and OPC Unified Architecture (OPC-UA)

Industrial data lake (IDL) for refineries

In addition to sensor telemetry data from the refinery floor, AWS customers can unlock value from enterprise relational databases and file-based documents. Data from maintenance and lab sample systems is crucial to model the complexity of today’s downstream refineries. This data is stored, structured, and secured within the industrial data lake portion of the RMS solution.

RMS provides Amazon S3 storage capabilities for the bulk of customers’ process and refinery data. Amazon Aurora, along with the AWS Database Migration Service (AWS DMS), provides a simple mechanism for capturing ongoing data changes after an initial bulk load is complete. Additional asset metadata is stored in Amazon DynamoDB, the key-value and document database that delivers single-digit millisecond performance at any scale. AWS Glue crawlers identify changed data in Amazon S3 and structure the data for fast queries in Amazon Athena. All of these data sources are served up through Amazon QuickSight to provide a single-pane-of-glass view of the overall effectiveness and performance of your refinery.
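One common way to keep Athena queries fast is to write objects under a partitioned key layout. AWS Glue crawlers can infer partitions from the folder structure in Amazon S3, and Athena can then skip partitions a query does not need. The sketch below shows such a layout with Hive-style `name=value` folders. The prefix, partition names, and file name are hypothetical, and RMS may lay data out differently.

```python
from datetime import datetime, timezone


def partitioned_key(
    unit: str, ts: datetime, filename: str, prefix: str = "telemetry"
) -> str:
    """Build a Hive-style partitioned S3 object key.

    `ts` must be timezone-aware; it is normalized to UTC so that all
    writers agree on which day a reading belongs to.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    return (
        f"{prefix}/unit={unit}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/{filename}"
    )


if __name__ == "__main__":
    when = datetime(2023, 3, 5, 23, 30, tzinfo=timezone.utc)
    print(partitioned_key("CDU1", when, "part-0001.parquet"))
```

With a layout like this, a query that filters on a single process unit and day reads only the objects under the matching folders.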

The RMS IDL includes:

By using the constructs provided by the IDL solution, contextual data from non-OT systems can be harnessed and joined to sensor data for a comprehensive view of your refinery operations. Maintenance history can be combined with telemetry patterns for predictive maintenance and other equipment health analyses. Anomalies and data quality issues can be quickly identified and remediated. Data trends across systems and units can be uncovered in an expedited and automated manner providing real-time insights for your business.
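Joining maintenance history to telemetry can be sketched in a few lines. The example below, written in plain Python with hypothetical record shapes and field names, averages the sensor values recorded for an asset in the window leading up to each work order. A real implementation would more likely run as an Athena query or an AWS Glue job over the data lake.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List


def mean_before_work_orders(
    readings: List[Dict], work_orders: List[Dict], window: timedelta
) -> List[Dict]:
    """Summarize sensor values in the window before each work order.

    readings:    [{"asset": str, "time": datetime, "value": float}, ...]
    work_orders: [{"id": str, "asset": str, "opened": datetime}, ...]
    """
    results = []
    for wo in work_orders:
        start = wo["opened"] - window
        values = [
            r["value"]
            for r in readings
            if r["asset"] == wo["asset"] and start <= r["time"] < wo["opened"]
        ]
        results.append(
            {
                "work_order": wo["id"],
                "asset": wo["asset"],
                "samples": len(values),
                "mean_value": mean(values) if values else None,
            }
        )
    return results


if __name__ == "__main__":
    t0 = datetime(2023, 3, 5, 12, 0)
    readings = [
        {"asset": "PumpP101", "time": t0 - timedelta(hours=30), "value": 9.0},
        {"asset": "PumpP101", "time": t0 - timedelta(hours=5), "value": 2.0},
        {"asset": "PumpP101", "time": t0 - timedelta(hours=1), "value": 4.0},
    ]
    orders = [{"id": "WO-1", "asset": "PumpP101", "opened": t0}]
    print(mean_before_work_orders(readings, orders, timedelta(hours=24)))
```

Features like this pre-failure average are the kind of input a predictive maintenance model could be trained on once maintenance and sensor data sit side by side.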


Cepsa’s downstream data transformation journey inspired the AWS RMS solution, which enables refining and petrochemical customers to modernize and augment operational, engineering, and business data acquisition, supervision, and visualization. The RMS solution design provides implementation choices for the data ingestion path, operational dashboards, data visualization, and machine learning models. By doing so, it creates the foundation necessary for engineering and analytical applications to readily access datasets and drive high-value, business-critical workloads.

Establishing a cloud data lake foundation in AWS is just the beginning for refinery customers. RMS provides a powerful combination of robust AWS cloud services and a mature partner ecosystem. Implementing this solution lays the framework for additional insight and optimization of downstream facilities. AWS continues to help new and existing customers enable capabilities that drive bottom-line results. Contact us today to find out how we can help optimize your downstream business outcomes.

Scott Bateman

Scott Bateman is an AWS Principal Solutions Architect with over 25 years of technical experience in all segments of the energy industry. As a specialist in geospatial energy concepts, Scott works to define and build cloud-based solutions for energy & utilities customers to accelerate time to value on AWS.

Guillermo Menéndez Corral

Guillermo Menéndez Corral is a Solutions Architect at AWS Energy and Utilities. He has over 14 years of experience designing and building SW applications and currently provides architectural guidance to AWS customers in the energy industry, with a focus on Analytics and Machine Learning.

Krishna Doddapaneni

Krishna is an IoT Specialist Partner Solutions Architect with AWS, essentially helping partners and customers build crazy and innovative IoT products and solutions on AWS. Krishna has a Ph.D. in Wireless Sensor Networks and a Postdoc in Robotic Sensor Networks. He is passionate about ‘connected’ solutions, technologies, security and their services.