Enabling AI/ML workloads on WITSML data in the cloud with PDS WITSMLstudio
Guest authored by Eric Griffith, Digital Oil Field Division Manager at PDS
Oil and gas operators want their data scientists to run AI/ML workloads on Wellsite Information Transfer Markup Language (WITSML) data to extract more value from it. WITSML is an oil and gas industry data exchange standard in widespread use for delivering real-time and historical drilling and wellsite data, but popular AI/ML tools do not directly support it. Data scientists lose valuable time retrieving, preparing, and feeding the data into their tools. When processing real-time WITSML data, they lose even more time setting up and maintaining the data feeds. Creating a real-time data pipeline that transforms data into AI/ML-friendly formats like JSON and delivers it into AWS shifts the focus back to the science in data science. Powerful AI/ML tools built on these pipelines can save drilling engineers hours of work interpreting well, wellbore, mud log, and trajectory information, thus reducing decision-making time.
There are several steps involved in collecting WITSML data, transforming it to JSON and making it query-able and consumable by AI/ML tools. PDS WITSMLstudio StoreSync and WITSMLstudio StoreAdapter applications combined with AWS services support and help automate the process, minimizing the need for human intervention.
PDS WITSMLstudio StoreSync and StoreAdapter are used in combination with each other to query and collect the WITSML data in real time and transform it to a JSON format, which is stored in an Amazon S3 bucket. These applications support reliable and robust WITSML data retrieval from all the major WITSML data providers and can automate much of the process for several of the data providers. Once the data is stored in Amazon S3, an automated process built on Amazon S3, AWS Lambda, and AWS Glue kicks off to process the JSON files, making them query-able based on their metadata. Once this data is available in the desired format, it can be consumed by other applications through Amazon Athena or AI/ML models through Amazon SageMaker. The entire pipeline can be tuned to customize the data that is delivered and the format it is delivered in.
Figure 1. Reference Architecture
WITSML data is generated at the wellsite and sent to a WITSML server, which is the source of the WITSML data for this solution. This may be a WITSML server running at the wellsite, a server hosted centrally by a service company or data provider, or an internal server hosted by an oil company. The rest of the solution is on AWS, including WITSMLstudio StoreSync and StoreAdapter running on Amazon EC2.
Figure 2. WITSMLstudio StoreSync
PDS WITSMLstudio StoreSync acts as a “man-in-the-middle” data synchronization application. It is a .NET-based Windows Service. It queries data from a source server and writes it to a destination server, both typically WITSML servers. StoreSync is designed to gracefully resume data transmission where it left off after failures such as application crashes, machine reboots, and connection failures. Either all data or only specific data can be selected to be queried and sent. It also supports standardizing and normalizing the data before it is written to the destination, so that the data arrives with the expected organization, naming, and units. StoreSync provides automation features that, depending on the capabilities of the source server, can be used to automatically discover and transfer all data, active data, or some combination of both that also matches certain naming patterns.
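The "resume where it left off" behavior can be illustrated with a generic checkpointing pattern. The sketch below is not StoreSync's implementation (StoreSync is a .NET service and its internals are not described here); it only shows one common way to make a transfer resumable. The function names, the checkpoint file layout, and the idea of an increasing row index are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path: Path):
    """Return the last index that was written successfully, or None."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["last_index"]


def save_checkpoint(path: Path, last_index) -> None:
    """Persist the checkpoint atomically so a crash cannot corrupt it."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"last_index": last_index}))
    os.replace(tmp, path)  # atomic rename on the same filesystem


def sync(source_rows, write_row, checkpoint_path: Path) -> int:
    """Copy rows with an index above the checkpoint; return how many were sent.

    source_rows: iterable of (index, row) pairs in increasing index order.
    write_row:   hypothetical callable that writes one row to the destination.
    """
    last = load_checkpoint(checkpoint_path)
    sent = 0
    for index, row in source_rows:
        if last is not None and index <= last:
            continue  # already delivered before the interruption
        write_row(index, row)
        save_checkpoint(checkpoint_path, index)
        sent += 1
    return sent
```

Because the checkpoint is saved only after a row has been written, an interruption between the two steps can cause that one row to be sent again on restart. A destination that overwrites rows by index would make such a retry harmless.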
PDS WITSMLstudio StoreAdapter acts as a WITSML “gateway” to or from other data stores. It is a .NET-based IIS Web Application. In this scenario, it receives WITSML data from StoreSync, transforms it to JSON format and writes it to Amazon S3. The JSON output format can be customized, and other formats like .CSV or the original XML are possible. Other destinations like Amazon Kinesis are also supported. StoreAdapter can also be used with non-WITSML data sources like OPC, process historians, and SQL databases to expose data as WITSML so that StoreSync and StoreAdapter can be used to deliver the data to Amazon S3.
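To give a feel for the XML-to-JSON step, here is a minimal sketch that converts a small WITSML-style document into JSON with Python's standard library. It is an illustration only: the actual StoreAdapter output format is customizable and is not reproduced here. The `@` prefix for attributes, the `#text` key, and the sample well document are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny WITSML 1.4.1.1-style well document.
SAMPLE = (
    '<wells xmlns="http://www.witsml.org/schemas/1series" version="1.4.1.1">'
    '<well uid="w-001"><name>Example Well</name><timeZone>-06:00</timeZone></well>'
    "</wells>"
)


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(elem):
    """Recursively convert an element to plain dicts, lists, and strings."""
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    children = list(elem)
    text = (elem.text or "").strip()
    if not children:
        if not node:
            return text  # simple leaf element becomes a string
        if text:
            node["#text"] = text
        return node
    for child in children:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key in node:
            # Repeated child elements are collected into a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    return node


def witsml_to_json(xml_text: str) -> str:
    """Return a JSON string for the given XML document."""
    root = ET.fromstring(xml_text)
    return json.dumps({local_name(root.tag): element_to_dict(root)})


if __name__ == "__main__":
    print(witsml_to_json(SAMPLE))
```

All values stay as strings in this sketch. A production transform would typically also convert numeric and date fields and carry units of measure alongside the values.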
Once the transformed data is stored in the Amazon S3 bucket folder, an event notification triggers an AWS Lambda function that converts the JSON to JSON Lines and stores the result in an Amazon S3 bucket to make it query-able through Amazon Athena. If needed, the Lambda function can also segregate the JSON Lines files and store them in structured folders in the Amazon S3 bucket based on the data type, e.g. well, wellbore, or trajectory. AWS Glue is used to generate metadata tables from the JSON data files. Query execution on Amazon Athena uses these metadata tables to retrieve data from the S3 bucket. To enable AI/ML workloads, an Amazon SageMaker Jupyter notebook is set up to run queries on Amazon Athena. PyAthena, a Python DB API-compliant client for Amazon Athena, needs to be installed on the notebook to support queries.
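A Lambda function for the JSON to JSON Lines step might look roughly like the sketch below. It is not the function used in the reference architecture. The `incoming/<type>/<file>.json` input layout, the `jsonl/` output prefix, and the helper names are assumptions made for illustration. The conversion matters because Athena's JSON SerDes expect each record to sit on a single line.

```python
import json
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "jsonl/"  # hypothetical prefix for converted files


def to_json_lines(text: str) -> str:
    """Convert a JSON document (object or array) to newline-delimited JSON."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def output_key(source_key: str) -> str:
    """Map incoming/<type>/<file>.json to jsonl/<type>/<file>.jsonl (assumed layout)."""
    parts = source_key.split("/")
    object_type = parts[-2] if len(parts) >= 2 else "unknown"
    stem = parts[-1].rsplit(".", 1)[0]
    return f"{OUTPUT_PREFIX}{object_type}/{stem}.jsonl"


def handler(event, context):
    """Entry point for an S3 'object created' event notification."""
    import boto3  # imported here so the helpers above can be tested without AWS

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        if key.startswith(OUTPUT_PREFIX):
            continue  # skip our own output to avoid re-triggering on it
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        s3.put_object(
            Bucket=bucket,
            Key=output_key(key),
            Body=to_json_lines(body).encode("utf-8"),
        )
```

Writing the output under a separate prefix (or to a separate bucket) and scoping the event notification to the input prefix keeps the function from invoking itself on the files it produces.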
Figure 3. Sample of Amazon Athena querying data from Amazon S3 bucket using AWS Glue metadata
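From a SageMaker notebook, a query like the one in the figure could be issued through PyAthena along the following lines. This is a sketch under stated assumptions: the table name, the S3 staging location, and the region are placeholders that depend on how the Glue catalog and the Athena query results bucket were set up, and PyAthena must already be installed in the notebook environment (for example with `pip install pyathena`).

```python
def build_query(table: str, limit: int = 10) -> str:
    """Build a simple preview query, validating the table name first."""
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return f'SELECT * FROM "{table}" LIMIT {int(limit)}'


def run_query(sql: str, staging_dir: str, region: str):
    """Run a query on Amazon Athena and return the rows as dictionaries.

    staging_dir: S3 location where Athena writes query results (placeholder).
    region:      AWS region of the Athena and Glue resources (placeholder).
    """
    from pyathena import connect  # requires PyAthena in the notebook environment

    cursor = connect(s3_staging_dir=staging_dir, region_name=region).cursor()
    cursor.execute(sql)
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # "trajectory" is a hypothetical table created from the JSON Lines files.
    rows = run_query(
        build_query("trajectory", limit=5),
        staging_dir="s3://example-athena-results/",  # placeholder bucket
        region="us-east-1",                          # placeholder region
    )
    print(rows)
```

The list of dictionaries can be passed straight to `pandas.DataFrame(rows)` for feature preparation before training a model in SageMaker.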
By streamlining the heavy lifting of delivering WITSML data in an AI/ML-friendly format to AWS, WITSMLstudio StoreSync and StoreAdapter let end data consumers and application builders stop wrestling with data ingestion and start extracting value out of the data. Application delivery times can be shortened by eliminating the need to implement the ingestion step and by providing ready access to data in real time for development and testing. Delivering timely insights from real-time data to drilling engineers can avoid costly and harmful mistakes and reduce the manpower needed to deliver wells, both at the wellsite and in the office. It also improves the quality of delivered wells, leading to sustained financial and HSE gains. For more information about StoreSync and StoreAdapter, please contact Eric Griffith (email@example.com) or visit the WITSMLstudio homepage at pds.group/witsmlstudio.
Eric Griffith is the Digital Oil Field Division Manager at PDS. Eric has a PhD in Computer Science. He has been with PDS since 2008 delivering solutions to the upstream oil and gas industry with a focus on real-time drilling data since 2013 and WITSML since 2016. Eric is heavily involved with defining the next generation of the WITSML and ETP standards from Energistics. He still has a soft spot for gaming and enjoys travel with his family.