AWS for Industries

OSDU Data Platform on AWS – Ingestion Series #1: Overview of Data Types for OSDU Data Platform

OSDU™ (Open Source Data Universe) provides an open source, technology-agnostic data platform for the energy industry. It’s driven by technology innovations in industrial data management, with the goal of removing the data silos that many industries have accumulated over the last several decades. The OSDU platform supports most data types found in the energy industry. Moreover, it’s open source and generic. This enables the integration of workflows, accelerates deployment, and supports better, faster decision-making. To learn more about OSDU, visit https://osduforum.org/. In this first post of the series, we’ll discuss commonly used business data types supported by OSDU in the context of oil and gas, and highlight how business data is internally mapped into OSDU.

As an example of a workflow that could use OSDU data standards, the complete life cycle of a reservoir includes reservoir exploration, subsurface logging, field development planning, drilling and completion, as well as production and operation monitoring. Each business segment generates large amounts of data in various formats. The data is stored in various types of datasets, and each dataset has its own format and associated metadata.

A subset of the datasets supported for subsurface projects is as follows:

  • Seismic data
  • Well Logging data
  • Wellbore data
  • Well Planning data

These datasets are defined by well-known schemas (WKS). These schemas form the common model that is the basis for data organization. The OSDU Data Platform uses JSON schemas to define the shape and format of the data ingested and cataloged in the system. When a new type or kind is defined, the system uses the JSON schema to make the definition permanent. Subsequent data is supplied to the system in the form of instances that conform to the schema of that kind. A more in-depth explanation of WKS is available here. Having standards in the form of WKS is important not only for OSDU to function efficiently, but also for interoperability between the applications that consume data from OSDU.
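As a rough illustration of how a kind’s schema constrains the instances supplied later, the sketch below registers a simplified schema for one kind and checks records against it. The kind name, the property names, and the `validate_record` helper are assumptions made for illustration. Real OSDU schemas are full JSON Schema documents, and validation is performed by the platform’s own services.

```python
# Simplified stand-in for a well-known schema: required fields and their
# expected Python types. This is NOT the real OSDU schema format.
HYPOTHETICAL_KIND = "example:wks:master-data--Wellbore:1.0.0"  # illustrative name

SCHEMA_REGISTRY = {
    HYPOTHETICAL_KIND: {
        "required": {"id": str, "kind": str, "data": dict},
        "data_required": {"FacilityName": str},  # assumed property name
    }
}


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    schema = SCHEMA_REGISTRY.get(record.get("kind"))
    if schema is None:
        return [f"unknown kind: {record.get('kind')!r}"]
    problems = []
    # Check the top-level fields every record of this kind must carry.
    for name, expected in schema["required"].items():
        if not isinstance(record.get(name), expected):
            problems.append(f"missing or invalid field: {name}")
    # Check the fields required inside the "data" block.
    data = record.get("data") if isinstance(record.get("data"), dict) else {}
    for name, expected in schema["data_required"].items():
        if not isinstance(data.get(name), expected):
            problems.append(f"missing or invalid data field: {name}")
    return problems


conforming = {
    "id": "wellbore-001",
    "kind": HYPOTHETICAL_KIND,
    "data": {"FacilityName": "Well A-1"},
}
incomplete = {"id": "wellbore-002", "kind": HYPOTHETICAL_KIND, "data": {}}

print(validate_record(conforming))
print(validate_record(incomplete))
```

The first record conforms to the registered schema, while the second is reported as missing a required data field.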

The metadata of a dataset is used to achieve efficient ingestion and egress, as well as search and query capabilities. To understand how various business data types are supported on the data platform, we must review the data types (distinct from business data types) found within OSDU. The metadata can be divided into the following classes:

  • Master Data: Master data is tied to the business requirements for data, and it’s a single source of basic business data used across various downstream applications or processes. For example, SeismicAcquisitionSurvey, SeismicProcessingProject, and Wellbore are master data describing seismic acquisition surveys, seismic processing projects, and drilled or planned boreholes, respectively.
  • Work Product Component (WPC): This is the smallest independently usable unit of business data content transferred into the data platform. Each Work Product Component points to one or more data containers known as Files. For example, the SeismicTraceData WPC points to a SEGY file, which contains metadata about the trace data and digital information about the seismic traces.
  • Work Product (WP): This represents a group of work product components produced by a business activity and brought together into the OSDU platform for ingestion.
  • Reference Data: This is the set of permissible values for attributes to be used by other (master data or WPC/WP) data fields. During metadata ingestion, attribute values are validated against the reference values.
  • Dataset: This provides metadata about digital files and datasets. It doesn’t describe the business content, such as trace data or log data, found within the digital file or dataset. Rather, it stores information such as file size and checksum. The data containers referred to as Files contain the digital business data. Datasets can be defined by a specific file format, such as seismic (SEGY), drilling (WITSML), or well log (LAS), or the file can be of any type, such as a generic file.
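The relationships between these classes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the OSDU data model: the class names mirror the terms in the list, but the fields, the `REFERENCE_VALUES` table, the helper functions, and the choice of SHA-256 as the checksum are all illustrative.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical reference data: the permissible values for one attribute.
REFERENCE_VALUES = {"FileFormat": {"SEGY", "LAS", "WITSML"}}


@dataclass
class Dataset:
    """Metadata about a digital file, not about its business content."""
    name: str
    size_bytes: int
    checksum: str


@dataclass
class WorkProductComponent:
    """Smallest independently usable unit; points to one or more datasets."""
    name: str
    file_format: str
    datasets: list = field(default_factory=list)


@dataclass
class WorkProduct:
    """Group of work product components delivered together."""
    name: str
    components: list = field(default_factory=list)


def describe_file(name: str, content: bytes) -> Dataset:
    """Build dataset metadata (size and checksum) from raw file bytes."""
    return Dataset(name, len(content), hashlib.sha256(content).hexdigest())


def check_reference(attribute: str, value: str) -> bool:
    """True when the value is one of the permitted reference values."""
    return value in REFERENCE_VALUES.get(attribute, set())


# Usage: placeholder bytes stand in for a real seismic file.
payload = b"placeholder bytes, not real SEGY content"
dataset = describe_file("line_001.sgy", payload)
wpc = WorkProductComponent("SeismicTraceData", "SEGY", [dataset])
wp = WorkProduct("survey-delivery", [wpc])

print(check_reference("FileFormat", wpc.file_format))
print(dataset.size_bytes, dataset.checksum[:12])
```

The dataset record only carries facts about the file itself, while the format attribute on the component is the kind of value that would be checked against reference data during metadata ingestion.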

OSDU is evolving and the latest updates on schemas can be found in the data definitions.

Seismic data

OSDU supports seismic datasets collected from acquisition surveys, processed seismic data, and 2D or 3D interpreted seismic data. Seismic survey data is commonly acquired using a grid of source and receiver points that covers the area of interest for hydrocarbon exploration. As shown in the following diagram, OSDU has the master data type SeismicAcquisitionSurvey for acquisition projects. During seismic surveys, multiple data points are acquired at roughly the same location, and during seismic data processing the seismic traces are grouped into a regularly spaced bin grid covering the survey area. OSDU has the master data type SeismicProcessingProject for processing projects. Similarly, OSDU has the Seismic2DInterpretation and Seismic3DInterpretation master data types, which refer to projects that interpret geological features such as faults, stratigraphic units, and horizons.

Furthermore, the master data types SeismicAcquisitionSurvey, SeismicProcessingProject, Seismic2DInterpretation, and Seismic3DInterpretation each refer to the play type, hydrocarbon prospect, field name, basin name, and the name of the organization conducting exploration or production. In this way, OSDU can attach the relevant business context to seismic datasets. With the standard OSDU metadata ingestion schemas, any metadata belonging to a seismic dataset is ingested using master data, WP, WPC, and Files. In general, seismic data covers more than what is displayed in the diagram. For better readability, the diagram is limited to some of the WP, WPC, and File types. An extensive explanation of all of the file types is documented here.

Figure 1: OSDU Seismic Model (not all WP/WPC and File Types are shown)

Seismic data WP/WPC:

  • SeismicTraceData: A single logical dataset containing seismic samples. SEGY, OpenVDS, and OpenZGY are the supported formats, using the File.Collection.<formatType> dataset schema, where formatType is one of the supported formats.
  • SeismicBinGrid: A representation of the surface positions for each subsurface node in a set of processed trace data work product components with common positions. The same bin grid can handle different sampling (spacing) and extents (ranges) in the trace data. The formats supported by OSDU are P6/11 and CSV.
  • SeismicLineGeometry: The 2D processing geometry of a 2D seismic line, which refers to Seismic2DInterpretation as its master data. It stores the relationship between CMP (the midpoint between source and receiver), X, Y, and station label. OSDU supports the P1/11, P1/90, and CSV file formats using the File.Collection.Generic dataset schema.
  • SeismicHorizon: A set of picks, related to the seismic processing geometry, that define a surface. The geometry used is associated with various interpretation WPCs, and it refers to the interpretation project. OSDU supports RESQML and CSV formats.
  • VelocityModeling: This represents the velocity model. OSDU supports RESQML using the File.Generic dataset schema. Other supported formats include SEGY, OpenVDS, and OpenZGY.
  • FaultSystem: A representation of a single fault picked based on seismic data. The record carries information about the seismic geometry context.
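To make the File.Collection.<formatType> pattern more concrete, here is a minimal sketch that records a few of the formats named in the list and fills in the pattern for trace data. The lookup table and both helper functions are hypothetical, the comparison is case-sensitive, and the schema names actually registered on a given OSDU deployment may differ from the strings produced here.

```python
# A few of the formats named in the list above, keyed by component type.
SUPPORTED_FORMATS = {
    "SeismicTraceData": {"SEGY", "OpenVDS", "OpenZGY"},
    "SeismicBinGrid": {"P6/11", "CSV"},
    "SeismicHorizon": {"RESQML", "CSV"},
}


def is_supported(wpc_type: str, file_format: str) -> bool:
    """True when the format is listed for the given component type."""
    return file_format in SUPPORTED_FORMATS.get(wpc_type, set())


def trace_dataset_schema(format_type: str) -> str:
    """Fill in the File.Collection.<formatType> pattern for trace data."""
    if not is_supported("SeismicTraceData", format_type):
        raise ValueError(f"unsupported trace data format: {format_type}")
    return f"File.Collection.{format_type}"


print(trace_dataset_schema("OpenVDS"))
print(is_supported("SeismicBinGrid", "P6/11"))
```

Rejecting an unlisted format early, before any metadata is built, keeps the check close to the point where the dataset schema is chosen.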

Well logging and wellbore data

Subsurface logging data requires petrophysical analysis, performed with various petrophysical software packages. OSDU supports subsurface well data for well logs, well trajectories, and wellbore data. The following diagram explains how the OSDU schema is set up for well data. Within the master data, OSDU has well and wellbore data types, which reference the play type, hydrocarbon prospect, field name, basin name, and the name of the organization conducting exploration or production. Each wellbore master data record can have the following WPCs, and it references several reference data values when data is ingested into OSDU.

  • WellboreTrajectory: This defines the wellbore path from the well surface location to the subsurface target location. OSDU supports CSV and WITSML formats.
  • WellLog: Well log data is defined using this schema, which is intended for digital well logs, not image or scanned well logs. OSDU supports the WITSML, CSV, LAS, DLIS, and LIS file formats.
  • WellboreMarkerSet: This schema is used to define wellbore markers, also known as picks or formation tops. Markers identify the depth in a wellbore at which a noteworthy observation is made, such as a change in the rock type that the wellbore intersects. Formation marker data includes attributes and properties that put these depths in context. Therefore, when defining marker data, the schema references the wellbore and the wellbore trajectory. Various seismic interpretation WPCs, such as HorizonInterpretation, StratigraphicColumn (a sequence of sedimentary rocks), and FaultInterpretation, reference WellboreMarkerSet and WellboreTrajectory using the OSDU schema (details not shown here). OSDU supports WITSML, CSV, and RESQML formats.
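The way a marker set depends on a wellbore and a trajectory can be sketched as follows. This is an illustration only: the field names, the identifiers, and the two helper functions are assumptions and do not reproduce the OSDU WellboreMarkerSet schema.

```python
from dataclasses import dataclass, field


@dataclass
class Marker:
    name: str              # for example, a formation top
    measured_depth: float  # depth along the wellbore; units assumed consistent


@dataclass
class WellboreMarkerSet:
    wellbore_id: str    # reference to the wellbore master data record
    trajectory_id: str  # reference to the wellbore trajectory WPC
    markers: list = field(default_factory=list)


def unresolved_references(marker_set, known_ids):
    """Return the referenced IDs that are not present in known_ids."""
    refs = [marker_set.wellbore_id, marker_set.trajectory_id]
    return [ref for ref in refs if ref not in known_ids]


def markers_by_depth(marker_set):
    """Return the markers ordered from shallowest to deepest."""
    return sorted(marker_set.markers, key=lambda m: m.measured_depth)


# Usage with hypothetical identifiers and depths.
marker_set = WellboreMarkerSet(
    wellbore_id="wellbore-001",
    trajectory_id="trajectory-001",
    markers=[Marker("Top Reservoir", 2450.0), Marker("Top Seal", 2380.0)],
)

print(unresolved_references(marker_set, {"wellbore-001"}))
print([m.name for m in markers_by_depth(marker_set)])
```

Checking that both references resolve before accepting the marker set reflects the idea that marker depths only have meaning in the context of a specific wellbore and its trajectory.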

Figure 2: OSDU Well Model (not all WP/WPC and File Types are shown)

Conclusion

As outlined in this post, the OSDU model defines standards that benefit the organization of large datasets. These standards allow for minimum viable governance of data and make the data globally identifiable, consumable, and discoverable. The schemas can be continuously optimized, and they allow data lineage to be tracked. With metadata extracted from each group of data, searching for data becomes more efficient and duplicate datasets are reduced. A business context for each of the data assets gives more depth to the consumable data. In the next post, we’ll demonstrate how to ingest subsurface data into your AWS implementation of OSDU™ Data Platform. To learn more about how you can transform the core and build the future of your energy business, see AWS Energy.

Srihari Prabaharan

Srihari Prabaharan is a Senior Cloud Application Architect and he works with customers to architect, design, automate, and build solutions on AWS for their business needs. Srihari's passion includes filmmaking and screenwriting and he made his debut independent feature film as writer and director in 2014.

Ajay Singh

Ajay Singh is a Senior Energy Consultant with AWS Energy & Utilities. He has 15 years of experience working in the upstream and downstream oil and gas industry, with a focus on using business data to build machine learning models and inform business decisions. He has published more than 25 technical papers and holds multiple patents. He enjoys doing DIY projects and spending time with family.

Anand Shukla

Anand Shukla is a Principal Cloud Architect with AWS Energy Practice. He is a hands-on architect with over 20 years of IT experience in software development and cloud architecture. He is involved in architecture, design and implementation of Microservices architectures and distributed systems utilizing modern cloud practices embodied with DevOps culture, and has previously worked at Microsoft, Avanade and multiple startup companies.

Colin Sturm

Colin Sturm is a Sr. Energy Consultant with AWS Power Energy and Utilities. He has 20 years of experience in petroleum/chemical engineering and working with oil and gas data. He has delivered innovative energy solutions that further organization objectives.