AWS for Industries

Discovering mining data with Elsevier Geofacets on OSDU Data Platform

Overview

Knowing what data you have, and being able to find and access it quickly, has become an essential first step for digital transformation in the energy industry. The Open Group OSDU Forum aims to reduce organizational data silos by fostering collaboration on the OSDU Data Platform, which helps break down barriers to innovation and democratize data usage across the energy value chain. One of the key guiding principles of the OSDU Data Platform is extensibility: the ability to add new data types in support of different business workflows.

Elsevier and Amazon Web Services (AWS) have collaborated on extending the OSDU Data Platform to support the mining industry. With these developments, it is now possible to upload, search, and retrieve mining data. The new modules convert Elsevier Geofacets metadata, publish it into the OSDU Data Platform, and retrieve the data for later consumption and analysis through the platform’s core application programming interfaces (APIs).

Elsevier Geofacets

Elsevier Geofacets provides actionable insights for energy, critical minerals, and other natural resources. The metadata associated with maps, tables, and graphs is extracted and validated, which makes the data easy to find and to integrate into business workflows. However, the backend store for this data and metadata has historically been proprietary. The existing APIs have made the data accessible, but going one step further and using an open-source data platform helps unlock mining data and its applications more broadly.

As OSDU gains momentum across the energy value chain, AWS and Elsevier set out to show how new schemas can be added to extend the core OSDU Data Platform to support the most frequently used mining-industry data. New schemas were needed to accurately capture metadata from files, documents, tables, and graphs. The ingestion pipeline also needed to be enhanced, and ingestion scripts had to be developed to parse, extract, and store the mining metadata in the OSDU Data Platform using the OSDU core services APIs.

New OSDU mining schemas

Figure 1 shows the overall conceptual architecture of the OSDU Data Platform with the new OSDU mining schema extensions. A GeoWorkspace schema was created for ingesting drawing files (*.DWG), GeoTIFF (*.TIF, *.TIFF), and ASCII Grid (*.GRD) data types. In addition, four schemas were created for an article and its extracted subcomponents, such as maps, tables, and figures.

All schemas were created as work-product-component types using the OSDU template, https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/SchemaRegistrationResources/shared-schemas/osdu, and follow the JSON Schema standard, https://json-schema.org/. Schemas build on top of each other and inherit their properties from other schemas. The custom properties of the new schemas were added in the IndividualProperties object of the data property. After the name and version are added in the header, the new schema can be uploaded to OSDU with a REST API PUT request to the OSDU schema service.

Python scripts were also developed for the GeoWorkspace and article-ingestion workflows. These scripts extract the metadata from the source data and construct the corresponding records in the OSDU Data Platform.
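To make the schema workflow more concrete, the following is a minimal sketch, not the code Elsevier and AWS wrote. It assumes a hypothetical identity (authority `example`, source `mining`), hypothetical custom fields (`SourceFormat`, `Commodities`), and an illustrative `$ref` to a shared abstract schema. It wraps a JSON Schema in the `schemaInfo` envelope used by the OSDU schema service, prepares (but does not send) the PUT request, and builds a storage record of the kind an ingestion script might emit, with placeholder ACL groups and legal tag.

```python
import json
import urllib.request

# Hypothetical schema identity: illustrative values, not the identifiers
# that were actually registered for the mining extension.
AUTHORITY = "example"
SOURCE = "mining"
ENTITY_TYPE = "work-product-component--GeoWorkspace"


def build_schema_payload(major: int, minor: int, patch: int) -> dict:
    """Wrap a JSON Schema in the schemaInfo envelope sent to the schema service."""
    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "GeoWorkspace",
        "type": "object",
        "properties": {
            "data": {
                "allOf": [
                    # Inherited properties come from a shared OSDU abstract schema;
                    # this $ref string is illustrative only.
                    {"$ref": "#/definitions/osdu:wks:AbstractWPCGroupType:1.0.0"},
                    {
                        # Custom fields go in IndividualProperties (hypothetical names).
                        "type": "object",
                        "title": "IndividualProperties",
                        "properties": {
                            "SourceFormat": {"type": "string", "enum": ["DWG", "TIFF", "GRD"]},
                            "Commodities": {"type": "array", "items": {"type": "string"}},
                        },
                    },
                ]
            }
        },
    }
    identity = {
        "authority": AUTHORITY,
        "source": SOURCE,
        "entityType": ENTITY_TYPE,
        "schemaVersionMajor": major,
        "schemaVersionMinor": minor,
        "schemaVersionPatch": patch,
    }
    return {"schemaInfo": {"schemaIdentity": identity, "status": "DEVELOPMENT"}, "schema": schema}


def kind_of(payload: dict) -> str:
    """Derive the record kind (authority:source:entityType:version) from the identity."""
    i = payload["schemaInfo"]["schemaIdentity"]
    version = f'{i["schemaVersionMajor"]}.{i["schemaVersionMinor"]}.{i["schemaVersionPatch"]}'
    return f'{i["authority"]}:{i["source"]}:{i["entityType"]}:{version}'


def schema_put_request(base_url: str, token: str, partition: str, payload: dict) -> urllib.request.Request:
    """Prepare, but do not send, the PUT request to the schema service."""
    return urllib.request.Request(
        f"{base_url}/api/schema-service/v1/schema",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition,
            "Content-Type": "application/json",
        },
    )


def build_record(kind: str, name: str, source_format: str, commodities: list) -> dict:
    """Construct a storage record; ACL groups and legal tag are placeholders."""
    return {
        "kind": kind,
        "acl": {
            "viewers": ["data.default.viewers@example.com"],
            "owners": ["data.default.owners@example.com"],
        },
        "legal": {"legaltags": ["example-mining-legal-tag"], "otherRelevantDataCountries": ["US"]},
        "data": {"Name": name, "SourceFormat": source_format, "Commodities": commodities},
    }
```

Sending the prepared request with `urllib.request.urlopen(req)` requires a real OSDU endpoint, partition, and access token. The record would then be submitted, inside a JSON array, to the storage service's records endpoint; a production script would also validate the record against the registered schema first.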


Figure 1. The OSDU Data Platform conceptual architecture with new mining-industry data schemas

The functionality described above was set up on the secure and reliable implementation of the OSDU Data Platform from AWS. The new mining extensions take full advantage of Amazon Simple Storage Service (Amazon S3), an object storage service, and Amazon DynamoDB, a fully managed, serverless, key-value NoSQL database. Schema definitions and their metadata are stored in Amazon DynamoDB, which scales up and down automatically with the usage of the OSDU Data Platform. This lets OSDU Data Platform on AWS adapt quickly to new customer requirements and absorb changes in metadata and its volume without compromising performance. Amazon DynamoDB provides fast access to metadata items when the primary key values are specified, and one or more secondary indexes can add alternate keys that support much more flexible query patterns for interrogating the metadata. This becomes especially relevant when extending the existing OSDU schemas at scale to support new workflows, such as those for the Elsevier Geofacets application.
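The difference between a primary-key lookup and a secondary-index query can be sketched as the parameters of two low-level DynamoDB calls. This is an illustration only: the post does not describe the internal table design of OSDU Data Platform on AWS, so the table name, the `SchemaId` and `EntityType` attributes, and the `EntityTypeIndex` global secondary index are all hypothetical.

```python
# Hypothetical table, attribute, and index names: the actual DynamoDB design
# used by OSDU Data Platform on AWS is not described here.
TABLE_NAME = "example-osdu-schema-metadata"
ENTITY_TYPE_INDEX = "EntityTypeIndex"


def get_item_params(schema_id: str) -> dict:
    """Low-level GetItem parameters: a direct lookup by primary key."""
    return {"TableName": TABLE_NAME, "Key": {"SchemaId": {"S": schema_id}}}


def query_by_entity_type_params(entity_type: str) -> dict:
    """Low-level Query parameters against a global secondary index whose
    partition key is EntityType, an alternate access pattern to the primary key."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": ENTITY_TYPE_INDEX,
        # An attribute-name placeholder avoids clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#et = :et",
        "ExpressionAttributeNames": {"#et": "EntityType"},
        "ExpressionAttributeValues": {":et": {"S": entity_type}},
    }
```

With the AWS SDK for Python installed and credentials configured, these dictionaries could be passed as `boto3.client("dynamodb").get_item(**params)` and `.query(**params)`. Keeping the parameters as plain data makes the two access patterns easy to compare and test without a live table.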

The underlying articles, figures, maps, and tables are stored as objects in Amazon S3, which provides the necessary scalability, availability, security, and performance to retrieve the data and deliver it to the application layer. Whether it is an article attachment, a high-resolution image of the area, a map related to the mining activities, or a SEG-Y seismic file, the flexibility of Amazon S3 allows you to store nearly any type and amount of data that you want.
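As a rough sketch of how heterogeneous files could be placed in Amazon S3, the function below builds the keyword arguments for an S3 `PutObject` call. The key layout (one `mining/<record id>/` prefix per OSDU record), the bucket name, and the small extension-to-MIME table are hypothetical choices made for illustration; unknown types such as SEG-Y fall back to a generic binary content type.

```python
import posixpath

# Hypothetical subset of extension-to-MIME mappings; anything else
# (for example a SEG-Y file) is stored as generic binary data.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".csv": "text/csv",
}


def put_object_params(bucket: str, record_id: str, file_name: str, body: bytes) -> dict:
    """Build PutObject keyword arguments, grouping all files that belong to
    one OSDU record under a single key prefix (a hypothetical layout)."""
    extension = posixpath.splitext(file_name)[1].lower()
    return {
        "Bucket": bucket,
        "Key": f"mining/{record_id}/{file_name}",
        "Body": body,
        "ContentType": CONTENT_TYPES.get(extension, "application/octet-stream"),
    }
```

The result could be passed to `boto3.client("s3").put_object(**params)`. Grouping objects by record keeps an article and its extracted maps, tables, and figures together, although the actual OSDU file services manage their own storage locations.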

Proof of concept

Using the new mining schemas and data ingestion mechanisms, Elsevier developed a proof of concept and a strategy for running a geospatial query from the Elsevier Geofacets user interface against the OSDU Search service. Figure 2 shows a mock-up of Elsevier Geofacets connected to the OSDU Data Platform extended with mining schemas. The proof of concept illustrates how a user can search a section of a map by drawing a box and see the results for that location from both the Elsevier Geofacets rich-data repository and the connected OSDU instance.
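The box-drawing search could translate into an OSDU Search request along these lines. This sketch follows the Search service's `spatialFilter` with `byBoundingBox` query format and assumes the mining records carry their geometry in `data.SpatialLocation.Wgs84Coordinates`; the kind, base URL, token, and partition are placeholders, and the Geofacets side of the query (and the merging of both result sets) is not shown.

```python
import json
import urllib.request

# Assumed location of WGS 84 geometry in the mining records.
SPATIAL_FIELD = "data.SpatialLocation.Wgs84Coordinates"


def bounding_box_query(kind: str, north: float, west: float, south: float,
                       east: float, limit: int = 50) -> dict:
    """Turn a box drawn on the map into an OSDU Search query body."""
    if not -90.0 <= south < north <= 90.0:
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        raise ValueError("longitudes must lie in [-180, 180]")
    return {
        "kind": kind,
        "spatialFilter": {
            "field": SPATIAL_FIELD,
            "byBoundingBox": {
                "topLeft": {"latitude": north, "longitude": west},
                "bottomRight": {"latitude": south, "longitude": east},
            },
        },
        "limit": limit,
    }


def search_request(base_url: str, token: str, partition: str, body: dict) -> urllib.request.Request:
    """Prepare, but do not send, the POST request to the Search service."""
    return urllib.request.Request(
        f"{base_url}/api/search/v2/query",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition,
            "Content-Type": "application/json",
        },
    )
```

The map widget only needs to report the box's north, west, south, and east edges; validating them before building the query keeps malformed boxes from reaching the service. Longitudes are not reordered, so a box that crosses the antimeridian can still be expressed.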


Figure 2. Mock-up of Elsevier Geofacets connected to the OSDU Data Platform extended with mining schemas

Conclusion

The Open Group OSDU Forum is focused on building and advancing a standards-based, technology-agnostic data platform to help transform the energy industry and support it in addressing the world’s ever-evolving energy needs through data management and data analytics. The extensibility of the OSDU Data Platform enables companies to work effectively with the platform and advance it. The Elsevier and AWS effort showed how OSDU can be extended into a completely new industry in an accelerated manner. Without OSDU, a similar effort would have taken at least 6 months; with OSDU, the integration took less than 2 months. The next step is to work with The Open Group OSDU Forum to explore contributing the developed mining schemas to the forum. This effort also demonstrates how independent software vendors (ISVs) can work with the OSDU Data Platform and extend it into new areas.

Yuriy Gubanov


Yuriy Gubanov is a Senior Partner Solutions Architect at Amazon Web Services specializing in Energy Data Platforms, including OSDU Data Platform. Yuriy has worked in the energy industry for nearly two decades, architecting, implementing, and delivering innovative IT solutions for the engineering, geoscience, and data management communities. He is an avid cloud computing enthusiast and is always looking for new ways to design and influence the energy systems of the future.

Christine Rhodes


Christine Rhodes is a Senior Partner Development Specialist for Energy Data Platforms, OSDU. Christine has 16 years' experience in the energy industry working in Petrotechnical Data Management and Project/Program Management, and currently spends her days at AWS tackling industry challenges by diving deep and bringing forward innovative solutions from our growing ISV community. She is passionate about change management and how data and relationships can support the energy transition across the industry.

Dmitriy Tishechkin


Dmitriy Tishechkin is a Principal Partner Technical Lead, Energy, at Amazon Web Services. Dmitriy has over 20 years of experience architecting and delivering enterprise solutions to customers, including 15 years in the energy industry. For the past 4 years at AWS, Dmitriy has worked with the partner community to build, migrate, and launch their Exploration and Production workflows on AWS. Dmitriy is interested in renewable energy and technologies that reduce carbon footprint.

John Skero


John Skero is Director of Product Management for Elsevier Geofacets. He is a GIS expert who studied Earth Science and Geology. He worked as a Geoscience Specialist at ExxonMobil for a decade, and also at Woodside Energy and EcoPetrol. In his current role at Elsevier (9 years), he helps Geofacets with data integration, standards, and workflows within the energy industry, and leads the integration of spatial data, metadata enrichment, and analytics into easy-to-use workflows and systems.