AWS for Industries

Intelligent rig operations classification with HITL on AWS

In the oil and gas industry, rig reports are essential for monitoring the performance and operations of drilling rigs. These reports contain a vast amount of unstructured data, including comments from rig personnel regarding rig activities such as drilling progress, equipment maintenance, process interruptions, and safety observations (Figure 1). In blog #1, we discussed a scalable solution that can extract data from rig reports and convert unstructured PDF reports into structured databases. In this blog, we will discuss how to use the extracted data from these reports for machine learning (ML) purposes—specifically, converting rig operation codes from one operator's coding system to another's using text classification.

The purpose of logging daily rig operations is twofold: first, to keep a record of what operations have been performed, and second, to learn from historical data so that future rig operations can be improved by optimizing rig performance, reducing downtime and carbon emissions, and improving process safety. To analyze daily rig operational data, operations are classified into primary and secondary codes that are specific to each oil company. In joint venture programs, oil companies therefore need to convert each other's rig operation codes on a daily basis. Traditionally, these operation codes are converted manually, which is a tedious, time-consuming, and costly task that can delay the identification and resolution of critical issues affecting rig operations. A self-adaptive ML model that automates code conversion from one operator to another can transform the way oil companies perform this task. In this blog post, we propose an intelligent rig operations classification solution built on Amazon Web Services (AWS).

Figure 1. Sample of drilling operation summary log (commonly known as a time log)

For data extraction from rig reports, the solution uses Amazon Textract—an ML service that automatically extracts text, handwriting, layout elements, and data from scanned documents. (More details on scalable data extraction are available in blog #1.) Next, the solution uses Amazon Comprehend—a natural language processing (NLP) service that uses ML to uncover valuable insights and connections in text—to automate the classification of comments on rig reports, facilitating rapid and accurate analysis of rig operations. (Read more about analyzing reports with NLP here.) Finally, the solution sends the classification results to Amazon Augmented AI (Amazon A2I), which supports human review of ML predictions, to initiate a human-in-the-loop (HITL) workflow for manual review of predicted classes with low confidence scores. Manual corrections are stored and can be used to train new Amazon Comprehend models, improving accuracy in the next training cycle. This way, even with limited labeled data available to train the Amazon Comprehend model, oil companies can automate data extraction, reach classification accuracy close to 100 percent after several training cycles, and use this accurate data to generate valuable insights into rig operations, helping operators optimize drilling processes and improve rig performance.

Solution architecture

Figure 2. Architecture diagram for the drilling operation classification inference pipeline using drilling reports

The high-level architecture (Figure 2) for the data extraction and classification inference pipeline consists of several AWS services. First, the rig report PDFs are added to the input bucket in Amazon Simple Storage Service (Amazon S3), an object storage service offering industry-leading scalability, data availability, security, and performance. In the data extraction step, Amazon Simple Queue Service (Amazon SQS)—which provides fully managed message queuing for microservices, distributed systems, and serverless applications—initiates Amazon Textract through a TextractSubmission Lambda function. Once Amazon Textract finishes its job, it sends a message to Amazon Simple Notification Service (Amazon SNS), a fully managed pub/sub service for A2A and A2P messaging. This message triggers a PDF2Json Lambda function, which extracts the relevant information from the document (such as the time log table from a drilling report) and stores it in an Amazon S3 bucket. Writing the data to Amazon S3 triggers another Amazon SQS queue that manages the Amazon Comprehend classification job as a batch process. In the classification step, the extracted data is passed to a ClassificationJobSubmission Lambda function, which runs an Amazon Comprehend inference job using a pretrained Amazon Comprehend custom classifier. The solution uses AWS Systems Manager Parameter Store—a capability of AWS Systems Manager that provides secure, hierarchical storage for configuration data management and secrets management—to store which model version to use for inference. This step classifies the extracted information into predefined categories.
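
To illustrate this step, the following minimal sketch shows what a ClassificationJobSubmission Lambda function might look like in Python with boto3. The Parameter Store key, S3 URIs, message fields, and IAM role environment variable are hypothetical placeholders; a production function would add error handling, batching, and retries.

import json
import os
import boto3

ssm = boto3.client("ssm")
comprehend = boto3.client("comprehend")

# Hypothetical Parameter Store key that holds the ARN of the current custom classifier
CLASSIFIER_PARAM = os.environ.get("CLASSIFIER_PARAM", "/rig-ops/classifier-arn")

def handler(event, context):
    """Triggered by Amazon SQS when extracted time log text lands in Amazon S3."""
    # Look up which Amazon Comprehend model version to use for inference
    classifier_arn = ssm.get_parameter(Name=CLASSIFIER_PARAM)["Parameter"]["Value"]

    for record in event["Records"]:
        body = json.loads(record["body"])          # message written by the PDF2Json step (assumed shape)
        input_uri = body["extracted_text_s3_uri"]  # for example, s3://bucket/extracted/report-123/
        output_uri = body["classification_output_s3_uri"]

        # Start an asynchronous custom classification job on the extracted comments
        response = comprehend.start_document_classification_job(
            JobName=f"rig-ops-{context.aws_request_id}",
            DocumentClassifierArn=classifier_arn,
            InputDataConfig={"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_LINE"},
            OutputDataConfig={"S3Uri": output_uri},
            DataAccessRoleArn=os.environ["COMPREHEND_DATA_ACCESS_ROLE_ARN"],
        )
        print("Started classification job:", response["JobId"])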

After classification, in the validation step, a RulesEngine Lambda function applies a confidence score threshold to determine whether the classification results require human validation. If the confidence score is below the threshold, the comments are dispatched to Amazon A2I for human review. If the confidence score is at or above the threshold, the data is sent directly to Amazon S3 for further processing.
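
As a minimal sketch of this routing logic, assuming a threshold of 0.80, a hypothetical flow definition ARN supplied through an environment variable, and an assumed prediction record shape produced by the classification step:

import json
import os
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")
s3 = boto3.client("s3")

CONFIDENCE_THRESHOLD = float(os.environ.get("CONFIDENCE_THRESHOLD", "0.80"))
FLOW_DEFINITION_ARN = os.environ["FLOW_DEFINITION_ARN"]  # Amazon A2I human review workflow (placeholder)

def route_prediction(prediction, output_bucket):
    """prediction: assumed dict like {'remark': str, 'predicted_class': str, 'score': float,
    'report_id': str, 'row_id': int} built from the Amazon Comprehend job output."""
    if prediction["score"] < CONFIDENCE_THRESHOLD:
        # Low confidence: start a human loop so a reviewer can correct the classification
        loop_name = f"rig-ops-{prediction['report_id']}-{prediction['row_id']}".lower()
        a2i.start_human_loop(
            HumanLoopName=loop_name,  # must satisfy A2I naming rules (lowercase, hyphens)
            FlowDefinitionArn=FLOW_DEFINITION_ARN,
            HumanLoopInput={"InputContent": json.dumps(prediction)},
        )
    else:
        # High confidence: accept the prediction and store it for downstream processing
        key = f"auto-validated/{prediction['report_id']}/{prediction['row_id']}.json"
        s3.put_object(Bucket=output_bucket, Key=key, Body=json.dumps(prediction))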

Next, a FlattenA2IOutput Lambda function combines the results of the Amazon A2I human validation with the automatically validated results, performing an extract, transform, load (ETL) operation to flatten the output from Amazon A2I. Finally, an ExportA2IOutput Lambda function updates the time log tables with the converted rig operation classification codes. The updated tables can be loaded into Amazon QuickSight—a service that powers data-driven organizations with unified business intelligence (BI) at hyperscale—to build dashboards that provide visualizations and insights for further analysis and decision-making. These dashboards can also be integrated with oil companies' in-house applications.
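
The exact shape of the flattening step depends on the answer fields defined in the worker task template. As an illustrative sketch, assuming hypothetical "category" and "activity" template fields and the standard Amazon A2I output JSON stored in Amazon S3, the FlattenA2IOutput logic might look like this:

import json
import boto3

s3 = boto3.resource("s3")

def flatten_a2i_output(bucket, key):
    """Read one Amazon A2I output object and flatten it into simple review rows."""
    doc = json.loads(s3.Object(bucket, key).get()["Body"].read())

    # inputContent is the prediction payload originally sent to start_human_loop
    original = doc["inputContent"]
    if isinstance(original, str):
        original = json.loads(original)

    rows = []
    for answer in doc.get("humanAnswers", []):
        content = answer["answerContent"]  # keys come from the worker task template
        rows.append(
            {
                "remark": original.get("remark"),
                "predicted_class": original.get("predicted_class"),
                "reviewed_category": content.get("category"),   # hypothetical template field
                "reviewed_activity": content.get("activity"),   # hypothetical template field
                "human_loop_name": doc["humanLoopName"],
            }
        )
    return rows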

Amazon Comprehend custom classification

To use Amazon Comprehend for training the custom model, the first step is to prepare a labeled dataset. This dataset can come from a database where the business already has labeled comments and their corresponding classes. The labeled dataset is then used to train a custom classification model with the custom classifier feature of Amazon Comprehend, which lets users build a model specific to their needs from their own labeled data. Users select the language of the text and the categories for classification—in this case, rig activities such as drilling, tripping, and circulation. The trained model can then be used to classify new comments on rig reports.
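
For illustration, a custom classifier can be trained programmatically with the CreateDocumentClassifier API. This is a minimal sketch; the classifier name, IAM role, and S3 locations are placeholders, and the training file is assumed to be a multi-class CSV with the label in the first column and the comment text in the second.

import boto3

comprehend = boto3.client("comprehend")

# Training CSV: one "label,comment" row per labeled rig report comment (no header)
response = comprehend.create_document_classifier(
    DocumentClassifierName="rig-operations-classifier-v1",  # placeholder name
    LanguageCode="en",
    Mode="MULTI_CLASS",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendDataAccessRole",  # placeholder
    InputDataConfig={"S3Uri": "s3://rig-ops-training/labels/train.csv"},          # placeholder
    OutputDataConfig={"S3Uri": "s3://rig-ops-training/output/"},                  # placeholder
)
print("Training classifier:", response["DocumentClassifierArn"])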

To evaluate the accuracy of the custom model, users can rely on the built-in evaluation metrics that Amazon Comprehend produces, such as precision, recall, and F1 score. Overall, the Amazon Comprehend custom classification feature provides a flexible and scalable way to train custom models for specific use cases, such as classifying comments on rig reports. We discuss Amazon Comprehend model retraining in the following sections.
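
Once training completes, these metrics can also be read back programmatically. The following sketch uses a placeholder classifier ARN:

import boto3

comprehend = boto3.client("comprehend")

props = comprehend.describe_document_classifier(
    DocumentClassifierArn=(
        "arn:aws:comprehend:us-east-1:123456789012:"
        "document-classifier/rig-operations-classifier-v1"  # placeholder ARN
    )
)["DocumentClassifierProperties"]

# ClassifierMetadata is populated once the classifier reaches the TRAINED status
metrics = props["ClassifierMetadata"]["EvaluationMetrics"]
print("Precision:", metrics["Precision"])
print("Recall:   ", metrics["Recall"])
print("F1 score: ", metrics["F1Score"])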

Amazon A2I

After the initial training of the ML model in Amazon Comprehend, it is important to keep retraining the model to improve its accuracy over time. One effective way to achieve this is HITL validation through Amazon A2I. The RulesEngine Lambda function starts an Amazon A2I human loop so that human reviewers can validate classification results in a custom web user interface, which provides review instructions and a link to the source document for quick reference. Based on Amazon Comprehend prediction confidence, classified comments with lower confidence scores are routed for human review. An example is shown in Figure 3b, where each comment (remark) is routed for human review; in this case, time log entries are presented in batches of five comments per review. The first comment was classified incorrectly, which can be fixed by changing the "Category" and "Activity" fields during the human review. These corrections are stored in Amazon S3 and then fed back into the training dataset to improve classification accuracy. As more data is processed and correctly classified, retraining makes the model increasingly accurate, and it requires less human validation over time. This continual feedback and human review help keep data accuracy as close to 100 percent as possible, supporting accurate business decisions, improving the model's predictive power, and adapting to changing data patterns and trends.
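
A lightweight way to close this feedback loop is to append the human-corrected rows to the labeled training file in the Amazon Comprehend multi-class CSV format (label first, comment text second). This is a sketch only; the bucket, key, and the "reviewed_activity" and "remark" field names come from the hypothetical flattening step above.

import csv
import io
import boto3

s3 = boto3.client("s3")

def append_corrections(corrected_rows, bucket="rig-ops-training", key="labels/train.csv"):
    """corrected_rows: list of dicts like {'reviewed_activity': 'Tripping', 'remark': '...'}"""
    # Download the current training CSV ("label,comment" per line, no header)
    existing = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    if existing and not existing.endswith("\n"):
        existing += "\n"

    buffer = io.StringIO()
    buffer.write(existing)
    writer = csv.writer(buffer)
    for row in corrected_rows:
        # Human-reviewed label becomes the ground truth for the next training cycle
        writer.writerow([row["reviewed_activity"], row["remark"]])

    # Upload the augmented dataset for the next retraining cycle
    s3.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8"))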

Figure 3a. Time log table as found in a drilling report

Figure 3b. Amazon A2I HITL user interface to correct classification results for drilling operations

Model retraining

The model retraining pipeline begins with a schedule in Amazon EventBridge, a service that helps users build event-driven applications at scale across AWS, existing systems, or software-as-a-service (SaaS) applications. This schedule initiates the pipeline (Figure 4) by triggering a data preparation Lambda function, which uses a container image stored in Amazon Elastic Container Registry (Amazon ECR), a fully managed container registry offering high-performance hosting. The container image contains the necessary data preparation scripts and is used to create a processing job in Amazon SageMaker, a service for building, training, and deploying ML models with fully managed infrastructure, tools, and workflows. This processing job prepares the data for training. Once the processing job completes, a message is sent to Amazon EventBridge, which in turn triggers a training Lambda function that starts an Amazon Comprehend training job for the new model version.
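
As a minimal sketch of the data preparation step, the scheduled Lambda function could create the SageMaker processing job as follows. The image URI, script path, S3 prefixes, instance type, and IAM role are hypothetical placeholders.

import os
import time
import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    """Triggered by the Amazon EventBridge schedule to prepare the retraining dataset."""
    job_name = f"rig-ops-data-prep-{int(time.time())}"
    sagemaker.create_processing_job(
        ProcessingJobName=job_name,
        RoleArn=os.environ["SAGEMAKER_ROLE_ARN"],
        AppSpecification={
            # Data preparation container image pushed to Amazon ECR (placeholder URI)
            "ImageUri": os.environ["DATA_PREP_IMAGE_URI"],
            "ContainerEntrypoint": ["python3", "/opt/ml/code/prepare_training_data.py"],
        },
        ProcessingResources={
            "ClusterConfig": {"InstanceCount": 1, "InstanceType": "ml.m5.xlarge", "VolumeSizeInGB": 30}
        },
        ProcessingInputs=[
            {
                "InputName": "corrections",
                "S3Input": {
                    "S3Uri": "s3://rig-ops-training/corrections/",  # placeholder
                    "LocalPath": "/opt/ml/processing/input",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            }
        ],
        ProcessingOutputConfig={
            "Outputs": [
                {
                    "OutputName": "train",
                    "S3Output": {
                        "S3Uri": "s3://rig-ops-training/labels/",  # placeholder
                        "LocalPath": "/opt/ml/processing/output",
                        "S3UploadMode": "EndOfJob",
                    },
                }
            ]
        },
    )
    return {"processing_job_name": job_name}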

Figure 4. Amazon Comprehend model training pipeline

Business impact of the solution

The intelligent drilling report classification solution can have a significant impact on the oil and gas industry. By automating the classification of comments on drilling reports, the solution can provide valuable insights into rig operations, including identification of potential issues and performance optimization.

Two key benefits of the solution are improved safety and reduced nonproductive time (NPT). By automatically identifying comments related to safety events and NPT, companies can quickly analyze which operational changes are leading to unfavorable events and take corrective action to avoid costly downtime, damage to equipment, and loss of life. In addition, by identifying issues early, companies can proactively implement measures to prevent future incidents.

Another important benefit of the solution is cost reduction. By automating the classification of drilling reports, companies can save time and resources that would otherwise be spent on manual data processing. This saved time can free up employees to focus on more valuable tasks, such as analyzing the insights gained from the data. In addition, by optimizing rig performance, companies can reduce operational costs, such as fuel and maintenance expenses, which can have a significant impact on the bottom line.

Furthermore, the solution can increase efficiency by providing near real-time insights into rig operations. By quickly identifying potential issues, companies can take corrective action before they become larger problems. This ability can help companies optimize rig performance, reduce downtime, and increase production. The solution can also help companies identify areas where they can improve processes and procedures, leading to even greater efficiencies over time.

Overall, the intelligent rig report classification solution can have a transformative impact on the oil and gas industry. As a result, companies that implement the solution are likely to gain a competitive advantage in the market, leading to long-term success and growth.

Conclusion

In conclusion, implementing the intelligent drilling report classification solution on AWS can have a significant impact on the oil and gas industry. With the help of NLP and ML technologies, the solution can provide valuable insights into rig operations and identify potential issues before they become major problems. By retraining the ML model with the help of HITL validation, the solution can continually improve its accuracy and relevance over time. The benefits of the solution include improved safety, reduced costs, increased efficiency, and better operational performance. Overall, the intelligent drilling report classification solution on AWS can be a game changer for the oil and gas industry and can pave the way for a more automated and efficient future. In blog #3, we will describe how to use generative artificial intelligence (AI) to search for information in thousands of drilling reports.

Ajay Singh

Ajay Singh is a senior energy consultant with AWS Energy & Utilities. He has 15 years of experience working with the upstream and downstream oil and gas industry, with a focus on using business data to build machine learning models and inform business decisions. He has published more than 25 technical papers and holds multiple patents. He enjoys DIY projects and spending time with family.

Ajit Ambike

Ajit Ambike is a Sr. Application Architect at Amazon Web Services. As part of the AWS Energy team, he leads the creation of new business capabilities for customers. Ajit also brings customers and partners best practices that accelerate the productivity of their teams.

Tianxia Jia

Tianxia Jia is a senior data scientist at AWS. He helps global enterprises across the energy, finance, media, retail, and healthcare sectors build AI/ML solutions on the AWS Cloud. He has extensive experience implementing end-to-end AI/ML solutions across a wide range of domains, including natural language processing, computer vision, recommendation systems, forecasting, and generative AI. Prior to AWS, he worked for 10 years in the energy industry, where he led upstream exploration and production projects and data analytics teams.