Tape Ark Unlocks Dark Data on AWS with Tape-to-Cloud Migration
By Guy C. Holmes, CEO and President – Tape Ark
By Dhruv Vashisth, Sr. Partner Solutions Architect – AWS Energy
By Dmitriy Tishechkin, Principal Partner Solutions Architect – AWS Energy
Oil and gas operators face significant challenges accessing the subsurface data they have created and saved on tape media.
Much of this data was recorded to tape more than 25 years ago on different tape types, ranging from 9-track tapes to LTO and 3592 cartridges. These massive collections are stored offsite in tape vaults, disconnected from the benefits of cloud computing.
Subsurface data is complex in nature and slow to access when stored on tapes. Furthermore, the cost of storing data in tape vaults can be very high on a cost-per-GB basis compared to cloud storage. Data locked on legacy tape sitting in vaults slows business innovation and agility, and cannot take advantage of the scalability offered by Amazon Web Services (AWS).
These collections of tape-bound data are often excluded from business workflows and decision-making processes and can be thought of as “dark” data. This means the data exists but is often invisible to the user and cannot benefit from new scientific algorithms and the scale of cloud computing.
With its beginnings rooted in the oil and gas industry, Tape Ark understands the complexities of liberating different types of tapes, including tape formats dating back to the 1960s and 70s.
Understanding the opportunities of working with subsurface assets, and migrating that data safely and securely to AWS, is core business for Tape Ark, an AWS Select Technology Partner.
Customers often have vast collections spanning tens of thousands or even millions of tapes, with data volumes measured in the tens of petabytes.
Performing tape migration at scale is highly complex and resource intensive, but it’s essential for oil and gas companies that want to understand their collection so they can unlock the true value of their data. Tape Ark is powered by AWS and uniquely capable of undertaking migrations at this scale.
Migrating Data from Tapes to Cloud
Prior to cloud being available from AWS, oil and gas companies relied on tape to move data on and off boats, platforms, exploration vessels, and to joint venture partners. As oil companies tend to have long retention requirements, tapes are often moved to long-term archive in offsite storage facilities until the data may be needed again.
The entire oil and gas industry is geared to moving tape-bound data to higher density tapes every 3-5 years just to keep the data accessible. Because tape is not a random access storage medium, the ordering of data on tape is key to being able to locate data in a logical way once it’s needed.
Today, legacy seismic data in the energy sector is not widely used, mainly because it is stored on tape media. By liberating this data and storing it in the cloud, oil and gas companies can extract greater value by running artificial intelligence (AI) and machine learning (ML) workloads, including seismic interpretation and advanced processing.
Tape Ark’s solution has been implemented successfully to migrate data from tape to AWS for oil and gas operators. In one recent processing project on AWS, an oil company used 1 million virtual CPUs (vCPUs) to deliver results in record time, something that could never be done with data on tapes.
Currently, other Tape Ark technology implementations are being pursued involving tens of millions of tapes and over 400 petabytes of exploration data.
Tape Ark and AWS have developed a solution to liberate subsurface data to the cloud, making it available to analytics, machine learning, and collaborative workflows.
The high-level workflow starts with receiving media and performing a detailed tape media audit. This allows oil and gas companies to predict, at a granular level, the cloud footprint their data will create, to identify duplicates, and to exclude from ingest any data that may be part of a joint venture (JV) or belong to a third party.
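To make the audit step concrete, here is a minimal Python sketch of how audit results might be turned into an ingest plan. It is not Tape Ark's implementation: the `TapeRecord` fields, the `owner` labels, and the `plan_ingest` function are all hypothetical names chosen for illustration, and duplicates are assumed to be detectable from a content fingerprint recorded during the audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeRecord:
    """One audited tape. Every field name here is a hypothetical example."""
    barcode: str
    content_hash: str  # fingerprint of the tape contents, used to spot duplicates
    size_gb: float
    owner: str         # assumed labels: "operator", "jv", or "third_party"


def plan_ingest(records, excluded_owners=("jv", "third_party")):
    """Sort audited tapes into ingest, duplicate, and excluded sets.

    Returns the three barcode lists plus the predicted cloud footprint
    (in GB) of the tapes that will actually be ingested.
    """
    seen_hashes = set()
    plan = {"ingest": [], "duplicates": [], "excluded": [], "footprint_gb": 0.0}
    for rec in records:
        if rec.owner in excluded_owners:
            # JV or third-party data is held back from ingest.
            plan["excluded"].append(rec.barcode)
        elif rec.content_hash in seen_hashes:
            # Same contents already planned for ingest from another tape.
            plan["duplicates"].append(rec.barcode)
        else:
            seen_hashes.add(rec.content_hash)
            plan["ingest"].append(rec.barcode)
            plan["footprint_gb"] += rec.size_gb
    return plan


if __name__ == "__main__":
    audit = [
        TapeRecord("T001", "aaa", 100.0, "operator"),
        TapeRecord("T002", "aaa", 100.0, "operator"),  # duplicate of T001
        TapeRecord("T003", "bbb", 250.5, "jv"),
        TapeRecord("T004", "ccc", 50.0, "operator"),
    ]
    print(plan_ingest(audit))
```

Only first-seen, operator-owned tapes count toward the footprint, which is why the prediction can be lower than the raw size of the vault collection.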
After audit, all data is ingested to the nominated cloud account using Tape Ark’s highly scalable technology stack. As data is ingested into a client’s account on AWS, automated checksum and name validation checks are carried out. These can be pre-prepared by the oil company so real-time ingest quality control can be performed in an automated way.
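The checks described above could look something like the following Python sketch. It assumes the oil company supplies a pre-prepared mapping of expected object names to checksums. The use of SHA-256, the `NAME_PATTERN` naming rule, and the function names are assumptions made for this example and are not taken from Tape Ark's platform.

```python
import hashlib
import re

# Hypothetical naming rule: survey code, underscore, four-digit line number, ".segy".
NAME_PATTERN = re.compile(r"^[A-Z0-9]+_\d{4}\.segy$")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def validate_object(name: str, data: bytes, expected: dict) -> list:
    """Return a list of QC failures for one ingested object (empty means pass).

    `expected` maps object names to the checksums prepared before ingest.
    """
    failures = []
    if not NAME_PATTERN.match(name):
        failures.append("name does not match pattern")
    if name not in expected:
        failures.append("name not in expected list")
    elif sha256_hex(data) != expected[name]:
        failures.append("checksum mismatch")
    return failures


if __name__ == "__main__":
    expected = {"SURVEY1_0001.segy": sha256_hex(b"trace data")}
    print(validate_object("SURVEY1_0001.segy", b"trace data", expected))
    print(validate_object("SURVEY1_0001.segy", b"corrupt", expected))
```

Returning a list of failures rather than a single boolean lets an automated pipeline record every reason an object failed quality control in one pass.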
After ingest and automated quality control, data tiering policies automate the movement of data to the nominated tier as requested by the client. As part of the transfer, JSON metadata manifest files are created and placed into customers’ AWS accounts to update their internal databases.
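As an illustration of these two outputs, the Python sketch below builds a tiering rule and a JSON manifest. The post does not say which mechanism Tape Ark uses for tiering, so the rule is written in the shape of an Amazon S3 Lifecycle rule purely as an assumption. The manifest fields, the version number, and all function names are likewise hypothetical.

```python
import json


def lifecycle_rule(prefix: str, days: int, storage_class: str = "GLACIER") -> dict:
    """Build one rule, shaped like an S3 Lifecycle rule, that moves objects
    under `prefix` to `storage_class` after `days` days."""
    return {
        "ID": "tier-" + prefix.strip("/").replace("/", "-"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


def manifest_entry(barcode, bucket, key, size_bytes, sha256, storage_class):
    """Describe one ingested object. The field names are illustrative only."""
    return {
        "barcode": barcode,
        "bucket": bucket,
        "key": key,
        "size_bytes": size_bytes,
        "sha256": sha256,
        "storage_class": storage_class,
    }


def build_manifest(entries: list) -> str:
    """Serialize manifest entries to a JSON document with stable key order."""
    return json.dumps({"version": 1, "objects": entries}, indent=2, sort_keys=True)


if __name__ == "__main__":
    rule = lifecycle_rule("seismic/legacy/", days=30)
    entry = manifest_entry(
        "T001", "example-bucket", "seismic/legacy/T001.segy",
        1024, "0" * 64, "STANDARD",
    )
    print(json.dumps({"Rules": [rule]}, indent=2))
    print(build_manifest([entry]))
```

If S3 Lifecycle were the chosen mechanism, a `{"Rules": [...]}` dictionary like the one printed here is the structure that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. The manifest is plain JSON, so a customer's internal database loader can consume it without any AWS-specific tooling.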
Completed tapes can then be disposed of using Tape Ark-certified disposal services, or returned to the customer.
Figure 1 – Reference architecture.
As shown in the diagram above, the workflow starts from Tape Ark’s internal rapid mass tape ingestion platform, Arkbridge, which was developed to utilize the strength and security of the AWS Cloud.
This internal system has resulted in a highly automated Internet of Things (IoT) approach that eliminates the need for any manual process to ingest media into AWS client accounts, providing high accuracy with efficiency. It uses Amazon API Gateway to execute AWS Lambda functions that store ingested tapes’ metadata in Amazon Relational Database Service (Amazon RDS).
The solution uses a unique combination of photograph processing with Amazon Textract and AWS IoT Core for managing tape reads, which has helped Tape Ark to simplify and streamline the process of tape ingestion at scale.
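To show how text from a tape label photograph might be used, the Python sketch below post-processes a response shaped like the output of Amazon Textract's `DetectDocumentText` operation, which returns a list of `Blocks` with a `BlockType`, `Text`, and `Confidence`. The confidence threshold, the barcode pattern, and the function names are assumptions for illustration; the post does not describe how Tape Ark interprets label text. The sample response is hand-written so the example runs without AWS credentials.

```python
import re

# Hypothetical label format: six characters followed by a media ID such as "L6".
BARCODE_PATTERN = re.compile(r"\b[A-Z0-9]{6}L\d\b")


def label_lines(response: dict, min_confidence: float = 90.0) -> list:
    """Return the text of LINE blocks at or above the confidence threshold."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def find_barcode(lines: list):
    """Return the first barcode-like token found in the lines, or None."""
    for line in lines:
        match = BARCODE_PATTERN.search(line)
        if match:
            return match.group(0)
    return None


if __name__ == "__main__":
    # Hand-written sample in the shape of a DetectDocumentText response.
    sample = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "SURVEY 1994 LINE 12", "Confidence": 98.2},
            {"BlockType": "LINE", "Text": "ABC123L6", "Confidence": 99.1},
            {"BlockType": "LINE", "Text": "smudge", "Confidence": 41.0},
            {"BlockType": "WORD", "Text": "SURVEY", "Confidence": 98.0},
        ]
    }
    lines = label_lines(sample)
    print(lines, find_barcode(lines))
```

With boto3, a response of this shape would come from `boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})`. Filtering on confidence means faded or smudged label text can be routed for human review rather than written into the catalog.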
The subsurface tape data is converted into objects and stored in Amazon Simple Storage Service (Amazon S3), either via direct tape-to-cloud ingest or via AWS Snowball. Object data is stored in Amazon S3 Standard for faster access and in Amazon S3 Glacier for deep archiving.
Finally, the object data in S3 becomes the source for end customers’ seismic and machine learning applications.
This post explains the process and success story of Tape Ark in generating value from subsurface data stored on legacy tapes by ingesting and cataloging data to AWS.
Moving subsurface data to the cloud improves productivity and shortens time to insight, because formerly “dark” data can be located and accessed at the right time. Oil and gas operators can reanalyze this data and apply AI/ML workflows, reconnecting their dark data to modern workflows.
Get in touch to find out how Tape Ark can identify, preserve, and migrate your tape-bound data to the cloud to maximize its value.
Tape Ark – AWS Partner Spotlight
Tape Ark is an AWS Select Technology Partner that is focused on mass migration of tape to cloud in addition to tape restore and eDiscovery services.