Tape Ark

Tape Ark and AWS invent an out-of-the-box archiving solution


Tape Ark is on a mission to help organisations manage their backup data and ageing corporate documents by leveraging data analytics, artificial intelligence (AI) and machine learning (ML). As the world’s leading specialist in tape-to-cloud migration, having processed over five million data tapes to date, Tape Ark is now helping customers to take control of their paper-based data.

For many industries, storing documents is a legal requirement of doing business, yet the document management sector – worth $529 million in Australia alone, according to IBISWorld – is ripe for disruption.

“More often than not, companies tend to lose track of what’s inside each box. This makes it difficult to make decisions about box disposal or document scanning. As time passes and their archive box collection grows, the problem becomes even more unwieldy,” says Guy Holmes, President and Chief Executive Officer at Tape Ark.

One of Tape Ark’s customers, for example, has more than one million boxes in storage, and its collection dates back to the 1930s.

“We realised that storing paper documents in archive boxes is a pain point for many companies. In order to decide which documents to keep, digitise or destroy, they traditionally look to scan all of the documents, or bring the archive boxes back to their office for a subject matter expert to manually review every box. The default option is to do nothing, but this means forking out monthly warehousing fees for the foreseeable future. We were already using AI and ML in other areas of our business when we had the idea of using these tools to invent a smarter solution: one where they can learn what is inside each box for a fraction of the cost,” says Guy.

The ProServe team introduced us to a new way of thinking, a suite of emerging AWS products, and were extremely collaborative… We don’t think there is a solution like the Rapid Box Indexer anywhere in the world.

Guy Holmes
President and Chief Executive Officer, Tape Ark

The key steps in Tape Ark’s machine learning journey: Discovery, Delivery, Scale

Tape Ark’s machine learning journey began with a discovery workshop in July 2020 led by the AWS Professional Services (ProServe) team. A dedicated team of data scientists, engineers and business experts came together to develop an ‘ML Blueprint’, outlining key steps and iterations. By December, Tape Ark was ready to deploy a proof of concept called the Rapid Box Indexer.

The Rapid Box Indexer allows organisations to view the contents of archived boxes from afar via Tape Ark’s customer portal. It is the first service of its kind, according to Guy.

“Until now, companies had to physically retrieve boxes from offsite warehouses in order to audit their contents. Each box is opened by a member of staff, who manually examines, documents or scans items before adding them into a spreadsheet or database. A single box, depending on its contents, can take more than an hour to manually index and document. Multiply that by tens of thousands, if not hundreds of thousands, of boxes, which is typical for large organisations, and you’re looking at endless hours of work,” says Guy.

To automate this process, the Rapid Box Indexer uses machine learning to index and record box contents. Intelligent image and video analysis software is also used to categorise information and add metadata, providing much deeper insights than traditional indexing methods.

Unpacking Tape Ark’s suite of AWS AI and ML innovations

“The ProServe team were amazing,” says Guy. “They introduced us to a new way of thinking, a suite of emerging AWS products, and were extremely collaborative. It is great to work with a like-minded team who think big and really want to address the customer’s problem.”

Together, Tape Ark and ProServe built the Rapid Box Indexer using Amazon Textract, which uses machine learning to extract text, handwriting and data from virtually any document, much as a person would. Amazon Rekognition automates image and video analysis using machine learning, while Amazon Comprehend enables entity detection and sentiment analysis. Amazon S3 is used to store and protect data, while AWS Lambda enables fast, on-demand information processing.
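
As a rough illustration of how these services could fit together, the Python sketch below builds one index record for a single box photo that has already been uploaded to Amazon S3. It is not Tape Ark's implementation: the function name index_box_image, the record layout and the choice of 10 labels are assumptions made for this example. The three service calls (detect_document_text, detect_labels and detect_entities) are the standard boto3 operations for Textract, Rekognition and Comprehend, and the clients are passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def index_box_image(bucket: str, key: str, textract: Any,
                    rekognition: Any, comprehend: Any) -> Dict[str, Any]:
    """Build one index record for a box photo stored in S3 (hypothetical layout)."""
    s3_ref = {"S3Object": {"Bucket": bucket, "Name": key}}

    # Textract: keep only LINE blocks, which hold printed or handwritten text.
    ocr = textract.detect_document_text(Document=s3_ref)
    lines = [block["Text"] for block in ocr.get("Blocks", [])
             if block.get("BlockType") == "LINE"]
    text = "\n".join(lines)

    # Rekognition: visual labels for the same image (10 is an arbitrary cap).
    vision = rekognition.detect_labels(Image=s3_ref, MaxLabels=10)
    labels = [label["Name"] for label in vision.get("Labels", [])]

    # Comprehend: named entities. Skipped when no text was extracted,
    # because the service rejects empty input. English is assumed.
    entities = []
    if text.strip():
        nlp = comprehend.detect_entities(Text=text, LanguageCode="en")
        entities = [{"text": entity["Text"], "type": entity["Type"]}
                    for entity in nlp.get("Entities", [])]

    return {"bucket": bucket, "key": key, "text": text,
            "labels": labels, "entities": entities}
```

In a real deployment the clients would come from boto3.client("textract"), boto3.client("rekognition") and boto3.client("comprehend"), and a function like this could plausibly run inside an AWS Lambda handler triggered when a new image lands in the bucket. Very long extracted text would also need to be split to stay within Comprehend's input size limit.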

Users can now ‘see’ inside each box using three layers of data – videos, images and text – via the Tape Ark portal, along with searchable tags. Armed with this information, it is much easier to decide which documents to keep, digitise, or destroy. They can also audit boxes long after video, images and text are captured – without recalling a single box.

“In our view, it’s like having all of your offsite boxes with you onsite so you can flip through their contents as needed. This allows our customers to home in on whatever they are hoping to find,” says Guy.

Scaling up: taking Tape Ark’s Rapid Box Indexer to the world

After developing a proof of concept in Australia, Tape Ark is now piloting the Rapid Box Indexer from its Houston facilities in the United States.

“We are currently using the Rapid Box Indexer to process 7,500 boxes, which is a reasonable scale for our first pilot. Customer feedback will help us to refine the Indexer and make it even more valuable. In 2021, we will start rolling it out en masse,” says Guy.

“Our aim is to expand the Indexer to meet the needs of any industry and answer some of the hardest document management questions: Can we prioritise documents for digitisation? Will this help to reduce the size of physical box archives? How will this reduce monthly warehousing costs?”

According to AWS comparisons, the Rapid Box Indexer can be up to 20 times cheaper than traditional archiving systems. This estimate is based on the costs of storing boxes in warehouses, which are “significantly higher than the costs of digitising and storing data in the cloud,” explains Guy.

About Tape Ark

Tape Ark is bringing the management of offsite, archive tape data into the 21st century by securely migrating ageing corporate data from tape media directly to the public cloud. By embracing digital and virtual data storage technologies, Tape Ark is re-imagining the way physical data is stored off-site, bringing physical tape storage into the new millennium.


  • Reduces physical warehousing costs by up to 20x.
  • Provides three layers of data (text, image and video), so users can see what’s inside each box, textually and visually, from afar.
  • Simplifies and automates low-value tasks like data entry and indexing to drive business efficiency.

AWS Services Used

Amazon Textract

Amazon Textract is a fully managed machine learning service that easily extracts printed text, handwriting, and data from virtually any document.

Amazon Rekognition

Amazon Rekognition makes it easy to add image and video analysis to applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use.

Amazon Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text.

Amazon S3

Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance.
