AWS Partner Network (APN) Blog

Manage Your Business Complete Data with OpenText InfoArchive and AWS 

By Nikhil Enmudi, ISV Solutions Architect – AWS
By Jessica Ho, Sr. Partner Solutions Architect – AWS
By Chavi Gupta, Partner Solutions Architect – AWS


Since the pandemic, enterprises around the world have fast-tracked data capture and consumption. According to an IDC report, the Enterprise DataSphere will grow more than twice as fast as the Consumer DataSphere over the next five years, putting even more pressure on enterprises to manage and protect the world’s data while also creating opportunities to activate that data for business and societal benefit.

More than ever, enterprises need economical and effective storage solutions to manage this vast amount of data.

Executives cite turning such a large amount of raw data into insights as one of their most challenging problems. A Forrester report found that while 74% of companies say they want to be “data-driven,” only 29% say they are good at connecting data to action. That’s a big gap between storing data and making it useful.

In this post, we will demonstrate how OpenText InfoArchive on Amazon Web Services (AWS) can help you take control of your “business complete data,” which refers to data that is no longer changing and needs to be kept for compliance purposes.

When InfoArchive is deployed on AWS, customers have access to serverless technologies that integrate their existing data-generating systems and enrich data with artificial intelligence (AI) and machine learning (ML)-powered insights along the way.

OpenText is an AWS Partner and a Leader in the 2021 Gartner Magic Quadrant for Content Services Platforms. A global enterprise information management company, OpenText helps organizations manage and gain value from their business content.

Data Archival with OpenText InfoArchive

When it comes to business complete data, customers have two main options: 1) delete the data, or 2) retain the data for future use.

In regulated industries, customers are not allowed to simply delete data, because it must be kept for a variety of reasons. For example, financial, legal, or medical records may need to be kept in a secure, immutable compliance management system with strict access control, auditing, and e-discovery capabilities.

This may also be data that is being migrated from one platform to another, either as part of an upgrade or because a system has reached end of support and is being decommissioned. Such systems may no longer be able to keep up with security and maintenance needs, such as patching and platform upgrades.

OpenText InfoArchive is a modern archiving solution and cloud-based service for compliant archiving of both structured and unstructured information that is highly accessible, scalable, and economical. It’s a centralized platform that offers flexible storage options for unstructured content, including Amazon Simple Storage Service (Amazon S3).

InfoArchive complies with the Open Archival Information System (OAIS) reference model, standardized as ISO 14721, to help future-proof archived data. Archived data is stored as XML documents, which maintains data integrity and avoids vendor lock-in.
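
To make the integrity point concrete, here is a minimal sketch of how a self-describing XML record can carry a checksum of the content it describes, so the content can be verified later with nothing more than a standard XML parser. The element and attribute names (`record`, `metadata`, `field`, `content`) are hypothetical and do not reflect InfoArchive’s actual ingestion format.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_archive_record(record_id: str, fields: dict[str, str], content: bytes) -> str:
    """Serialize one archived record as XML, including a SHA-256 digest of its content.

    Element names are hypothetical and do not follow InfoArchive's own schema.
    """
    root = ET.Element("record", attrib={"id": record_id})
    meta = ET.SubElement(root, "metadata")
    for name, value in fields.items():
        ET.SubElement(meta, "field", attrib={"name": name}).text = value
    digest = hashlib.sha256(content).hexdigest()
    ET.SubElement(root, "content", attrib={"algorithm": "SHA-256", "digest": digest})
    return ET.tostring(root, encoding="unicode")


def verify_content(record_xml: str, content: bytes) -> bool:
    """Recompute the content digest and compare it with the one stored in the record."""
    stored = ET.fromstring(record_xml).find("content").get("digest")
    return stored == hashlib.sha256(content).hexdigest()
```

Because the record is plain XML plus a standard hash algorithm, any future system can re-verify the content without the original archiving software.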

InfoArchive connects flexibly to data and document stores and integrates with OpenText File Intelligence to archive ungoverned file shares and other repositories. InfoArchive can also provide active archiving and long-term preservation for OpenText Content Suite Platform and Extended ECM Platform.

In addition, data is available through InfoArchive’s REST API to support integration with other OpenText products such as Magellan, third-party enterprise systems such as RISE with SAP, and AWS services such as AWS Step Functions.

OpenText InfoArchive gives customers a web-based interface with highly contextual access for searching and reviewing archived data.


Figure 1 – OpenText InfoArchive data search.

As shown in the figure above, the search view includes content metadata, audit data, and governance policy data, such as retention policies and legal holds.

In addition to document search and review with rich data features, OpenText InfoArchive provides customers with an easy-to-use interface for managing governance policies, which are applied directly to the content objects residing in Amazon S3.

Figure 2 shows the retention policy create/edit view, with options to configure the aging strategy for content. You can also configure a policy approver so that a stakeholder validates and approves the policy before it’s applied to content.
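
The interplay between aging, approval, and legal holds can be sketched in a few lines. This is a hypothetical model built only to illustrate the concepts described above: the `RetentionPolicy` class, its fields, and the fixed-period aging rule are assumptions, not InfoArchive’s data model or API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetentionPolicy:
    """Hypothetical retention policy model; not InfoArchive's own data model."""
    name: str
    retention_days: int     # simple aging strategy: fixed period from the base date
    approved: bool = False  # set once the designated approver validates the policy


def disposition_date(policy: RetentionPolicy, base_date: date) -> date:
    """Earliest date content governed by this policy may be disposed of."""
    if not policy.approved:
        raise ValueError(f"policy {policy.name!r} has not been approved")
    return base_date + timedelta(days=policy.retention_days)


def can_dispose(policy: RetentionPolicy, base_date: date, today: date, legal_hold: bool) -> bool:
    """Content is eligible only after it has aged out and while no legal hold applies."""
    return not legal_hold and today >= disposition_date(policy, base_date)
```

Checking the legal hold first means a hold always wins, regardless of how long ago the retention period expired.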


Figure 2 – Create/edit retention policy in OpenText InfoArchive.

Below, Figure 3 shows the compliance dashboard provided by InfoArchive to give customers a holistic view of their data, retention policies, and application retention coverage.


Figure 3 – OpenText InfoArchive compliance dashboard.

OpenText InfoArchive on AWS

InfoArchive Cloud Edition on AWS is offered either as a customer-deployed solution or as a managed solution that OpenText runs on AWS.

OpenText InfoArchive on AWS uses Amazon Elastic Kubernetes Service (Amazon EKS) to host its web application, OpenText Directory Service (for authentication and authorization), and the InfoArchive server. The Amazon EKS cluster is deployed in private subnets across two AWS Availability Zones (AZs) to provide high availability for the application.


Figure 4 – OpenText InfoArchive on AWS architecture using Amazon EKS.

Amazon S3 stores structured and unstructured binary data. Amazon Elastic File System (Amazon EFS) provides persistent storage for the EKS cluster. Amazon Relational Database Service (Amazon RDS) stores metadata about the binary content, which is then used to perform search operations. Amazon RDS provides elasticity, security, scalability, and fault tolerance across multiple AZs.
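
The split between a relational metadata store and an object store can be illustrated with a small sketch. Here an in-memory SQLite database stands in for Amazon RDS, and the `content_metadata` table and its columns are a hypothetical schema, not InfoArchive’s. The point is that a search runs only against metadata and returns S3 keys, so the binaries are fetched from S3 only when a user opens a result.

```python
import sqlite3


def create_metadata_store() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the RDS metadata database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE content_metadata (
               s3_key     TEXT PRIMARY KEY,  -- location of the binary object in S3
               doc_type   TEXT NOT NULL,
               customer   TEXT NOT NULL,
               created_on TEXT NOT NULL      -- ISO-8601 date, sorts lexicographically
           )"""
    )
    return conn


def search(conn: sqlite3.Connection, doc_type: str, customer: str) -> list[str]:
    """Query metadata only and return the S3 keys of matching binaries, oldest first."""
    rows = conn.execute(
        "SELECT s3_key FROM content_metadata "
        "WHERE doc_type = ? AND customer = ? ORDER BY created_on",
        (doc_type, customer),
    )
    return [key for (key,) in rows]
```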

OpenText InfoArchive integrates deeply with Amazon S3, supporting features such as S3 Intelligent-Tiering and S3 Object Lock. These integrations provide persistent storage and data archiving across the various S3 storage classes. They also let you implement your data lifecycle and governance policies for specific business, organizational, and regulatory compliance requirements directly on objects residing in S3.
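
For readers less familiar with these S3 features, the following sketch shows how they surface in plain boto3, independent of InfoArchive. It builds keyword arguments for the S3 `put_object` call that store an object in S3 Intelligent-Tiering and lock it in COMPLIANCE mode, under which no user can overwrite or delete that object version until the retain-until date. The bucket must have been created with Object Lock enabled. The bucket and key names are hypothetical, and this is not a description of how InfoArchive itself writes to S3.

```python
from datetime import datetime, timedelta, timezone


def archive_put_kwargs(bucket: str, key: str, body: bytes,
                       retain_days: int, legal_hold: bool = False) -> dict:
    """Build keyword arguments for boto3's S3 put_object with Object Lock applied."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retain_days)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": "INTELLIGENT_TIERING",  # let S3 move the object between access tiers
        "ObjectLockMode": "COMPLIANCE",         # retention cannot be shortened or removed
        "ObjectLockRetainUntilDate": retain_until,
        "ObjectLockLegalHoldStatus": "ON" if legal_hold else "OFF",
        "ChecksumAlgorithm": "SHA256",          # Object Lock uploads require an integrity checksum
    }


# With credentials and an Object Lock-enabled bucket (not executed here):
#   import boto3
#   boto3.client("s3").put_object(
#       **archive_put_kwargs("my-archive-bucket", "invoices/inv-001.pdf", data, 7 * 365))
```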

Data Enrichment and Enhancement with AWS Services

With OpenText InfoArchive on AWS, your data management plan can grow with your organization: as your dataset grows, as demand to access and consume data increases, and as you bring in metadata from other systems for insights.

As shown in Figure 5 below, while content is in the preparation stage you can use services such as Amazon Textract, Amazon Transcribe, Amazon Rekognition, and Amazon Comprehend to enrich your data with insights before sending it to InfoArchive. For up-to-date information on the AI/ML technologies available, visit the machine learning on AWS page.

Customers can use Amazon Textract, an AI/ML service that extracts text from handwritten documents, forms, and other documents that were previously scanned and stored as images. For audio and video content, customers can extract text using Amazon Transcribe, while Amazon Rekognition can extract text that appears inside videos, as well as insights from the photos and images that make up your overall dataset.
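
In a pre-processing pipeline, the first decision is usually which service each object should go to. The sketch below routes an S3 object key by file extension, following the division of labor described above. The routing table is a hypothetical example; a real pipeline would adapt it to the formats in its own dataset and might inspect content types rather than extensions.

```python
from pathlib import PurePosixPath

# Hypothetical routing table following the division of labor described above:
# scanned documents -> Amazon Textract; audio -> Amazon Transcribe;
# video -> Transcribe (speech) plus Rekognition (on-screen text); photos -> Rekognition.
ROUTES: dict[str, tuple[str, ...]] = {
    ".pdf": ("textract",), ".tif": ("textract",), ".tiff": ("textract",),
    ".mp3": ("transcribe",), ".wav": ("transcribe",), ".flac": ("transcribe",),
    ".mp4": ("transcribe", "rekognition"), ".mov": ("transcribe", "rekognition"),
    ".jpg": ("rekognition",), ".jpeg": ("rekognition",), ".png": ("rekognition",),
}


def route(s3_key: str) -> tuple[str, ...]:
    """Return the enrichment services for an object key (empty tuple if none apply)."""
    return ROUTES.get(PurePosixPath(s3_key).suffix.lower(), ())
```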

Once you’ve extracted text from your content, you can further enrich your dataset with services like Amazon Comprehend, a natural language processing (NLP) service that derives valuable insights and sentiment from your documents. These insights include entities such as people and places, as well as sensitive information, with support for multiple languages.
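
Comprehend’s output then has to be turned into metadata that can travel with the content. The sketch below condenses a response shaped like the one returned by Comprehend’s `DetectEntities` operation (a list of entities, each with `Score`, `Type`, and `Text`) into searchable fields grouped by entity type. The 0.8 confidence threshold is an arbitrary assumption, and the way the result is mapped onto InfoArchive metadata is left open.

```python
from collections import defaultdict


def entities_to_metadata(response: dict, min_score: float = 0.8) -> dict[str, list[str]]:
    """Group entities from a Comprehend DetectEntities response by type.

    Entities below min_score are dropped; duplicates are removed, keeping first-seen order.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for entity in response.get("Entities", []):
        if entity["Score"] < min_score:
            continue
        values = grouped[entity["Type"]]
        if entity["Text"] not in values:
            values.append(entity["Text"])
    return dict(grouped)


# With credentials, the response would come from (not executed here):
#   boto3.client("comprehend").detect_entities(Text=text, LanguageCode="en")
```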


Figure 5 – Pre-processing stages of data before loading in OpenText InfoArchive.

Customers can use serverless technologies such as AWS Step Functions, AWS Glue, and AWS Glue DataBrew to extract, transform, and load (ETL) their business complete data from existing systems into InfoArchive.

AWS Step Functions lets customers build visual workflows that orchestrate interactions and data flow among applications and services inside and outside of AWS. Step Functions helps coordinate the process of moving content from your existing system into InfoArchive while enriching data en route.
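
A workflow like this is defined in the Amazon States Language (ASL). The sketch below generates a minimal three-step definition: extract from the source system, enrich, then load into InfoArchive. The Lambda function ARNs, the account ID, the state names, and the retry settings are all hypothetical placeholders; the load step would call InfoArchive’s REST API from code you write, which is not shown here.

```python
import json

# Placeholder ARNs for hypothetical Lambda functions you would write yourself.
EXTRACT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:extract-from-source"
ENRICH_ARN = "arn:aws:lambda:us-east-1:123456789012:function:enrich-content"
LOAD_ARN = "arn:aws:lambda:us-east-1:123456789012:function:load-into-infoarchive"


def build_definition() -> str:
    """Return an Amazon States Language definition for an extract-enrich-load workflow."""
    definition = {
        "Comment": "Extract content, enrich it, then load it into InfoArchive (sketch)",
        "StartAt": "ExtractFromSource",
        "States": {
            "ExtractFromSource": {"Type": "Task", "Resource": EXTRACT_ARN, "Next": "EnrichContent"},
            "EnrichContent": {
                "Type": "Task",
                "Resource": ENRICH_ARN,
                # Retry failed enrichment tasks (e.g. transient throttling) with backoff.
                "Retry": [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 5,
                           "MaxAttempts": 3, "BackoffRate": 2.0}],
                "Next": "LoadIntoInfoArchive",
            },
            "LoadIntoInfoArchive": {"Type": "Task", "Resource": LOAD_ARN, "End": True},
        },
    }
    return json.dumps(definition, indent=2)


# Deploying it (not executed here; ROLE_ARN is a hypothetical IAM role):
#   boto3.client("stepfunctions").create_state_machine(
#       name="infoarchive-ingest", definition=build_definition(), roleArn=ROLE_ARN)
```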


Figure 6 – AWS Glue DataBrew data profile.

The figure above displays a DataBrew data profile with statistics such as the number of rows in the sample and the distribution of unique values in each column. As data is staged in Amazon S3 for processing, customers can use AWS Glue and DataBrew to analyze and update metadata based on any mapping rules they may have.
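
The core statistics in such a profile are easy to reproduce for intuition. This plain-Python sketch computes the row count and, for each column, the number of distinct values and their distribution. It only illustrates what the statistics mean; it is not DataBrew’s API, and DataBrew profiles report considerably more. Missing values are counted under `None`.

```python
from collections import Counter


def profile(rows: list[dict[str, str]]) -> dict:
    """Minimal column profile: row count plus distinct values and their counts per column."""
    columns = {name for row in rows for name in row}
    result: dict = {"row_count": len(rows), "columns": {}}
    for name in sorted(columns):
        counts = Counter(row.get(name) for row in rows)  # missing values count as None
        result["columns"][name] = {"distinct": len(counts), "distribution": dict(counts)}
    return result
```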

AWS Glue is a serverless data integration service for discovering and cataloging data so it can be prepared for movement into another system such as InfoArchive. DataBrew is a visual data preparation tool that lets you prepare your data with custom rules, or use presets for detecting and handling personally identifiable information (PII).
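
The two kinds of preparation mentioned here, mapping rules and PII handling, can be illustrated together. The sketch below renames source-system fields according to a hypothetical mapping table and masks values that look like US Social Security numbers, keeping the last four digits. It is a plain-Python stand-in to show the idea; DataBrew’s own recipes and PII presets are not reproduced here, and the field names and regular expression are assumptions.

```python
import re

# Hypothetical mapping rules: source-system column names -> archive metadata names.
FIELD_MAP = {"cust_nm": "customer_name", "inv_no": "invoice_number", "ssn": "tax_id"}

# Deliberately narrow example pattern for US Social Security numbers (NNN-NN-NNNN).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def prepare(record: dict[str, str]) -> dict[str, str]:
    """Rename fields per FIELD_MAP and mask SSN-like values except their last four digits."""
    out = {}
    for name, value in record.items():
        masked = SSN.sub(lambda m: "***-**-" + m.group()[-4:], value)
        out[FIELD_MAP.get(name, name)] = masked  # unmapped fields keep their names
    return out
```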

Conclusion

In this post, we walked you through the OpenText InfoArchive offering on AWS, which provides customers with a rich set of features and services to take control of their business complete data. We also discussed the AWS services that support the InfoArchive cloud managed service offering from OpenText, and shared a reference architecture for customers who plan to deploy InfoArchive in their own AWS environment.

Lastly, we showed how customers can integrate various AI/ML and analytics services into a pre-processing flow that extracts meaningful insights from content, enriching and enhancing the data before loading it into InfoArchive.



OpenText – AWS Partner Spotlight

OpenText is an AWS Partner and global enterprise information management company that helps organizations manage and gain value from their business content.

Contact OpenText | Partner Overview | AWS Marketplace