AWS Storage Blog

Analyze and tier on-premises NAS data to AWS with Komprise

As companies store larger and larger amounts of unstructured data, they may find that their IT is not always capable of keeping up with their pace of data growth. For instance, they may be inefficiently and expensively storing unstructured data by storing both frequently and infrequently accessed data on the same NAS device, at the same cost. Even with storage taken care of, they may be unable to take full advantage of their data, in terms of analytics and other use cases. Komprise, an AWS Advanced Tier Partner, offers a powerful platform that helps you take control of your data. Komprise enables you to analyze existing on-premises file shares and object stores to provide meaningful insight into how data is stored and used. It also enables you to take action to transparently tier or migrate data as needed – all within a single console.

In this blog post, I provide an overview of how Komprise can help you transparently tier data from on-premises Network File System (NFS) and Server Message Block (SMB) shares into Amazon S3. I also provide a walkthrough of how this can be accomplished in the Komprise UI. Using Komprise’s Intelligent Data Management, you can save on costs by tiering colder data to Amazon S3 while leaving the source NAS (network-attached storage) shares intact. This allows customers to reduce their on-premises file share footprint and save on costs by retiring and consolidating NAS hardware or by removing the need to purchase additional capacity on existing NAS hardware. I touch on more of the benefits of the Komprise platform and solution throughout the blog post.

Komprise Intelligent Data Management

IT departments are becoming increasingly hesitant to continue expanding their on-premises NAS storage footprints to keep up with the rapid data growth that they have been seeing. Expanding these on-premises NAS systems requires capital, additional rack space, power, cooling, and an addendum to existing support or maintenance contracts.

Enterprises still have a business need to expand their storage footprints to manage their rapid data growth, but with a lower TCO. Many are looking at ways that they can take advantage of the cost benefits of AWS Storage, such as lower upfront costs and only paying for storage that they use. However, one of the key challenges with this approach is that it can be difficult to refactor existing applications to make use of object storage. Additionally, customers that make use of NFS/SMB shares for home directories and user/team shares often find it burdensome to migrate users to a different NAS device or share. The Komprise platform offers a suite of tools called Intelligent Data Management that enables you to define rules based on business logic to determine what data is considered hot or cold. For instance, you can transparently tier colder data to Amazon S3 from any NFS or SMB file share based on when users last accessed a file. You can do all this while users and applications continue to access files in their original location.

A key benefit of Komprise is that it works across multiple NAS solutions. Not tied to a single vendor solution, Komprise operates at the file level regardless of the source, as long as it is NFS- or SMB-compliant. This gives customers with a variety of NAS vendors, including NAS shares residing on a single server (for instance, Windows file shares), the ability to analyze their data and perform data management operations. You can also perform these operations against multiple shares from a single console. Let’s take a deeper look into how this works.

Komprise Intelligent Data Management concepts

The following concepts are key components of the Komprise Intelligent Data Management solution. Once a customer has deployed their Observer and Windows proxy virtual appliances, and deployed and set up their Director UI, they can begin discovering, analyzing, and managing their NAS shares with Komprise.

Observers

Customers deploy “Komprise Observers” in their on-premises environments, close to their data storage. Observers are virtual appliances deployed in Open Virtual Appliance (OVA) format on VMware or KVM hypervisors. These appliances are responsible for analyzing your on-premises NFS shares, tiering data to Amazon S3, and providing transparent access to files that have been tiered to Amazon S3. Depending on the number of NFS shares and files, you may need to deploy multiple appliances, which can scale horizontally to provide increased parallelism and performance when needed.

Director

The “Komprise Director” is the administrative console of the Komprise solution. You can choose to deploy a Director on premises as a virtual machine, or in AWS as an Amazon EC2 instance. You can also have Komprise host the Director and provide access to the UI as a SaaS offering. At no point must you log in to or administer Observers; the Director UI console handles all of the required tasks.

Figure: Observer and Windows proxy virtual appliances, together with the Director UI, used to manage NAS shares with Komprise

Windows proxy

Customers must deploy Komprise Windows proxy appliances when using Komprise to manage SMB shares. These proxies are virtual appliances that are joined to your Active Directory domain. They are responsible for analyzing your on-premises SMB/CIFS NAS shares and assisting Observers with tiering SMB/CIFS data to Amazon S3. Depending on the number of SMB/CIFS shares and files, you may need to deploy multiple proxies, which can scale horizontally, increasing parallelism and performance where needed.

Let’s walk through an example setup where I define NAS sources and Amazon S3 targets. I also discover and then define the business logic in a plan to transparently tier cold data to Amazon S3.

Discovery and analysis

You must define your source NAS shares and target Amazon S3 buckets. First, log in to your Komprise Director and add some source NAS shares. Once logged in, browse to Shares, then Sources, then select Add File Server. For this example, I assume that you’ve already created an S3 bucket and a programmatic IAM user with the appropriate IAM credentials to access it.
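
As a rough illustration of what "appropriate IAM credentials" could look like, the following Python sketch builds an IAM policy document scoped to a single bucket. The bucket name and the exact list of actions are my assumptions for illustration; this post does not list the permissions that Komprise requires, so check the Komprise documentation before attaching a policy like this to your IAM user.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Return an IAM policy document limited to one S3 bucket.

    The action list is an assumed read/write/list set, not a list
    published by Komprise.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "BucketLevelAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Sid": "ObjectLevelAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# "example-tier-target" is a hypothetical bucket name.
print(json.dumps(build_bucket_policy("example-tier-target"), indent=2))
```

Scoping the policy to one bucket keeps the credentials that you enter in the Director from granting access to unrelated buckets in the account.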

You can then enter your NFS or SMB file share information and begin discovering the files that reside in that share. In this example, I’m adding an NFS share on a Linux server in my test environment.

Once you have added all of your source NAS shares, you can add Amazon S3 buckets as targets. Click the Targets tab, then Add Cloud Target, to add an S3 bucket.

Enter your S3 Bucket Name, IAM user credentials (Key ID and Secret Key), and the desired Display Name. In this example, you are setting up the target bucket to send files to the Amazon S3 Standard (S3 Standard) storage class. However, you can also use the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class for files that you do not frequently access. Furthermore, you have the option of adding an S3 Lifecycle policy to move these objects to Amazon S3 Glacier after they have been in Amazon S3 for a period of time that you determine.

Please ensure that you review the Amazon S3 pricing page to select the most appropriate S3 storage class supported for your workload and retrieval pattern. Keep in mind that the cost of object storage is just one dimension that you should consider when optimizing storage.
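
The S3 Lifecycle option mentioned above can be sketched as a small configuration document. The following Python example builds a rule that transitions objects to S3 Glacier after a chosen number of days. The 90-day value and the function name are hypothetical; the dictionary follows the shape that the S3 `PutBucketLifecycleConfiguration` API expects.

```python
def glacier_transition_rule(days: int, prefix: str = "") -> dict:
    """Build a lifecycle configuration with one Glacier transition rule.

    An empty prefix applies the rule to every object in the bucket.
    """
    if days < 0:
        raise ValueError("days must be zero or greater")
    return {
        "Rules": [
            {
                "ID": f"transition-to-glacier-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# 90 days is an arbitrary example value, not a recommendation.
lifecycle = glacier_transition_rule(90)

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-tier-target", LifecycleConfiguration=lifecycle)
```

Objects in S3 Glacier must be restored before they can be read, so the transition period should reflect how likely it is that a tiered file is recalled.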

Once you have set up your sources and targets, you can define a plan and configure its data policy. The data policy has specific criteria that define when Komprise transparently tiers files to Amazon S3, based on how long it has been since you or users in your organization last accessed them. Komprise also provides an estimate of potential space and cost savings over a three-year period.

Screenshot: Plan Analysis showing the estimated space and cost savings over a three-year period

You also have the ability to edit the cost model used in the Plan Analysis (preceding screenshot). You can customize the analysis to ensure accurate cost savings estimates by browsing to All Actions and going to Edit Cost Model when viewing your plan:
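
To show why the cost model inputs matter, here is a simplified back-of-the-envelope calculation in Python. It is not Komprise's cost model, which this post does not describe. The per-terabyte prices below are made-up placeholders, and the sketch ignores request, retrieval, and data transfer charges as well as data growth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Monthly storage cost per terabyte (hypothetical inputs)."""
    nas_cost_per_tb_month: float
    s3_cost_per_tb_month: float


def estimate_savings(total_tb: float, cold_fraction: float,
                     model: CostModel, months: int = 36) -> float:
    """Estimate savings from holding the cold data in S3 instead of on NAS."""
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    cold_tb = total_tb * cold_fraction
    cost_on_nas = cold_tb * model.nas_cost_per_tb_month * months
    cost_on_s3 = cold_tb * model.s3_cost_per_tb_month * months
    return cost_on_nas - cost_on_s3


# Placeholder prices: replace them with your own NAS and S3 costs.
model = CostModel(nas_cost_per_tb_month=100.0, s3_cost_per_tb_month=25.0)
print(estimate_savings(total_tb=100, cold_fraction=0.5, model=model))
```

Because the result scales linearly with both the cold fraction and the price difference, an inaccurate NAS cost per terabyte skews the whole estimate, which is why it is worth adjusting the cost model to your environment.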

Once you are done editing your plan, you can choose to either test or activate your plan. If you test your plan first (recommended), Komprise generates a list of files that it would have moved from your selected NAS shares to Amazon S3, without actually moving them. If you choose to activate the plan, Komprise begins moving files to Amazon S3 that meet the criteria defined in the plan.
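
Testing a plan is essentially a dry run of the data policy. The following Python sketch illustrates the idea by listing files under a directory whose last access time is older than a threshold, without moving anything. It is my own illustration, not how Komprise is implemented, and the function name and parameters are hypothetical. Note that access times are only as reliable as the file system: mounts that use options such as `noatime` do not update them on reads.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional

SECONDS_PER_DAY = 86_400


def find_cold_files(root: Path, max_idle_days: int,
                    now: Optional[float] = None) -> Iterator[Path]:
    """Yield files under root that were last accessed more than max_idle_days ago.

    This only reports candidates; it never moves or modifies a file.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                last_access = path.stat().st_atime
            except OSError:
                # Skip files that disappear or cannot be read mid-scan.
                continue
            if last_access < cutoff:
                yield path
```

For example, `list(find_cold_files(Path("/mnt/share"), max_idle_days=365))` would return the files on a hypothetical mount point that nobody has read in a year.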

During any copy or move operation, Komprise performs MD5 checksums on your files to verify data integrity during the transfer. A single plan can span multiple NAS servers, even from different vendors. You can also create different policy groups in Komprise, for example, one for each department. This is especially useful when a central IT department is managing data for different business units and wants to set different policies for each unit.
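
The checksum idea mentioned above can be sketched in a few lines of Python. This is a generic illustration of verifying a copy with MD5 using the standard library, not Komprise's code. The function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_verification(src: Path, dst: Path) -> str:
    """Copy src to dst and raise OSError if the checksums differ afterwards."""
    expected = md5_of_file(src)
    shutil.copyfile(src, dst)
    actual = md5_of_file(dst)
    if actual != expected:
        raise OSError(f"checksum mismatch for {dst}: {actual} != {expected}")
    return actual
```

Comparing the digest computed at the source with the one computed at the destination catches corruption introduced in transit before the source copy is removed.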

Additional considerations and benefits of Intelligent Data Management

When using Komprise to tier cold data to AWS, a file is tiered from a source NAS share to Amazon S3, and Komprise leaves behind a symbolic link to the file in its original source location. When a user or application attempts to read or write a file that you have moved to Amazon S3, they access the file using the symbolic link. The link points to a Komprise Observer, which tracks where the file is stored. The Observer retrieves the file from Amazon S3 and fulfills the read/write request for the file within seconds. This way, users and applications can continue to access these files in the same location without refactoring applications to use object storage or specifying a different file share location.
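
The following Python sketch imitates the symbolic link mechanism on a single machine, purely to illustrate the concept. It moves a file into a local "cold" directory and leaves a symbolic link at the original path, so a reader of the original path still gets the content. In the real solution the link points to a Komprise Observer, which retrieves the file from Amazon S3. The local directory, the function name, and the lack of name-collision handling are all simplifications of mine.

```python
import os
import shutil
from pathlib import Path


def tier_file(source: Path, cold_dir: Path) -> Path:
    """Move source into cold_dir and leave a symbolic link at the original path."""
    cold_dir.mkdir(parents=True, exist_ok=True)
    # Resolve to an absolute path so the link works from any directory.
    target = (cold_dir / source.name).resolve()
    shutil.move(str(source), str(target))
    # os.symlink(src, dst) creates a link at dst that points to src.
    os.symlink(target, source)
    return target
```

After `tier_file(path, cold_dir)`, `path.is_symlink()` is true, yet `path.read_text()` still returns the original content. That is the property that lets users and applications keep using the same location.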

Komprise also supports Amazon S3 buckets configured with S3 Object Lock, which enables you to store objects using a Write-Once-Read-Many (WORM) model. This can be particularly attractive for customers who are concerned about the modification or deletion of files that reside on Amazon S3, for example, for compliance and regulatory purposes. This may also help protect against ransomware or malware incidents that look to infect NAS shares.

By default, Komprise moves files in their native format with their contents unchanged. This means that as files are moved to Amazon S3, you can now choose to access these files as objects natively within Amazon S3. This opens a number of new and exciting scenarios for customers. For example, customers can use this data as the foundation for a data lake, with the potential to query and explore data using services such as Amazon Athena. They can also catalog data with AWS Glue and process data with Amazon EMR, amongst other things. The possibilities are endless – and many customers are unlocking the value of their cold data in ways that they may have never considered before.

In November 2020, Komprise also announced support for AWS Outposts as an AWS Outposts Ready Partner, enabling customers with AWS Outposts to tier or migrate NAS data to and from AWS Outposts. This is helpful for customers that have a need for low-latency access to AWS services. It is also useful for those customers that have specific data-residency requirements while using a consistent set of AWS services and features with a consistent set of APIs.

Komprise Deep Analytics

You can also choose to make use of Komprise Deep Analytics (included with Komprise Intelligent Data Management) to query file metadata across all of your sources. This can help you understand exactly how you are using your storage. In the following example, I have run a search across all of my file shares for any spreadsheets that were last accessed more than two months ago. Komprise enables you to drill down and filter results at an even more granular level if desired. This can be extremely helpful for IT departments looking to understand how users are using and accessing their files, and can facilitate data-driven strategic and tactical decisions.
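
To make the shape of such a query concrete, here is a small Python sketch that filters in-memory file metadata records for spreadsheets not accessed in the last 60 days. It does not use or represent the Deep Analytics interface. The record fields, the extension list, and the choice of 60 days as "two months" are my assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed set of spreadsheet extensions; adjust to your environment.
SPREADSHEET_EXTENSIONS = {".xls", ".xlsx", ".csv", ".ods"}


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata record for one file on a share."""
    share: str
    path: str
    size_bytes: int
    last_accessed: datetime


def stale_spreadsheets(records: Iterable[FileRecord], as_of: datetime,
                       min_idle: timedelta = timedelta(days=60)) -> List[FileRecord]:
    """Return spreadsheet records last accessed before as_of minus min_idle."""
    cutoff = as_of - min_idle
    return [
        record for record in records
        if PurePosixPath(record.path).suffix.lower() in SPREADSHEET_EXTENSIONS
        and record.last_accessed < cutoff
    ]
```

Summing `size_bytes` over the result gives a quick view of how much capacity such files occupy across shares.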

Conclusion

In this post, I discussed how you can use Komprise to transparently tier files from on-premises NAS shares into Amazon S3, and how this can help you save on storage costs. Beyond cost savings, having your data in AWS lets you take advantage of other AWS services. You can use services such as Amazon Athena, Amazon EMR, and AWS Glue to help unlock the value of your cold data and use it as a foundation for a data lake, all while end users and applications continue to read and write data on the source NAS share without interruption. If you would like to learn more about Komprise and schedule a free trial, please visit Komprise’s AWS page.

Thanks for reading this blog post! If you have any comments or questions, please don’t hesitate to leave them in the comments section.

Anthony Fiore

Anthony Fiore is a Sr. Storage Specialist Solutions Architect with Amazon Web Services. Anthony works with customers to solve complex storage challenges, provides guidance on AWS storage best practices, and leads community service events with Amazon in New York City. Outside of work, he enjoys cheering on his son at soccer games and perfecting his Bolognese sauce recipe.