AWS Storage Blog

Trellix accelerates on-premises research data migration with AWS Snow Family

Trellix is a global company redefining the future of cybersecurity. The company’s open and native extended detection and response (XDR) platform helps organizations confronted with today’s most advanced threats gain confidence in the protection and resilience of their operations. Trellix’s security experts, along with an extensive partner ecosystem, accelerate technology innovation through machine learning (ML) and automation to empower over 40,000 business and government customers.

Trellix had a set of research data stored on-premises and risked losing access to it if the data was not migrated within an approved time frame. They needed to move the data out of their existing on-premises environment, and they chose AWS as the destination, in line with their data center exit strategy. Trellix’s Advanced Research Center used AWS Snow Family to migrate threat repository data from an on-premises data source within the data center exit timeline while minimizing operational challenges and security risks. The migrated research data serves as a cornerstone for Trellix’s product efficacy, ML, and innovation workstreams.

In this post, we explore Trellix’s journey to move their datasets into the cloud. We look at how Trellix used Snow Family to migrate the data and how they store it in Amazon Simple Storage Service (Amazon S3). Trellix was able to transition data from a data center to a cloud-based storage solution that offers scalability and cost efficiency.

Challenges and requirements

Trellix’s research engineers use specific datasets to develop their threat detection platform with diverse ML models. In the past, this data was stored in traditional network-attached storage (NAS) arrays located on-premises. However, Trellix had to transfer the data out of the on-premises data center within four months, or they would lose all access to it. This posed significant challenges. First, the outdated infrastructure made it difficult to migrate the data within the given time frame. Additionally, the migration was hindered by the large number of small files and by the lack of networking and storage solutions that could complete the migration in time.

Trellix chose an offline data migration path due to bandwidth limitations. The goal was a cost-effective migration that also reduced operational expenses. Additionally, Trellix wanted control over the migration process. The solution needed to be user-friendly and require minimal operational effort, and it had to demonstrate high reliability, availability, and resiliency. To optimize storage costs, Trellix wanted to use lower-cost options such as archive storage. The data needed to be stored in the cloud in a directory-like structure that preserved the existing hierarchy throughout the migration.

Discovery and planning

With prior experience using Amazon S3, Trellix recognized the scalability, reliability, and cost-effectiveness of the storage solution. They made the decision to migrate their datasets to Amazon S3, considering the access patterns for their data as they determined the appropriate storage class to use. This information facilitated the migration planning process for both AWS and Trellix.

Upon successful feasibility analysis and technical validation, Trellix ultimately chose to proceed with the offline migration strategy using Snow Family.
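A feasibility analysis of this kind often starts with simple arithmetic: how long would an online transfer take? The following is a minimal sketch, not Trellix’s actual analysis. It uses the 400 TB total reported later in this post. The link speeds and the 80 percent utilization factor are assumptions chosen for illustration, because the post does not state the available bandwidth.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb terabytes over a network link.

    utilization is the assumed fraction of the link that the migration can
    actually use; 0.8 is an illustrative value, not a measured one.
    """
    bits = data_tb * 1e12 * 8                    # decimal terabytes to bits
    usable_bps = link_mbps * 1e6 * utilization   # usable bits per second
    return bits / usable_bps / 86_400            # seconds to days


if __name__ == "__main__":
    # Hypothetical link speeds; the post does not say what Trellix had.
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps:>6} Mbps: {transfer_days(400, mbps):8.1f} days")
```

An estimate like this ignores per-file overhead, which matters a great deal for datasets dominated by small files, so real transfer times can be considerably longer than the raw bandwidth figure suggests.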

Data migration and implementation

From lifting and shifting workloads to moving entire data centers, AWS provided the organizational, operational, and technical capabilities needed for a successful migration. The AWS team helped Trellix engineers install the Snow device, connect it, and integrate it into their network. AWS Snowball and AWS Snowcone are purpose-built devices that cost-effectively move petabytes of data offline from on-premises or rugged, mobile environments. Refer to the Snowball documentation to learn more.

Trellix used the Snowball Edge Client to unlock and manage the device, and used the Amazon S3 adapter with the required AWS CLI version to test data transfer from their NAS server onto the Snow device. The AWS team closely supported the Trellix engineers through shared screen sessions, discussions of fundamental configurations, and help with advanced setups aimed at improving data upload performance on the device.

During the testing phase, Trellix encountered slow data uploads caused by the large number of small files. The dataset consisted of files with an average size of 1.4 MB, which needed to be batched to improve data transfer performance to the Snowball Edge device. The AWS team collaborated with Trellix engineers to explore methods for batching small files using a Python script. The most recent version of this tool is available on GitHub.

Once Trellix overcame this obstacle, the team proceeded to performance benchmarking. Based on the positive results, Trellix decided to transition its active proof of concept (PoC) job into production. Trellix continued to upload production data to the initial Snow device while working with the AWS team and the Snow Family service team to plan the migration of the entire dataset.

As a result of the collaboration, the Snow Family service team increased Trellix’s service limit from the default of one device to four devices. Trellix returned its first device while simultaneously connecting two new devices to the network. The Trellix team started data uploads to these new devices using two streams to avoid overwhelming the NAS. Trellix monitored the data offloaded from the initial Snow device, and over the next two months successfully migrated the entire 400 TB of data into Amazon S3.
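Capping the number of parallel streams is a common way to protect a source system that is still serving other workloads. The post does not describe how the two streams were implemented, so the following is only a sketch of the general idea using a bounded thread pool. The `fake_copy` function and the archive names are hypothetical stand-ins for a real copy to the device.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_STREAMS = 2  # the post reports two streams; tune this for your own NAS


def run_bounded(
    items: Iterable[str],
    worker: Callable[[str], str],
    max_streams: int = MAX_STREAMS,
) -> List[str]:
    """Run worker over items with at most max_streams calls in flight.

    Results come back in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        return list(pool.map(worker, items))


def fake_copy(path: str) -> str:
    # Hypothetical placeholder: a real worker would copy the file or
    # archive to the Snow device and return a status.
    return f"copied {path}"


if __name__ == "__main__":
    archives = ["batch-000001.tar", "batch-000002.tar", "batch-000003.tar"]
    for result in run_bounded(archives, fake_copy):
        print(result)
```

Raising `max_streams` increases throughput only until the source storage or the network becomes the bottleneck, so it is worth measuring the impact on the existing workload before going higher.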

Considerations

The migration involved considering performance, cost, and scale requirements.

Number of jobs

After a thorough evaluation, AWS and Trellix determined that four Snow jobs were needed beyond the PoC stage. To avoid degrading performance for the ongoing workload on the existing NAS system, Trellix approved connecting no more than two Snowball Edge devices concurrently.

Batch small files

To speed up the transfer of small files to the Snowball Edge device, the AWS best practice is to batch them together in a single archive. Once small files are batched in one of the supported archive formats, they can be auto-extracted when they are imported into Amazon S3. For a complete walkthrough of the tool, see the video Data Migrations with the snow-transfer-tool and the blog post Migrating mixed file sizes with the snow-transfer-tool on AWS Snowball Edge devices.
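To make the batching idea concrete, here is a minimal sketch that groups files under a directory into uncompressed tar archives, using only the Python standard library. This is not the snow-transfer-tool and not the script Trellix used. The function names, the archive naming scheme, and the default limits are assumptions for illustration; check the Snowball Edge documentation for the supported formats and current size limits.

```python
import os
import tarfile
from typing import Iterable, Iterator, List


def plan_batches(paths: Iterable[str], max_bytes: int, max_files: int) -> Iterator[List[str]]:
    """Group paths into batches bounded by total size and file count."""
    batch: List[str] = []
    size = 0
    for path in paths:
        file_size = os.path.getsize(path)
        # Close the current batch if adding this file would exceed a limit.
        if batch and (size + file_size > max_bytes or len(batch) >= max_files):
            yield batch
            batch, size = [], 0
        batch.append(path)
        size += file_size
    if batch:
        yield batch


def write_batches(
    root: str,
    out_dir: str,
    max_bytes: int = 5 * 1024**3,  # illustrative default, not an AWS limit
    max_files: int = 10_000,       # illustrative default
) -> List[str]:
    """Write every file under root into numbered tar archives in out_dir.

    Paths inside each archive are relative to root, so the directory
    hierarchy is preserved. Keep out_dir outside root so that archives
    are never picked up as input.
    """
    files = sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    )
    os.makedirs(out_dir, exist_ok=True)
    archives: List[str] = []
    for index, batch in enumerate(plan_batches(files, max_bytes, max_files), start=1):
        archive = os.path.join(out_dir, f"batch-{index:06d}.tar")
        with tarfile.open(archive, "w") as tar:
            for path in batch:
                tar.add(path, arcname=os.path.relpath(path, root))
        archives.append(archive)
    return archives
```

Calling `write_batches("/mnt/nas/research", "/mnt/staging/batches")` with paths of your own would produce a list of archive paths ready to copy to the device. In the AWS documentation, auto-extraction is requested by setting the `snowball-auto-extract=true` metadata on each archive when it is copied through the S3 adapter; confirm the exact requirements against the current documentation before relying on it.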

S3 Lifecycle

An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. Using an S3 Lifecycle configuration, you can define rules to expire objects after a certain time period or transition objects from one storage class to another. Trellix used S3 Lifecycle policies to transition data into the Amazon S3 Glacier Deep Archive storage class, which further reduced storage costs for the data stored in Amazon S3.
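As an illustration, a lifecycle rule that transitions objects to S3 Glacier Deep Archive can be expressed as a small configuration document. The sketch below builds one in Python and shows how it could be applied with boto3. The rule ID, the prefix, the bucket name, and the 30-day delay are hypothetical values; the post does not say which values Trellix used.

```python
import json
from typing import Any, Dict


def deep_archive_rule(rule_id: str, prefix: str, days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration with one Deep Archive transition rule."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


def apply_rule(bucket: str, config: Dict[str, Any]) -> None:
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    # Note: this call replaces the bucket's entire lifecycle configuration.
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    config = deep_archive_rule("research-to-deep-archive", "research-data/", 30)
    print(json.dumps(config, indent=2))
    # apply_rule("example-bucket", config)  # uncomment to apply for real
```

Because the apply call overwrites any existing rules on the bucket, it is safer to read the current configuration first and merge new rules into it when a bucket already has lifecycle rules.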

Monitoring Amazon S3 storage usage

Trellix used Amazon S3 Storage Lens to gain visibility into object storage usage and activity after the data was imported from the Snowball Edge devices into Amazon S3. With S3 Storage Lens, Trellix could see the total data stored in a bucket and the amount of storage added to a specific S3 bucket and prefix path, which let them monitor Amazon S3 storage usage as each import completed.

Conclusion

Trellix’s migration to AWS met the short data center exit timeline of four months thanks to the offline migration capabilities of the AWS Snow Family. The ease of integration and use allowed Trellix to focus on core business objectives while moving data into Amazon S3 Glacier Deep Archive to achieve significant cost savings. Explore more about Trellix and other customer success stories for offline data migration and edge compute on the Snowball customers page.

Tanuj Maheshwari

Tanuj is a Senior Software Development Manager with expertise in cyber security and cloud security, with over 15 years of experience. Over the last decade, he has been working with Trellix, where he actively shapes security strategies and delivers innovative SaaS solutions. His focus is on product design and development, enhancing product security, and contributing to the company's success and resilience in the ever-evolving landscape of cybersecurity.

Ananta Khanal

Ananta Khanal is a Solutions Architect focused on cloud storage solutions at AWS. He has worked in IT for over 15 years and has held various roles at different companies. He is passionate about cloud technology, infrastructure management, IT strategy, and data management.

Jesse Bieber

Jesse Bieber is a Solutions Architect with a focus on storage services at AWS. Jesse has over 20 years of technical experience across enterprise, commercial and government sectors with an engineering background focused on data center infrastructure, implementation and management of compute and storage, and cloud migration strategies. He has a passion for music and playing the guitar.