AWS Storage Blog

Protect and manage Dell EMC PowerScale data on Amazon S3

Many customers across several industries use Dell EMC PowerScale to store files accessed over the SMB, NFS, and HDFS protocols. Common workflows include building and analyzing data lakes, content creation and production, genomics sequencing, and image rendering.

Many PowerScale users need an efficient and secure way to back up their PowerScale data to the cloud, whether to meet compliance and regulatory requirements or simply to preserve and protect their content. Superna Golden Copy helps customers protect and manage their data on AWS, backing up Dell EMC PowerScale data to meet regulatory requirements while enhancing the value of that data.

In this blog post, we explore the key benefits of protecting Dell EMC PowerScale data with Superna Golden Copy and Amazon S3, and we review technical recommendations for installing and configuring Golden Copy for this use case. Because the backup copy lives in Amazon S3, this solution removes the need for a duplicate NAS cluster and the power, cooling, and other resources that go with it.

Amazon S3 and Superna

Amazon S3 provides customers with a cost-optimized, efficient, scalable, secure, and managed object storage service. Today, AWS powers millions of businesses in over 190 countries around the world, and Amazon S3 is one of the most widely used solutions for storing data.

Superna is an AWS Partner. With Superna Golden Copy, PowerScale customers can back up their file data to object storage and recall it when needed. Golden Copy maintains full metadata protection and supports continuous incremental backups for long-term retention in Amazon S3. Together with AWS, Golden Copy can help customers protect and back up Dell EMC PowerScale data, meet regulatory requirements, and enhance the value of their data.

Now, let us take a deeper look into how this works.

Golden Copy methodology and approach

Golden Copy is a file-to-object copy tool integrated with Dell EMC PowerScale. The Golden Copy backup and recall platform simplifies moving files to object storage for long-term retention, backup, or archive. It provides a complete solution for file-to-object backup and for sync workflows in both directions (file to object, and object back to file) for PowerScale, with scale-out copy performance and optimizations for both small and large files. Two product options exist today:

  1. Golden Copy Base: Backup feature set with copy and sync features.
  2. Golden Copy Advanced: Backup use case features, integration with Ransomware Defender, and automation workflow API enhancements.

Superna Golden Copy - Secure deployment model using AWS VPN or AWS Direct Connect with secure S3 bucket policies

Note: AWS PrivateLink is optional but highly recommended. It lets you connect to Amazon S3 from on premises or from within AWS using private IP addresses in your Amazon Virtual Private Cloud (VPC), which eliminates the need to use public IPs, configure firewall rules, or configure an internet gateway to access Amazon S3 from on premises.

This secure deployment model using AWS VPN or AWS Direct Connect with secure S3 bucket policies allows customers to continue using their data center for all day-to-day activities while using Amazon S3 as a secure cloud repository for backup data.

This solution uses a few AWS services, and you can add other services to get more value from your data:

  • Amazon Elastic Compute Cloud (Amazon EC2) to host the Superna Golden Copy graphical user interface (GUI) dashboard (planned for Q4 2021).
  • Amazon S3 for data storage.
  • Customers can copy data from on premises to AWS in a variety of ways, including over AWS Direct Connect or AWS VPN.

Now, let’s focus on the setup of Golden Copy, which includes discovery and analysis, restricting access to a specific VPC endpoint, and data recall.

Discovery and analysis

First, define your source PowerScale shares and target Amazon S3 buckets. Log in to Superna Golden Copy and add a source PowerScale cluster by running the following command with the Golden Copy searchctl command line interface (CLI):

searchctl isilons add --host <ip address of Isilon in system zone> --user eyeglassSR --applications GC

Use the following option to add a cluster for a redirected recall job. This cluster type is available with the backup bundle or via an upgrade to the advanced license key:

[--goldencopy-recall-only] (Advanced backup bundle license key required)

The following parameter is required. It assigns the cluster to either the search application (SR) or the Golden Copy application (GC); for the Golden Copy product, enter GC.

[--applications APPLICATIONS]

To list the PowerScale clusters, run the following command:

searchctl isilons list

Figure 1: PowerScale cluster configured within Golden Copy

Next, configure your Amazon S3 bucket destination:

searchctl archivefolders add --isilon prod-cluster --folder /ifs/data/policy1/aws --accesskey <access key> --secretkey <secret key> --endpoint s3.ca-central-1.amazonaws.com --region ca-central-1 --bucket mybucketname --cloudtype aws
  • NOTE: The Region is a mandatory field with Amazon S3.
  • NOTE: The endpoint must be the Region-specific URL. In the preceding example, the Region is ca-central-1, so the endpoint is s3.ca-central-1.amazonaws.com.
  • See how to configure an Amazon S3 bucket.
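To make the endpoint rule concrete, the following Python sketch (3.9 or later) derives the Region-specific endpoint from a Region name and checks that an --endpoint value matches the --region value before you run the command. It is an illustrative helper, not part of Golden Copy; the function names are hypothetical, and it assumes the standard AWS partition (Regions such as the China Regions use a different endpoint domain and are not handled).

```python
import re

# Assumption: standard "aws" partition only. Regions in other partitions
# (for example, the China Regions) use different endpoint domains.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def s3_regional_endpoint(region: str) -> str:
    """Build the Region-specific S3 endpoint host name for --endpoint."""
    if not _REGION_RE.match(region):
        raise ValueError(f"unexpected Region name: {region!r}")
    return f"s3.{region}.amazonaws.com"


def check_endpoint_matches_region(endpoint: str, region: str) -> None:
    """Raise ValueError if an --endpoint value does not encode the --region value."""
    host = endpoint.strip().removeprefix("https://").rstrip("/")
    expected = s3_regional_endpoint(region)
    if host != expected:
        raise ValueError(
            f"endpoint {endpoint!r} does not match Region {region!r}; expected {expected!r}"
        )
```

For example, s3_regional_endpoint("ca-central-1") returns s3.ca-central-1.amazonaws.com, the endpoint used in the command above.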

Once you’ve set up the source and target, add folders to a backup via the CLI:

searchctl archivefolders add --isilon <cluster name> --folder <folder name> --accesskey <access key> --secretkey <secret key> --endpoint <endpoint> --region <region> --bucket <bucket name> --cloudtype aws


Figure 2: How to add a folder to a backup

The following --include and --exclude options control which files in a backup folder are copied:

  • Back up all files except those in each user's AppData profile:
--exclude '/ifs/home/*/AppData/**'
  • Only back up docx and pdf files, and exclude everything in a tmp directory:
--include '*.pdf,*.docx' --exclude '/ifs/data/home/tmp/**'
  • Only back up docx, pdf, and bmp files:
--include '*.pdf,*.docx,*.bmp'
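The post doesn't spell out the exact matching rules for these patterns. The following Python sketch shows one plausible interpretation so you can reason about which files a filter set would select. It assumes that * and ? match within a single path segment, ** matches across segments, include patterns without a slash are matched against the file name only, and any exclude match wins. Treat should_back_up and the other names as hypothetical, and confirm the real behavior with a test run.

```python
import re
from typing import Iterable, List


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter pattern into an anchored regular expression.
    Assumed semantics: ** matches anything (including '/'); * and ? never cross '/'."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")


def split_patterns(value: str) -> List[str]:
    """Split a comma-separated option value such as '*.pdf,*.docx'."""
    return [p.strip() for p in value.split(",") if p.strip()]


def should_back_up(path: str, includes: Iterable[str] = (), excludes: Iterable[str] = ()) -> bool:
    """Hypothetical filter: an exclude match always wins; otherwise the file must
    match an include pattern, unless no include patterns were given."""
    if any(glob_to_regex(p).match(path) for p in excludes):
        return False
    includes = list(includes)
    if not includes:
        return True
    name = path.rsplit("/", 1)[-1]
    # Patterns without a slash are matched against the file name only.
    return any(glob_to_regex(p).match(path if "/" in p else name) for p in includes)
```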

Run the following command to see which folders are configured as backup folders:

searchctl archivefolders list


Figure 3: How to list all backup folders

With the source, destination, and backup folder configured, you can now run either a test or a backup. If you run a test first (recommended), Superna Golden Copy performs the following validations:

  1. PowerScale to Amazon S3 connectivity test (port test).
  2. Test file creation on the cluster at /ifs/goldencopy/temp/test-folderId/testfile.
  3. Test file upload to the Amazon S3 target.
  4. Verification of the file copy on the Amazon S3 target.
  5. Deletion of /ifs/goldencopy/temp/test-folderId/testfile from the cluster.
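The five validation steps above can be sketched in Python to make the sequence easy to follow. This is a hypothetical stand-in, not Golden Copy's implementation: the put and get callables represent whatever client writes to and reads from the target bucket, work_dir stands in for the cluster test path, and the object key goldencopy-test/testfile is invented for illustration.

```python
import os
import socket
from typing import Callable, List

TEST_KEY = "goldencopy-test/testfile"  # hypothetical object key


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Step 1: check that a TCP connection to the target can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def validate_target(
    work_dir: str,
    put: Callable[[str, bytes], None],
    get: Callable[[str], bytes],
    host: str,
    port: int = 443,
) -> List[str]:
    """Run the port, create, upload, verify, and delete steps in order."""
    steps = []
    if not port_open(host, port):
        raise RuntimeError(f"cannot reach {host}:{port}")
    steps.append("port")

    # Step 2: create a small test file in the stand-in for the cluster path.
    test_path = os.path.join(work_dir, "testfile")
    payload = os.urandom(1024)
    with open(test_path, "wb") as f:
        f.write(payload)
    steps.append("create")

    try:
        # Step 3: upload the test file to the target.
        with open(test_path, "rb") as f:
            put(TEST_KEY, f.read())
        steps.append("upload")

        # Step 4: read the object back and verify it byte for byte.
        if get(TEST_KEY) != payload:
            raise RuntimeError("object on the target does not match the test file")
        steps.append("verify")
    finally:
        # Step 5: always remove the test file, even if a step failed.
        os.remove(test_path)
        steps.append("delete")
    return steps
```

Against a real bucket, put and get could wrap an S3 client's put_object and get_object calls.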

With the folders set up, you can begin the copy process:

searchctl archivefolders archive --id <id number>


Figure 4: How to begin a copy

When you run the backup, Superna Golden Copy begins copying files to Amazon S3. During any copy operation, Golden Copy can compute MD5 checksums on copied files to verify data integrity. You can also create separate backup policies in Golden Copy, for example one per department. This is especially useful when a central IT department manages data for several business units and wants to apply a different policy to each unit.
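As a rough illustration of the checksum step, the following Python sketch computes a file's MD5 digest in fixed-size chunks, so large files never need to fit in memory, and compares a source file with its copy. The function names are hypothetical; Golden Copy performs its own checksum handling.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source: str, copy: str) -> bool:
    """True when both files have the same MD5 digest."""
    return md5_of_file(source) == md5_of_file(copy)
```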

Understanding and monitoring the folder-backup copy operation can be critical. The Superna Eyeglass dashboard monitors statistics on all folders and shows real-time updates of cumulative file count and files copied per second.


Figure 5: Superna Eyeglass reporting dashboard for Golden Copy

Customers can also view statistics on the CLI using the following command:

searchctl archivefolders stats --id <folder ID>


Figure 6: Example stats report

One of the key differentiators of Superna Golden Copy is its ability to run incremental backup jobs. Golden Copy takes an on-demand snapshot and uses the PowerScale changelist API to detect files created, modified, and deleted since the last job, then mirrors these changes to the Amazon S3 target. Because PowerScale identifies the changes, Golden Copy does not need a full file system scan and can quickly copy only the changed files to Amazon S3.
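The following Python sketch illustrates the idea behind changelist-based incrementals: compare two point-in-time views of a folder, classify each path as created, modified, or deleted, and mirror only those changes to the target. It is a conceptual model only. Here a snapshot is a hypothetical dictionary mapping each path to its size and modification time, and building one would itself require a full scan, which is exactly the work the PowerScale changelist API lets Golden Copy avoid.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical snapshot model: file path -> (size in bytes, modification time).
Snapshot = Dict[str, Tuple[int, float]]


@dataclass
class ChangeSet:
    created: List[str]
    modified: List[str]
    deleted: List[str]


def diff_snapshots(old: Snapshot, new: Snapshot) -> ChangeSet:
    """Classify every path as created, modified, or deleted between two snapshots."""
    return ChangeSet(
        created=sorted(p for p in new if p not in old),
        modified=sorted(p for p in new if p in old and new[p] != old[p]),
        deleted=sorted(p for p in old if p not in new),
    )


def mirror(changes: ChangeSet, read_file: Callable[[str], bytes], target: Dict[str, bytes]) -> None:
    """Apply only the changed paths to a target store (a dict stands in for S3)."""
    for path in changes.created + changes.modified:
        target[path] = read_file(path)
    for path in changes.deleted:
        target.pop(path, None)
```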

Once Golden Copy has completed a full copy, you can create an incremental schedule for any folder and set its frequency to meet your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements.
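As a back-of-the-envelope check when choosing a schedule, the following Python sketch assumes that incremental jobs start at a fixed interval, never overlap, and take roughly the same time to finish. Under those assumptions, a change made just after one job's snapshot is not protected until the next snapshot is taken and that job completes, so the worst-case data-loss window is about the interval plus the job duration. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, job_duration: timedelta) -> timedelta:
    """A change that lands just after a snapshot waits a full interval for the
    next snapshot, then waits for that incremental job to finish copying."""
    return interval + job_duration


def max_interval_for_rpo(rpo_target: timedelta, job_duration: timedelta) -> timedelta:
    """Longest schedule interval that still meets the RPO target."""
    interval = rpo_target - job_duration
    if interval <= timedelta(0):
        raise ValueError("incremental jobs take longer than the RPO target allows")
    return interval
```

For example, with a 4-hour RPO target and incrementals that typically finish in 30 minutes, max_interval_for_rpo returns 3 hours 30 minutes, so running incrementals every 3 hours leaves some headroom.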

Golden Copy preserves both file and folder metadata during backups to Amazon S3, and you can choose whether to restore that metadata when recalling files and folders. The following screenshots, taken in Cyberduck, show examples of the retained metadata:


Figure 7: Cyberduck interface showing file metadata retained within the object


Figure 8: Cyberduck interface showing folder metadata retained within the object

Restricting access to a specific VPC endpoint in the S3 bucket policy

The following Amazon S3 bucket policy allows access to the bucket DOC-EXAMPLE-BUCKET2 only through the VPC endpoint vpce-1a2b3c4d: it denies all access to the bucket when a request does not come through that endpoint. The aws:sourceVpce condition key identifies the endpoint by its ID alone and does not require an Amazon Resource Name (ARN) for the VPC endpoint resource. Replace DOC-EXAMPLE-BUCKET2 and vpce-1a2b3c4d with your bucket name and VPC endpoint ID. Because the policy denies every request that does not use the endpoint, it also blocks access from the Amazon S3 console.

{
  "Version": "2012-10-17",
  "Id": "Policy1415115909152",
  "Statement": [
    { "Sid": "Access-to-specific-VPCE-only",
      "Principal": "*",
      "Action": "s3:*",
      "Effect": "Deny",
      "Resource": ["arn:aws:s3:::DOC-EXAMPLE-BUCKET2",
                   "arn:aws:s3:::DOC-EXAMPLE-BUCKET2/*"],
      "Condition": {"StringNotEquals": {"aws:sourceVpce": "vpce-1a2b3c4d"}}
    }
  ]
}
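If you manage several buckets or endpoints, generating the policy can help avoid typos. The following Python sketch produces the same Deny statement as the example above for a given bucket name and endpoint ID. The function name is hypothetical, and the optional Id element is omitted.

```python
import json


def vpce_only_policy(bucket: str, vpce_id: str) -> str:
    """Return a bucket policy that denies all S3 actions not made through vpce_id."""
    if not vpce_id.startswith("vpce-"):
        raise ValueError(f"expected a VPC endpoint ID such as vpce-1a2b3c4d, got {vpce_id!r}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Access-to-specific-VPCE-only",
                "Principal": "*",
                "Action": "s3:*",
                "Effect": "Deny",
                # Cover both the bucket itself and every object in it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

You could then save the output to policy.json and apply it with the AWS CLI (aws s3api put-bucket-policy --bucket DOC-EXAMPLE-BUCKET2 --policy file://policy.json) or with the put_bucket_policy call in boto3. Keep in mind that the Deny also applies to your own later requests, including policy changes, that don't go through the endpoint.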

Data recalling

Recalling data is just as important as backing it up. With Golden Copy, you can recall data to any specified directory and optionally reapply metadata as part of that process. The progress of a restore is shown in the GUI dashboard.

searchctl archivefolders recall --id <id of folder> --subdir <subdir> --apply-metadata

The --apply-metadata flag is optional; include it only if you want to restore metadata.


Figure 9: Restoring data to a different folder


Figure 10: Restoring data to a different folder and monitoring via the GUI dashboard

Additional considerations

Golden Copy copies files in their native format, so as files land in Amazon S3, authorized users can immediately access them as objects. This enables use cases such as using the data as the foundation for a data lake, where you can query and explore data with Amazon Athena, catalog it with AWS Glue, process it with Amazon EMR, and more. These use cases give customers new ways to unlock the value of their data.

Customer benefits with Superna Golden Copy and AWS

Using Superna Golden Copy with AWS offers customers several advantages:

  • Customers can take advantage of the cost benefits and simplicity of Amazon S3.
  • Access and use the data with applications running on Amazon EC2 or with Amazon FSx.
  • Fast incremental data syncs to Amazon S3, because Golden Copy's built-in integration with the PowerScale snapshot changelist offloads detection of file system changes to PowerScale.
  • Golden Copy uses the PowerScale REST API to preserve PowerScale file metadata: owner, group, mode bits, and folder access control lists. Moving data between Amazon S3 and PowerScale does not lose this metadata.
  • Golden Copy uses Amazon S3 features to compute a checksum that validates the integrity of files stored as objects.
  • Golden Copy stores each file's MD5 (Message-Digest algorithm 5) checksum as a property on the object so that it can be validated during recall operations. This is useful for audits that compare MD5 checksums after a recall.
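One reason a separately stored MD5 property is handy: the ETag that Amazon S3 returns equals the MD5 of the object's content only in some cases. It is commonly observed to match for single-part uploads stored unencrypted or with SSE-S3, while a multipart upload yields the MD5 of the concatenated part digests plus a "-N" part-count suffix. AWS does not document the multipart format as a contract, so treat the following Python as a sketch. It shows both ETag forms and a hypothetical verify_recall check against a stored MD5 value.

```python
import hashlib
from typing import Optional


def expected_s3_etag(data: bytes, part_size: Optional[int] = None) -> str:
    """Predict the S3 ETag for data uploaded in one PUT (part_size=None)
    or as a multipart upload with fixed-size parts."""
    if part_size is None:
        return hashlib.md5(data).hexdigest()
    # Multipart: MD5 of the concatenated binary MD5 digests of each part,
    # followed by "-<number of parts>".
    part_digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, max(len(data), 1), part_size)
    ]
    return f"{hashlib.md5(b''.join(part_digests)).hexdigest()}-{len(part_digests)}"


def verify_recall(recalled: bytes, stored_md5: str) -> bool:
    """Compare recalled content with an MD5 value stored alongside the object."""
    return hashlib.md5(recalled).hexdigest() == stored_md5.lower()
```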

Additional customer benefits of Golden Copy include:

  • Cluster-based license – unlimited nodes and unlimited copy jobs per cluster. You can back up and recall unlimited amounts of data without worrying about license costs.
  • Bandwidth rate limiting keeps production workflows that share the same internet link running uninterrupted while backup or recall jobs run concurrently.
  • Sync mode, copy mode, or both, giving you flexibility in your workflows and processes.
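Rate limiting of this kind is commonly implemented with a token bucket. The post doesn't say how Golden Copy limits bandwidth, so the following Python sketch only illustrates that general technique: tokens (bytes) refill at a fixed rate up to a burst capacity, and a sender may transmit a chunk only when enough tokens are available. The class and its parameters are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Average `rate` bytes per second, with bursts of up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so the first burst can go out immediately
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: int) -> bool:
        """Spend tokens for a chunk if enough are available right now."""
        if nbytes > self.capacity:
            raise ValueError("chunk is larger than the bucket capacity; split it first")
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_time(self, nbytes: int) -> float:
        """Seconds until a chunk of nbytes could be sent at the current fill level."""
        self._refill()
        return max(0.0, (nbytes - self.tokens) / self.rate)
```

A copy loop would call wait_time, sleep for that long, and then call try_consume before sending each chunk; the injectable clock makes the behavior easy to test without real delays.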

The “Delayed Delete” feature protects deleted data for recovery, ensuring you can restore files in case they were accidentally deleted.
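The post doesn't describe how Delayed Delete works internally. As one way to picture the behavior, the following Python sketch models a hypothetical retention window: a delete moves an object to a pending list instead of removing it, the object can be restored until the window expires, and only then is it purged. The class, its methods, and the retention parameter are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


class DelayedDeleteStore:
    """Hypothetical model: deletes are held for `retention` before being purged."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self.objects: Dict[str, bytes] = {}
        self.pending: Dict[str, Tuple[bytes, datetime]] = {}  # key -> (data, deleted_at)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.pending.pop(key, None)  # a new version supersedes any pending delete

    def delete(self, key: str, now: datetime) -> None:
        if key in self.objects:
            self.pending[key] = (self.objects.pop(key), now)

    def restore(self, key: str) -> bool:
        """Bring back a deleted object if it has not been purged yet."""
        if key not in self.pending:
            return False
        self.objects[key] = self.pending.pop(key)[0]
        return True

    def purge_expired(self, now: datetime) -> List[str]:
        """Permanently drop deletes older than the retention window; return their keys."""
        expired = [k for k, (_, deleted_at) in self.pending.items() if now - deleted_at >= self.retention]
        for k in expired:
            del self.pending[k]
        return expired
```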

Conclusion

In this post, we showed how Superna Golden Copy can back up and recall files between Dell EMC PowerScale and Amazon S3. We discussed how this approach can help you save on storage costs and let you use additional services such as Amazon Athena, Amazon EMR, and AWS Glue. We also described how Golden Copy's built-in integration with the PowerScale changelist API helps meet Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) by eliminating resource-intensive file system scans.

Thanks for reading this blog post! If you would like to learn more about Golden Copy and schedule a free trial, visit the Superna Eyeglass training page. If you have any questions or feedback, feel free to leave them in the comments section.

Bhavesh Lad


Bhavesh Lad is a Media & Entertainment Principal Storage Solutions Architect for AWS. Bhavesh has over 17 years of experience in information technology and over 8 years in Media & Entertainment. At AWS, Bhavesh focuses on helping Media & Entertainment (M&E) accounts architect, adopt, and deploy cloud storage. Prior to AWS, he held roles as the Business Development Manager for M&E, Global Advisory Systems Engineer for M&E Accounts, and Manager of Technology for an Animation Studio.

Boni Bruno


Boni Bruno is a Principal Architect and Workload Specialist at AWS. He enjoys developing solution-driven architectures and sharing informative content with the AWS Storage and Analytics community. Prior to AWS, he was the Chief Solutions Architect for Dell Technologies’ Unstructured Data Solutions Division, where he built numerous solutions around big data processing, storage, machine learning, analytics, and various HPC applications.