AWS Storage Blog

Accessing your file workloads from on-premises with File Gateway

Our customers tell us that they face many challenges with their on-premises file workloads. Growing infrastructure costs, storage capacity limits, upcoming warranty renewals, insufficient data protection, and a never-ending cycle of hardware refreshes have them considering new approaches. When they evaluate replacement solutions in the cloud, they often raise concerns about latency and about preserving the end-user experience. With Amazon FSx File Gateway, you have another tool in your toolbox to help replace on-premises file servers and move file sharing workloads to AWS.

For hybrid and edge deployments, AWS offers AWS Storage Gateway to provide on-premises access where latency and bandwidth are a concern. AWS also offers three fully managed file services: Amazon EFS, Amazon FSx for Windows File Server, and Amazon FSx for Lustre.

In this blog, I discuss the use cases best served by Amazon FSx File Gateway and, by focusing on the main features of the service, the challenges it helps to overcome. I go over some key differences between the two types of file gateways, Amazon S3 File Gateway and Amazon FSx File Gateway. I then review some important architectural considerations to help you prepare for a successful deployment of Amazon FSx File Gateway.

Use cases

When looking to migrate an on-premises Network Attached Storage (NAS) environment to AWS, there are a number of things to consider, including:

  • What type of clients do you use?
  • Where are your clients?
  • What applications are running on your clients?
  • How can you protect your data?

I focus on Windows-based workloads in this blog. For Windows workloads running on AWS, customers often choose Amazon FSx for Windows File Server for highly reliable, performant shared Windows storage in the Regions closest to their data centers. But what if their SMB clients are too far away from the Region hosting the Amazon FSx file system?

This leaves us with a predicament. On one hand, we want to stop managing physical NAS infrastructure, whether that is a large multi-petabyte (PB) NAS system or a moderately sized Windows file server. On the other hand, how can we preserve the experience our users are accustomed to? Is there a way to offload these systems, reduce management overhead, and gain all the benefits of the cloud? Yes, there is: Amazon FSx File Gateway.

Why Amazon FSx File Gateway?

Amazon FSx File Gateway extends Amazon FSx for Windows File Server to any site with an internet connection. It provides a scalable local cache of up to 64 TB for low-latency access to the most recently used files. When you deploy an Amazon FSx File Gateway in your data center or in remote and branch offices, your Windows clients can connect to it over the LAN. Because Amazon FSx File Gateway is a local cache of recently accessed data backed by an Amazon FSx file system, it looks like a local file server to users and applications.

I would also like to highlight a few key benefits of the Amazon FSx File Gateway:

  • As a native Windows file system, it supports the full complexity of years' worth of inherited NTFS permissions from your current on-premises NAS infrastructure.
  • Low latency is provided to your applications by a local cache tier for the most recently used files. The local cache scales up to 64 TB per gateway. Changes made in the Amazon FSx file system are synced to the gateway's local cache automatically, at an automated cache refresh interval that you can set anywhere between 5 minutes and 30 days. In the other direction, the gateway uploads new data to the Amazon FSx file system in the cloud in near-real time.
  • Comprehensive data protection, because you can use both Shadow Copies and backups of your Amazon FSx file system. Shadow Copies are scheduled point-in-time snapshots of an Amazon FSx file system that allow quick creation of recovery points and quick restores. Backups integrate with AWS Backup for out-of-band, long-term retention in Amazon S3, and are file-system consistent, highly durable, and incremental.
  • Optimized cloud connectivity: by serving 80% or more of reads from the local cache, the gateway reduces data transfer out charges, improves performance for your applications, and limits bandwidth consumption. Writes from SMB clients are acknowledged directly by the local gateway and uploaded to the Amazon FSx file system in the background. When you make small edits to a file, the gateway saves additional bandwidth by uploading only the changed data within that file. When you read a large file that is not in the local cache, the gateway fetches only the requested bytes and intelligently prefetches data, which limits bandwidth use and improves the end-user experience. The gateway also uses a highly optimized SMB implementation that performs actions with a minimum number of round trips to the cloud. This is key, because SMB is normally a very “chatty” protocol that performs poorly over a WAN.
  • Security is provided by Amazon FSx for Windows File Server, which encrypts data in transit and at rest. Encryption in transit is supported on file shares mapped on clients that support SMB 3.0 or newer. Encryption at rest is enabled automatically when you create an Amazon FSx file system.
  • High availability on the backend comes from the Amazon FSx file system, which, in a Multi-AZ deployment, replicates data synchronously across two Availability Zones. This provides high availability and multiple copies of your data. On the front end, high availability of the gateway is provided through integration with VMware vSphere High Availability, with heartbeats and health checks built in through VMware Tools.
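To make the refresh bounds concrete, here is a minimal Python sketch that converts a refresh interval to whole seconds and rejects values outside the 5-minute-to-30-day range described above. Working in seconds is an assumption, chosen to mirror the seconds-based `CacheStaleTimeoutInSeconds` cache attribute in the Storage Gateway API. The helper `to_cache_refresh_seconds` is hypothetical and is not part of any AWS SDK.

```python
from datetime import timedelta

# Bounds stated in this post: refresh intervals from 5 minutes to 30 days.
MIN_REFRESH = timedelta(minutes=5)
MAX_REFRESH = timedelta(days=30)


def to_cache_refresh_seconds(interval: timedelta) -> int:
    """Convert a refresh interval to whole seconds, rejecting out-of-range values.

    Hypothetical helper for illustration only; it is not part of any AWS SDK.
    """
    if not MIN_REFRESH <= interval <= MAX_REFRESH:
        raise ValueError(
            f"cache refresh interval {interval} is outside "
            f"[{MIN_REFRESH}, {MAX_REFRESH}]"
        )
    return int(interval.total_seconds())


if __name__ == "__main__":
    print(to_cache_refresh_seconds(timedelta(hours=1)))  # prints 3600
    print(to_cache_refresh_seconds(timedelta(days=30)))  # prints 2592000
```

Validating before you submit a value lets you catch a typo, such as 30 seconds instead of 30 minutes, before it reaches the service.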

Amazon S3 File Gateway vs. Amazon FSx File Gateway

Since the Amazon FSx File Gateway is being introduced as a new gateway type, let’s spend a minute going over the key differences between this new gateway type and our traditional Amazon S3 File Gateway.

Amazon S3 File Gateway is a great fit as a repository for machine-generated data, as a way to ingest large amounts of data into Amazon S3, and as a way to take advantage of Amazon S3 features such as S3 Lifecycle policies, S3 Versioning, and S3 Replication. S3 File Gateway is optimized for large files and was designed to provide file-based access (NFS or SMB) to your S3 buckets, which may contain billions of objects. It is the optimal choice for workloads such as database backups, archives, and ingestion into data lakes for post-processing workflows.

Amazon FSx File Gateway was specifically designed for multiuser, interactive file sharing workloads such as group or departmental file shares and home directories. You can also use it for end-user focused applications such as Microsoft Office and Adobe Creative Suite. Because this gateway is backed by a cloud file system, it is better suited to general-purpose file sharing and is optimized for workloads with small and mixed file sizes. It was developed with a high degree of Windows feature parity, including support for complex permissions, application-consistent backups, data deduplication, and DFS Namespaces.

Both gateways provide a local cache, reducing latency and improving application performance, while being centrally managed in the cloud via the AWS Management Console.
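As a rough aid, the following Python sketch encodes the comparison above as a simple decision helper. The `Workload` fields and the `suggest_gateway` function are hypothetical simplifications, not an AWS tool. A real choice should weigh the full comparison in this section.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical yes/no traits drawn from the comparison in this section.
    interactive_multiuser: bool  # e.g. departmental shares, home directories
    needs_windows_features: bool  # e.g. NTFS permissions, DFS Namespaces
    needs_s3_features: bool  # e.g. S3 Lifecycle, Versioning, Replication
    mostly_large_files: bool  # e.g. database backups, archives


def suggest_gateway(w: Workload) -> str:
    """Rough rule of thumb only; not an AWS sizing or selection tool."""
    # Interactive, Windows-centric sharing points to FSx File Gateway.
    if w.interactive_multiuser or w.needs_windows_features:
        return "Amazon FSx File Gateway"
    # Object-centric or large-file ingest workloads point to S3 File Gateway.
    if w.needs_s3_features or w.mostly_large_files:
        return "Amazon S3 File Gateway"
    return "Either may fit; evaluate further"
```

When a workload has both sets of traits, this sketch favors Amazon FSx File Gateway. That ordering is a judgment call you may want to revisit for your own environment.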

Preparing for a successful deployment of Amazon FSx File Gateway

If you would like to get started right away, go ahead and jump over to the Amazon FSx File Gateway user guide. You may also find this demo video helpful for your deployment. In the rest of this blog, I cover a few additional considerations to help you prepare for a successful deployment.

When looking to deploy an Amazon FSx File Gateway, you must first create an Amazon FSx for Windows File Server file system in your Amazon VPC. This Active Directory validation tool may be helpful when creating your first Amazon FSx file system. After creating an Amazon FSx file system, you will be able to move on to deploying an Amazon FSx File Gateway for local cached access to your Amazon FSx file system data.

Deploying your Amazon FSx File Gateway can be completed in four main steps:

  1. First, create your gateway by downloading a virtual machine appliance or installing a hardware appliance.
  2. Then connect your appliance to your network and join that gateway to your Active Directory domain.
  3. Once your gateway is activated and joined to the domain, you are able to attach up to five Amazon FSx file systems to your gateway.
  4. Finally, your SMB clients will be able to mount Amazon FSx file shares directly from the local gateway.
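The ordering of these steps, and the limit of five Amazon FSx file systems per gateway, can be modeled in a small Python sketch. `GatewayPlan` is a hypothetical planning object for illustration only. It makes no AWS calls, and the file system IDs used with it are placeholders.

```python
from typing import List, Optional

MAX_FILE_SYSTEMS_PER_GATEWAY = 5  # limit stated in step 3 above


class GatewayPlan:
    """Hypothetical planning model of the four deployment steps (no AWS calls)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.activated = False
        self.domain: Optional[str] = None
        self.file_systems: List[str] = []

    def activate(self) -> None:
        # Steps 1-2: appliance deployed, connected to the network, and activated.
        self.activated = True

    def join_domain(self, domain: str) -> None:
        # Step 2: the gateway joins your Active Directory domain after activation.
        if not self.activated:
            raise RuntimeError("activate the gateway before joining a domain")
        self.domain = domain

    def attach(self, file_system_id: str) -> None:
        # Step 3: attach FSx file systems only once activated and domain-joined.
        if not (self.activated and self.domain):
            raise RuntimeError("gateway must be activated and domain-joined first")
        if file_system_id in self.file_systems:
            raise ValueError(f"{file_system_id} is already attached")
        if len(self.file_systems) >= MAX_FILE_SYSTEMS_PER_GATEWAY:
            raise ValueError("a gateway supports at most five FSx file systems")
        self.file_systems.append(file_system_id)
```

Encoding the prerequisites as exceptions makes it obvious which step was skipped when a plan fails, which is useful when several teams own different steps.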

This diagram illustrates the four main steps to creating an Amazon FSx File Gateway and the networking ports required, as outlined in this section.

Before going through the process of creating your gateway, there are a few things to consider in order to prepare for a successful deployment. These include:

  1. Choosing the Region in which you will manage your gateway, which must be the same Region in which you created your Amazon FSx file system.
  2. Planning how your gateway and Amazon FSx file system will communicate with your Active Directory environment in order to allow your domain users to authenticate against the local gateway.
  3. Ensuring you have private networking between the local environment where you are going to deploy your gateway and the cloud environment where you will manage your gateway and configure your Amazon FSx file system. This may include opening firewall ports between locations.
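The Region requirement in the first item can be checked mechanically by reading the Region field of your file system's ARN, which follows the standard `arn:partition:service:region:account-id:resource` layout. The following Python sketch assumes that layout. The helpers `arn_region` and `region_matches` are hypothetical, and the ARN below is a placeholder built from the documentation account ID 111122223333.

```python
def arn_region(arn: str) -> str:
    """Return the Region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[3]


def region_matches(planned_gateway_region: str, fsx_file_system_arn: str) -> bool:
    """True if the gateway will be managed in the same Region as the FSx file system."""
    return planned_gateway_region == arn_region(fsx_file_system_arn)


# Placeholder value for illustration; substitute your own file system ARN.
EXAMPLE_FSX_ARN = "arn:aws:fsx:us-east-2:111122223333:file-system/fs-0123456789abcdef0"
```

For example, `region_matches("us-east-2", EXAMPLE_FSX_ARN)` is true, while `"us-west-2"` would flag a mismatch before you run the gateway wizard in the wrong Region.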

The following is an illustration of the general architecture for an Amazon FSx File Gateway deployment.


First consideration: management and monitoring

Run through the Storage Gateway wizard within the same Region as your Amazon FSx file system.

Monitor metrics in the AWS Storage Gateway console, in particular the file system metrics Cache Hit Percent and Cache Percent Dirty, and consider setting up alarms on them.

The gateway's virtual machine or hardware appliance runs in a remote location, such as your on-premises data center. However, the gateway is centrally managed from the AWS Storage Gateway console in the same Region as Amazon FSx. Your gateway communicates with a Storage Gateway endpoint over a control plane using HTTPS on port 443. I recommend creating an Amazon VPC endpoint for Storage Gateway.

Your gateway must be able to communicate with the remote Storage Gateway endpoint over TCP ports 443, 1026, 1027, 1028, 1031, and 2222.
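Before deploying, you may want to confirm from a host on the same network segment as the planned gateway that these TCP ports are reachable. The following Python sketch uses only the standard library. `tcp_reachable` and `check_ports` are hypothetical helpers. A successful TCP connect only shows that the network path and firewall allow the port; it does not show that the endpoint is healthy.

```python
import socket
from typing import Dict, Iterable

# Control-plane TCP ports listed above for the Storage Gateway endpoint.
CONTROL_PLANE_TCP_PORTS = (443, 1026, 1027, 1028, 1031, 2222)


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_ports(
    host: str, ports: Iterable[int] = CONTROL_PLANE_TCP_PORTS, timeout: float = 3.0
) -> Dict[int, bool]:
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

For example, `check_ports("<your Storage Gateway endpoint hostname>")` returns a dictionary such as `{443: True, 1026: False, ...}`. Any `False` entry is a candidate for a firewall change.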

AWS Storage Gateway automatically captures detailed metrics and displays them in the AWS Storage Gateway console. I also recommend setting up File Gateway health logs, which are stored in Amazon CloudWatch Logs. You will also want to review the details of the file system metrics. Pay particular attention to Cache Hit Percent, the percentage of application read operations from the file shares that are served from the cache, and Cache Percent Dirty, the file share's contribution to the overall percentage of the gateway's cache that has not yet been uploaded and persisted to AWS. I also recommend setting up CloudWatch alarms to monitor these metrics and notify you of potential impacts to the service. The following is an example of the metrics available in the AWS Management Console for a particular Amazon FSx File Gateway.
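CloudWatch publishes these metrics as `CacheHitPercent` and `CachePercentDirty`, and an alarm typically fires when M out of the last N datapoints breach a threshold. The following Python sketch is a simplified model of that "M out of N" evaluation, so you can reason about thresholds before creating real alarms. The function `evaluate_alarm`, the thresholds, and the sample values are hypothetical. The sketch ignores CloudWatch details such as missing-data handling.

```python
from typing import Sequence


def evaluate_alarm(
    datapoints: Sequence[float],
    threshold: float,
    breach_when_below: bool,
    datapoints_to_alarm: int,
    evaluation_periods: int,
) -> str:
    """Simplified "M out of N" evaluation over the most recent datapoints."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaches = sum(
        1 for v in window if (v < threshold if breach_when_below else v > threshold)
    )
    return "ALARM" if breaches >= datapoints_to_alarm else "OK"


# Hypothetical thresholds: alert if Cache Hit Percent stays below 80 (the
# cache hit rate cited earlier in this post) or Cache Percent Dirty stays
# above 50, in 3 of the last 3 periods.
cache_hit_percent = [95.0, 91.0, 78.0, 72.0, 70.0]
cache_percent_dirty = [12.0, 20.0, 55.0, 30.0, 25.0]
print(evaluate_alarm(cache_hit_percent, 80.0, True, 3, 3))  # ALARM
print(evaluate_alarm(cache_percent_dirty, 50.0, False, 3, 3))  # OK
```

A falling cache hit rate can suggest that the working set has outgrown the local cache. A persistently high dirty percentage can suggest that uploads are not keeping up with writes.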

Optionally, you may choose to enable Amazon FSx File Gateway audit logs. Amazon FSx File Gateway audit logs provide you with details about user access to files and folders within a file system association. You can use audit logs to monitor user activities and take action if inappropriate activity patterns are identified. The logs are formatted similarly to Windows Server security log events for compatibility with existing log processing tools for Windows security events. Additionally, you may want to enable file access auditing on your Amazon FSx file system.

Second consideration: Active Directory

Your Amazon FSx File Gateway must be joined to the same Active Directory environment as your Amazon FSx file system.

Joining your Amazon FSx file system to your Active Directory environment can sometimes be an extra step within your organization. There may be firewall ports to open, or another team that manages your Active Directory environment to work with. There may also be design decisions to make about how to connect your on-premises Active Directory environment to the cloud. It is best to get ahead of this and think about the design from a long-term perspective. Customers typically accomplish this in one of two ways:

  1. Use self-managed Active Directory (on-premises or in the cloud)
  2. Use AWS Directory Service to create an AWS Managed Microsoft AD in the cloud, and set up a two-way trust with their on-premises Active Directory environment

Whichever way you choose to deploy within your organization, the requirements are that:

  • Your Amazon FSx File Gateway and your Amazon FSx file system are joined to the same domain.
  • Your SMB client users can authenticate as long as they are in that same domain or in a domain that has a two-way trust with it.
  • Your gateway must communicate with local Active Directory over UDP ports 137, 138, and TCP port 389.
  • Your gateway must communicate with local DNS over TCP/UDP port 53.
  • Your gateway must communicate with local NTP over UDP port 123.
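The first two domain requirements can be expressed as a small pre-deployment check. The following Python sketch compares domain names case-insensitively. It treats a client domain as acceptable only if it matches the gateway's domain or appears in a list of direct two-way trusts that you supply. This is a simplifying assumption: it does not model transitive forest trusts. `check_domains` is a hypothetical helper, and the `example.com` domains are placeholders.

```python
from typing import Iterable, List, Tuple


def _norm(domain: str) -> str:
    # Domain names compare case-insensitively; ignore surrounding spaces and a trailing dot.
    return domain.strip().rstrip(".").lower()


def check_domains(
    gateway_domain: str,
    fsx_domain: str,
    client_domain: str,
    two_way_trusts: Iterable[Tuple[str, str]] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the requirements look satisfied."""
    problems = []
    gw, fsx, client = _norm(gateway_domain), _norm(fsx_domain), _norm(client_domain)
    if gw != fsx:
        problems.append(f"gateway domain {gw!r} differs from FSx domain {fsx!r}")
    # Trusts are unordered pairs, so store them as frozensets.
    trusts = {frozenset((_norm(a), _norm(b))) for a, b in two_way_trusts}
    if client != gw and frozenset((client, gw)) not in trusts:
        problems.append(f"client domain {client!r} has no two-way trust with {gw!r}")
    return problems
```

For example, a gateway and file system in `aws.example.com` with clients in `corp.example.com` pass only when the pair `("corp.example.com", "aws.example.com")` is listed as a two-way trust.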

Third consideration: private networking

Amazon FSx File Gateway requires private networking between the location where the gateway is deployed and the Amazon VPC where your Amazon FSx file system is created. This is unlike our other gateway types, and is driven by the fact that the gateway uses the SMB protocol to communicate with the underlying Amazon FSx file system. For security reasons, SMB traffic should not traverse the public internet. Many customers connect their on-premises environments to the cloud by setting up AWS Direct Connect, AWS Site-to-Site VPN, or their own private connection solutions.

Your gateway must communicate via the data plane to the Amazon FSx file system over SMB TCP/UDP port 445.

You will likely need to work with your network team to open the required firewall ports, ensure proper connectivity, and validate that sufficient network bandwidth is available.
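When you work with your network team, it can help to hand over every port from this post in one list. The following Python sketch gathers the requirements stated in the three considerations above and expands "TCP/UDP" entries into one rule per protocol. It covers only the ports named in this post, treats them all as traffic that the gateway initiates, and uses hypothetical names (`Rule`, `REQUIREMENTS`, `outbound_rules`).

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    destination: str
    protocol: str
    port: int


# (destination, protocols, ports) exactly as listed in this post.
REQUIREMENTS = [
    ("Storage Gateway endpoint", ("TCP",), (443, 1026, 1027, 1028, 1031, 2222)),
    ("Active Directory", ("UDP",), (137, 138)),
    ("Active Directory", ("TCP",), (389,)),
    ("DNS", ("TCP", "UDP"), (53,)),
    ("NTP", ("UDP",), (123,)),
    ("Amazon FSx file system (SMB)", ("TCP", "UDP"), (445,)),
]


def outbound_rules(requirements=REQUIREMENTS) -> List[Rule]:
    """Expand the requirements into one deduplicated rule per protocol and port."""
    rules = {
        Rule(dest, proto, port)
        for dest, protos, ports in requirements
        for proto in protos
        for port in ports
    }
    return sorted(rules, key=lambda r: (r.destination, r.port, r.protocol))


if __name__ == "__main__":
    for r in outbound_rules():
        print(f"gateway -> {r.destination}: {r.protocol}/{r.port}")
```

Keeping the requirements as data rather than prose makes it easy to diff them against your existing firewall rules.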


Conclusion

In this blog, I covered the general use cases served by Amazon FSx File Gateway, along with the key benefits and features of the service. I also briefly explored the differences between Amazon S3 File Gateway and Amazon FSx File Gateway. By removing the need to actively manage on-premises file server infrastructure, Amazon FSx File Gateway can help you accelerate your file-based storage migration to the cloud, improve performance and data protection, and reduce costs. Finally, I walked through key considerations to help you prepare for a successful deployment of Amazon FSx File Gateway, which gives you fast, low-latency on-premises access to fully managed file shares.

Here are some additional resources to further help you get started. If you need more assistance, please do not hesitate to reach out to your AWS account teams, and feel free to leave feedback in the comments section.