AWS Storage Blog

Automating cache refresh process for File Gateway on AWS Storage Gateway

For AWS Storage Gateway customers, keeping file shares up to date with changes in Amazon S3 buckets is important to ensure that users are not accessing stale data on their file shares. Previously, customers would either initiate a cache refresh for their file shares manually using an API, or manage a process that did so periodically. This is overhead for customers who just want a hassle-free way of keeping their file shares up to date with their S3 buckets.

Earlier this month, AWS Storage Gateway announced new cache management capabilities for File Gateway, which introduce a new cache refresh process. This new feature enables customers to automatically refresh the metadata cache to stay up to date with changes in their S3 buckets, without having to manually invoke a cache refresh or manage a process to do so.

In this post, I discuss the pain points faced by existing customers and the advantages of using the new cache refresh process. In addition, I detail how you can adopt this new capability with your File Gateway today, regardless of whether you’re an existing or a new AWS Storage Gateway customer.

File Gateway and the RefreshCache API

AWS customers use File Gateway to access virtually unlimited cloud storage through common file protocols such as Server Message Block (SMB) and Network File System (NFS). Their workflows often involve other applications or users writing to the same S3 bucket without writing the files through the gateway’s file shares. The other writer could be another File Gateway, or simply the AWS Command Line Interface (CLI) or the S3 console. The RefreshCache API operation was created to synchronize the file share with the changes made to the S3 bucket by these other writers. While this gave customers flexibility and control over when the file share is refreshed and which of its directories are refreshed, certain pain points persisted.

Previously, many customers had to manage cron jobs, scripts, AWS credentials, or AWS Lambda functions to invoke the RefreshCache API and keep their file share in sync with the changes in their Amazon S3 bucket. Furthermore, even though the API offers the ability to refresh individual directories, it was difficult to determine exactly which directories needed refreshing. For convenience, many customers simply refreshed the whole file share, including directories that were already up to date, generating wasted work on the gateway and unnecessary Amazon S3 API invocations. This ultimately impacted the performance of the gateway and increased S3 API costs that could otherwise have been avoided.
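To make the scripted approach concrete, here is a minimal Python sketch of the kind of helper a cron job or Lambda function might have used. It assumes a client that behaves like the boto3 Storage Gateway client, whose refresh_cache call takes FileShareARN, FolderList, and Recursive. The function names and the example ARN are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def build_refresh_cache_request(
    file_share_arn: str, folders: List[str], recursive: bool = True
) -> Dict[str, Any]:
    """Build keyword arguments for the RefreshCache operation."""
    if not folders:
        # With no specific directories known, fall back to the share root.
        # Combined with recursive=True this is the whole-share refresh
        # that generates the wasted work described above.
        folders = ["/"]
    return {
        "FileShareARN": file_share_arn,
        "FolderList": folders,
        "Recursive": recursive,
    }


def refresh_folders(client: Any, file_share_arn: str, folders: List[str]) -> Any:
    """Invoke RefreshCache through a Storage Gateway client.

    `client` is assumed to behave like boto3.client("storagegateway").
    """
    request = build_refresh_cache_request(file_share_arn, folders)
    return client.refresh_cache(**request)
```

With boto3 installed and credentials configured, a scheduled job could call refresh_folders(boto3.client("storagegateway"), share_arn, ["/reports"]). Knowing which folders to pass is exactly the part that was hard to get right.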

Automated cache refresh

To address these challenges and simplify cache management, customers can offload the management of refreshing the cache to the gateway by using the automated cache refresh feature. This feature is based on the ‘duration since last access’ for each directory, tracked by a timer. All access requests that are made while the timer is still running treat the contents of the directory as current. After the timer expires, the next access of the directory results in a refresh of that directory. For example, if the timer is 30 minutes, then the contents of the directory reflect the contents in Amazon S3 as of no more than 30 minutes ago.

With this cache refresh approach, customers no longer have to manage cron jobs, scripts, AWS credentials, or AWS Lambda functions that would otherwise be necessary to invoke the RefreshCache API. This frees up resources to focus on other key business tasks. Directories are also refreshed only as needed: a directory is refreshed on its next access only if the TTL (time to live) has expired since it was last refreshed. This reduces wasted work on the gateway and unnecessary S3 API invocations.
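The refresh-on-access behavior can be illustrated with a toy model. The sketch below is not the gateway's implementation. It assumes a per-directory timestamp of the last refresh, a caller-supplied clock value, and a hypothetical list_bucket callable that stands in for listing a directory in Amazon S3.

```python
from typing import Callable, Dict, List


class DirectoryCache:
    """Toy model of a per-directory TTL cache (illustrative only)."""

    def __init__(self, ttl_seconds: float, list_bucket: Callable[[str], List[str]]):
        self.ttl_seconds = ttl_seconds
        self.list_bucket = list_bucket  # hypothetical stand-in for an S3 listing
        self._entries: Dict[str, List[str]] = {}
        self._refreshed_at: Dict[str, float] = {}
        self.refresh_count = 0

    def list_directory(self, path: str, now: float) -> List[str]:
        """Return the cached listing, refreshing first if the TTL has expired."""
        last = self._refreshed_at.get(path)
        # A TTL of 0 is modeled as "automated refresh disabled": the directory
        # is loaded once and never refreshed automatically afterwards.
        expired = last is None or (
            self.ttl_seconds > 0 and now - last >= self.ttl_seconds
        )
        if expired:
            self._entries[path] = list(self.list_bucket(path))
            self._refreshed_at[path] = now
            self.refresh_count += 1
        return self._entries[path]
```

In this model, a file written to the bucket by another writer stays invisible to accesses made while the timer is running, and it appears on the first access after the timer expires. Directories that nobody accesses are never refreshed, which is where the savings in S3 API calls come from.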

Configuring automated cache refresh

For new and existing file shares with AWS Storage Gateway, customers can configure the new cache refresh setting using the Storage Gateway console or the Storage Gateway API. The timer can be set to any duration between 5 minutes and 30 days, and the API takes the value in seconds (300 to 2,592,000). The value takes effect once the file share transitions to the “AVAILABLE” state.
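Because the API expects seconds while people tend to think in minutes or days, a small conversion helper can prevent mistakes. The following is a minimal sketch. The function name is hypothetical, and it treats 0 as the special value that disables the feature.

```python
from datetime import timedelta

MIN_TTL = timedelta(minutes=5)  # 300 seconds
MAX_TTL = timedelta(days=30)    # 2,592,000 seconds


def cache_stale_timeout_seconds(ttl: timedelta) -> int:
    """Convert a duration to whole seconds and check it against the allowed range."""
    seconds = int(ttl.total_seconds())
    if seconds == 0:
        return 0  # 0 disables automated cache refresh
    if not MIN_TTL <= timedelta(seconds=seconds) <= MAX_TTL:
        raise ValueError(
            f"TTL must be 0 or between {int(MIN_TTL.total_seconds())} and "
            f"{int(MAX_TTL.total_seconds())} seconds, got {seconds}"
        )
    return seconds
```

For example, cache_stale_timeout_seconds(timedelta(minutes=30)) returns 1800, while a 4-minute duration raises ValueError.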

Console

If you already have an existing file share, simply visit the AWS Storage Gateway console and go to the File shares tab. Then, choose your file share, select Actions, and then Edit share settings.

Figure: Configuring automated cache refresh in the console. Choose the file share, select Actions, and then Edit share settings.

You should then see a window where you can enter the automated cache refresh value. Click Save to apply the change.

If you’re creating a new file share, you’re able to add the automated cache refresh value during configuration.

API

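Customers can add the automated cache refresh attribute to their file share through the AWS Storage Gateway API operations that create or update a file share: CreateNFSFileShare, CreateSMBFileShare, UpdateNFSFileShare, and UpdateSMBFileShare. Just include the CacheStaleTimeoutInSeconds field, which is part of the CacheAttributes structure, in the API request. A value of 0 disables the feature, meaning directories stop being refreshed automatically.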
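As an illustration, the following Python sketch sets the value on an existing NFS file share. It assumes a client that behaves like the boto3 Storage Gateway client, where CacheStaleTimeoutInSeconds is passed inside the CacheAttributes parameter of update_nfs_file_share. The helper names are hypothetical.

```python
from typing import Any, Dict


def cache_attributes(timeout_seconds: int) -> Dict[str, int]:
    """Build the CacheAttributes structure carrying the refresh timeout."""
    return {"CacheStaleTimeoutInSeconds": timeout_seconds}


def set_nfs_cache_refresh(client: Any, file_share_arn: str, timeout_seconds: int) -> Any:
    """Apply the automated cache refresh timeout to an existing NFS file share.

    `client` is assumed to behave like boto3.client("storagegateway").
    Passing timeout_seconds=0 disables automated cache refresh.
    """
    return client.update_nfs_file_share(
        FileShareARN=file_share_arn,
        CacheAttributes=cache_attributes(timeout_seconds),
    )
```

With boto3 installed and credentials configured, set_nfs_cache_refresh(boto3.client("storagegateway"), share_arn, 1800) would request a 30-minute timer. An SMB file share would use update_smb_file_share with the same CacheAttributes structure.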

Cleaning up

If you have created any resources (including S3 buckets) to test this new capability, remember to delete them to avoid incurring any unwanted charges. For pricing details, please refer to AWS Storage Gateway pricing.

Conclusion

Prior to the introduction of the new automated cache refresh capability for File Gateway, AWS Storage Gateway customers had to manually invoke the RefreshCache API or manage a process that invoked the API to ensure their users were not accessing stale data on their file shares. While the RefreshCache API offers customers the flexibility and full control over the cache refresh lifecycle of their file shares, it does require customers to manage an extra process.

In this post, I highlighted that the new automated cache refresh capability for File Gateway is a simple and hassle-free method for customers who want an easy way to keep the metadata cache of file shares up to date with their Amazon S3 buckets. This eliminates the overhead required to keep file shares up to date using the RefreshCache API, so customers can focus their time and effort on more valuable tasks and core competencies.

If you are excited about this new capability and want to learn more about AWS Storage Gateway, please visit the AWS Storage Gateway user guide.

AWS is a customer-obsessed organization. Please continue to submit feedback to us so we can deliver features and enhancements that are valuable to you. Thank you for reading this blog post. Please don’t hesitate to leave any questions or thoughts you may have in the comment section below.