AWS Storage Blog

Automate Amazon S3 File Gateway on Amazon EC2 with Terraform by HashiCorp

Infrastructure as Code (IaC) involves managing IT infrastructure through code and automation tools, reducing the errors, slow scaling, and overhead that come with manual management. For organizations implementing a hybrid cloud infrastructure, automation can ensure uniformity, scalability, and cost reduction while provisioning cloud resources efficiently. Automated provisioning and configuration enable organizations to adapt, innovate, and stay competitive, promoting consistency and agility in response to market dynamics.

A number of tools and services are available to customers to automate hybrid infrastructure through AWS Storage Gateway, a set of hybrid cloud storage services that provide on-premises access to virtually unlimited cloud storage. AWS CloudFormation is a service that enables users to automate the deployment of resources on AWS. However, many customers have centered their IaC practice on Terraform by HashiCorp to enable a consistent methodology for managing the infrastructure lifecycle of both in-cloud AWS resources and on-premises virtual infrastructure, thereby lowering operational overhead and providing better governance.

In our previous blog, Automate Amazon S3 File Gateway deployments in VMware with Terraform, we walked through using IaC to deploy AWS Storage Gateway using Terraform Cloud, HashiCorp’s managed service offering. In this blog post, we will guide you through the process of provisioning an Amazon Elastic Compute Cloud (Amazon EC2) based Storage Gateway using IaC. We’ll achieve this by using the AWS Storage Gateway module in combination with the open source Terraform binary. You can build on the steps provided here to further automate deployment of Storage Gateway using a continuous integration and continuous delivery (CI/CD) pipeline.

AWS Storage Gateway overview

AWS Storage Gateway is a hybrid cloud storage service that gives your applications on-premises and in-cloud access to virtually unlimited cloud storage. You can deploy Storage Gateway as a virtual machine (VM) within your VMware, Hyper-V, or Linux KVM virtual environment, as an Amazon EC2 instance within your Amazon Virtual Private Cloud (Amazon VPC), or as a pre-configured physical hardware appliance.

The S3 File Gateway offers Server Message Block (SMB) or Network File System (NFS) based access to data in Amazon S3 with local caching, for customers that have existing applications, tools, or processes that leverage a file interface. It’s used for on-premises applications and for Amazon EC2 resident applications that need file storage in S3 for object-based workloads.

Customers deploy an S3 File Gateway on Amazon EC2 for the following reasons:

  • For copying backups or dumps of databases running on EC2, such as Microsoft SQL Server, Oracle, or SAP ASE.
  • In data pipeline use cases in healthcare and life sciences, media and entertainment, and other industries, to move data from devices to Amazon S3.
  • For archiving use cases where you can tier your file data to lower-cost storage with Amazon S3 Lifecycle policies.

Solution overview

We leverage the Terraform AWS Storage Gateway module to provision an EC2-based Storage Gateway on AWS. We provide end-to-end examples for creating a Storage Gateway virtual machine in a VPC, including activation, creation of an Amazon S3 bucket, and creation of NFS file shares.

Figure: Terraform AWS Storage Gateway module provisioning an EC2-based gateway on AWS

The Terraform AWS Storage Gateway module contains S3 File Gateway examples for both SMB and NFS deployments on EC2 and VMware. The module creates a number of networking, IAM, and security resources that you can use in your deployment.

Solution walkthrough

Using the following steps, we will create an Amazon S3 File Gateway on an EC2 instance that will provide an NFS interface to seamlessly store and access files as objects in Amazon S3.

  1. Clone the module repository.
  2. Set up values for the Terraform variables.
  3. Trigger the deployment.
  4. Start using the file shares.

Prerequisites

Step 1. Clone the repository

Clone the repository using the git clone command as shown in the following example:

git clone https://github.com/aws-ia/terraform-aws-storagegateway

The following is the directory structure for this repo.

terraform-aws-storagegateway/
- modules/ # AWS Storage Gateway Modules
  - aws-sgw      # AWS Storage gateway module for activation and management
  - ec2-sgw      # EC2 Storage gateway 
  - s3-nfs-share # S3 File Gateway NFS
  - s3-smb-share # S3 File Gateway SMB
  - vmware-sgw   # VMware Storage gateway
- examples/ # Full examples
  - s3-nfs-filegateway-ec2    # S3 File Gateway on EC2 - NFS
  - s3-smb-filegateway-ec2    # S3 File Gateway on EC2 - SMB
  - s3-nfs-filegateway-vmware # S3 File Gateway on VMware - NFS
  - s3filegateway-vmware      # S3 File Gateway on VMware - SMB

To provision our File Gateway with Terraform, you’ll need at least two modules from the modules/ sub-directory: aws-sgw/ and ec2-sgw/. Depending on your use case, you’ll choose either the ‘s3-nfs-share’ module for Linux clients using NFS versions 3 and 4.1, or the ‘s3-smb-share’ module for Windows clients using the SMB protocol.

For the rest of this walkthrough, we will assume an NFS use case. We call these three modules from the main.tf file in the examples/s3-nfs-filegateway-ec2 directory. Therefore, this main.tf file will be our root module. cd into the preceding directory using the following command:

cd examples/s3-nfs-filegateway-ec2

Step 2. Set up values for the Terraform variables

First, we will assign appropriate values to the Terraform variables required by each module. The README.md file for each module provides a description of all required and optional Terraform variables.

Calling the EC2-based Storage Gateway module

The following code snippet from the main.tf file shows the child module block and example input variables.

module "ec2_sgw" {
  source               = "aws-ia/storagegateway/aws//modules/ec2-sgw"
  
  // Adjust any variable values below
  vpc_id               = "vpc-abcdef123456"
  subnet_id            = "subnet-abcdef123456"
  name                 = "my-storage-gateway"
  availability_zone    = data.aws_availability_zones.available.names[0]
  aws_region           = var.aws_region

  //Add any other variables here
}

Note: the example main.tf file creates a new VPC and subnet to deploy this file gateway. If you have an existing VPC, you can pass in the corresponding VPC and subnet IDs by modifying the vpc_id and subnet_id variable assignments for the preceding ec2-sgw module.

To administer the Storage Gateway EC2 instance, connect directly using Secure Shell (SSH), or use Session Manager, a capability of AWS Systems Manager. For Session Manager, review the documentation for Setting up Session Manager and Working with Session Manager. To connect over SSH, create an Amazon EC2 key pair and set the ssh_key_name variable as the key pair name.

For an example of creating the Amazon EC2 key pair from an existing public key and setting the ssh_key_name variable, review examples/s3-nfs-filegateway-ec2/main.tf. Here, we set an existing public key path for the ssh_public_key_path variable. To create a public key for Amazon EC2, follow this procedure using ssh-keygen. Finally, ensure that the Security Group attached to the Storage Gateway EC2 instance allows SSH traffic.
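As a minimal sketch, assuming you do not already have a key pair, you could generate one locally with ssh-keygen and point the module at the resulting public key. The file path below is illustrative:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/sgw-key

# In terraform.auto.tfvars, reference the generated public key:
# ssh_public_key_path = "~/.ssh/sgw-key.pub"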

Calling the AWS Storage Gateway module

The following code snippet calls the AWS Storage Gateway module for activation once the gateway VM is created.

module "sgw" {
  depends_on         = [module.ec2_sgw]
  source             = "aws-ia/storagegateway/aws//modules/aws-sgw"
  gateway_name       = "my-storage-gateway"
  gateway_ip_address = module.ec2_sgw.public_ip
  join_smb_domain    = false
  gateway_type       = "FILE_S3"
}

If you want the Storage Gateway to join an Active Directory (AD) server, then specify join_smb_domain = true and also set the input variables domain_controllers, domain_name, domain_password, and domain_username. See the module README.md Inputs for a description of these variables.
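As a hedged sketch, an activation block that joins an AD domain might look like the following. The domain values shown are placeholders; the variable names are those listed in the module README:

module "sgw" {
  depends_on         = [module.ec2_sgw]
  source             = "aws-ia/storagegateway/aws//modules/aws-sgw"
  gateway_name       = "my-storage-gateway"
  gateway_ip_address = module.ec2_sgw.public_ip
  gateway_type       = "FILE_S3"

  // Join the gateway to an Active Directory domain (example values)
  join_smb_domain    = true
  domain_name        = "corp.example.com"
  domain_controllers = ["10.0.0.10"]
  domain_username    = var.domain_username
  domain_password    = var.domain_password
}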

Calling the S3 File Gateway NFS module

You can use the following code snippet after activating your storage gateway to automate the creation of NFS shares.

module "nfs_share" {
  source        = "aws-ia/storagegateway/aws//modules/s3-nfs-share"
  share_name    = "nfs_share_name"
  gateway_arn   = module.sgw.storage_gateway.arn
  bucket_arn    = "s3bucketname:arn"
  role_arn      = "iamrole:arn"
  log_group_arn = "log-group-arn"
  client_list   = ["10.0.0.0/24","10.0.1.0/24"]
}

Note: client_list is a required variable that restricts which source CIDR blocks can connect to the NFS endpoint provided by AWS Storage Gateway. Also note that prerequisites such as the S3 bucket, IAM role, and log group must be created before using this sub-module.
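The following is a minimal, illustrative sketch of those prerequisites defined directly in Terraform. The resource names are assumptions, and the IAM role would still need a policy granting Storage Gateway access to the bucket (omitted here):

resource "aws_s3_bucket" "sgw" {
  bucket = "my-file-gateway-bucket" // example bucket name
}

resource "aws_cloudwatch_log_group" "sgw" {
  name = "/aws/storagegateway/my-storage-gateway"
}

data "aws_iam_policy_document" "sgw_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["storagegateway.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sgw" {
  name               = "my-sgw-s3-access-role"
  assume_role_policy = data.aws_iam_policy_document.sgw_assume.json
}

You would then pass aws_s3_bucket.sgw.arn, aws_iam_role.sgw.arn, and aws_cloudwatch_log_group.sgw.arn to the bucket_arn, role_arn, and log_group_arn variables of the nfs_share module.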

After setting the appropriate module input variables, we need to assign values for any Terraform variables in the root module without a default value. A tfvars file is a common and simple way to assign variables in Terraform. We have provided an example terraform.auto.tfvars.example file. Rename this to terraform.auto.tfvars, then adjust the variable values using a text editor.

mv terraform.auto.tfvars.example terraform.auto.tfvars

The variables you set in the terraform.auto.tfvars file will be passed into the module.

Step 3. Trigger the deployment

Before you can trigger a deployment, configure the AWS CLI credentials for Terraform using the service account that was created as part of the prerequisites.

  1. Run the command terraform init to download the modules and initialize the directory.
  2. Run terraform plan and examine the outputs.
  3. Run terraform apply and allow the apply to complete (see the command sequence after this list).
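For reference, the full sequence run from the examples/s3-nfs-filegateway-ec2 directory looks like the following. The AWS_PROFILE value is an assumed named profile and is only one of several ways to supply credentials:

export AWS_PROFILE=my-terraform-profile
terraform init
terraform plan
terraform apply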

If the Terraform apply is successful, the output will appear as the following.

Figure: Output of a successful terraform apply

To view and examine the resources created by Terraform, you can use the terraform state list and terraform state show commands.

Step 4. Use the file share

1. Navigate to the EC2 console and verify the newly created gateway. Note the IP address of the EC2 instance.

Figure: Private IP address of the EC2 instance

2. In the AWS Management Console, navigate to AWS Storage Gateway.

Figure: Navigating to AWS Storage Gateway in the AWS Management Console

3. Select the newly created Storage Gateway and confirm that the preceding IP address maps to the Storage Gateway virtual machine deployed on EC2.

Figure: IP address mapping to the Storage Gateway VM deployed on EC2

4. Navigate to File shares from the left-hand menu, or select the file share directly under Storage resources, to find the newly created file share. Copy the command provided to mount the file share.

Figure: Newly created file share in the Storage Gateway console

5. Mount the NFS file share on your client. For more information, check out the documentation on using an NFS file share. An example mount command is shown after this walkthrough.

Figure: NFS file share mounted on the client

6. Your NFS file share backed by S3 File Gateway is now ready to use.

Figure: NFS file share backed by S3 File Gateway, ready to use
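For reference, mounting the NFS file share from a Linux client typically looks like the following. The gateway IP address, bucket name, and mount path are placeholders:

sudo mkdir -p /mnt/nfs-share
sudo mount -t nfs -o nolock,hard <gateway-vm-ip-address>:/<s3-bucket-name> /mnt/nfs-share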

Additional considerations

This section describes additional considerations as you use the Terraform module, including steps to toggle the creation of security groups on or off and to configure Active Directory domain settings.

Network considerations

Terraform calls AWS APIs to manage the lifecycle of resources on AWS. Therefore, the server or tool running Terraform needs outbound internet connectivity to AWS API endpoints for Terraform to operate properly. For more information regarding the network requirements, refer to this page.

Activation workflow

A request to Storage Gateway traverses two network paths. Activation requests sent by a client connect to the gateway’s virtual machine (VM), which in this case is deployed on Amazon EC2, over port 80 (HTTP). If the gateway successfully receives the activation request, then the gateway communicates with the Storage Gateway endpoints over port 443 (HTTPS) to receive an activation key and complete the Storage Gateway activation.

Figure: Gateway communicating with the Storage Gateway endpoints during activation

Storage Gateway does not require port 80 to be publicly accessible. The required level of access to port 80 depends on your network configuration. If you activate your gateway from a client virtual machine from which you connect and run the Terraform scripts, that client must have access to port 80 on your storage gateway. Once the gateway is activated, you may remove the rule that allows port 80 access from your client machine to the S3 File Gateway VM.

Storage Gateway VPC endpoint configuration

The latest version of the Terraform Storage Gateway module allows you to create an interface VPC endpoint for Storage Gateway. A VPC endpoint allows a private connection between the EC2 or VMware virtual appliance and the AWS Storage Gateway service. You can use this connection to activate your gateway and configure it to transfer data to AWS storage services without communicating over the public internet.

To create a VPC Endpoint using the module, set the variable create_vpc_endpoint=true and supply the VPC ID, VPC endpoint subnets, and the private IP address of the EC2 Gateway as Terraform variables. The following snippet from examples/s3-nfs-filegateway-ec2/main.tf shows VPC endpoint related configuration when calling the module.

Figure: VPC endpoint configuration when calling the module
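As a hedged sketch of that configuration, the block below assumes the module exposes inputs for the endpoint subnets and the gateway’s private IP. Only create_vpc_endpoint is named in the text, so check the module README for the exact input and output names:

module "sgw" {
  source             = "aws-ia/storagegateway/aws//modules/aws-sgw"
  gateway_name       = "my-storage-gateway"
  gateway_ip_address = module.ec2_sgw.public_ip
  gateway_type       = "FILE_S3"

  // VPC endpoint settings (input names below are assumptions)
  create_vpc_endpoint        = true
  vpc_id                     = module.vpc.vpc_id
  vpc_endpoint_subnet_ids    = module.vpc.private_subnets
  gateway_private_ip_address = module.ec2_sgw.private_ip
}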

A security group is also needed for the VPC endpoint. In the preceding example, the module handles the creation of the security group. However, you may use the vpc_endpoint_security_group_id variable to associate an existing security group with the VPC endpoint. See this documentation, which shows the security group requirements for the Storage Gateway VPC endpoint. In this module, the security groups are already pre-configured with the required rules, scoped to the private IP address of the storage gateway virtual machine. You can find the configuration in the file modules/aws-sgw/sg.tf.

S3 VPC endpoint configuration

We recommend you create a separate VPC endpoint for Amazon S3 so that the File Gateway transfers data through the VPC rather than through a NAT gateway or NAT instances. This allows for optimized and private routing to S3 at lower cost. In examples/s3-nfs-filegateway-ec2/main.tf, we have created a gateway VPC endpoint as shown in the following example:

resource "aws_vpc_endpoint" "s3" {
  vpc_id          = module.vpc.vpc_id
  service_name    = "com.amazonaws.${var.aws_region}.s3"
  route_table_ids = module.vpc.private_route_table_ids
} 

Security group configuration

The Terraform Storage Gateway module provides the ability to create the security group and the required rules for your gateway to communicate with the client machines and the Storage Gateway endpoints. You can achieve this by setting create_security_group = true. You can also limit access to a range of ingress CIDR blocks in your network from which you require access to the storage gateway. You can do this by modifying the ingress_cidr_blocks attribute.

The module also includes the ingress_cidr_block_activation variable specifically to limit access on port 80 to the CIDR block of the client machine that activates the storage gateway. You can remove this security group rule once the gateway is activated. You can find the source code of the security group configuration in the modules/ec2-sgw/sg.tf file.

module "ec2_sgw" {
  source                        = "aws-ia/storagegateway/aws//modules/ec2-sgw"
  vpc_id                        = var.vpc_id
  subnet_id                     = var.subnet_id
  ingress_cidr_block_activation = "10.0.0.1/32"
  ingress_cidr_blocks           = ["172.16.0.0/24", "172.16.10.0/24"]
  create_security_group         = true
}

You can toggle off the create_security_group variable by setting it to false if you already have an existing security group associated with your EC2-based storage gateway, or if you would like to create the security group outside of the EC2 Storage Gateway module deployment. You may then specify your own security group ID by appending the security_group_id attribute as shown in the following example:

module "ec2_sgw" {
  source                = "aws-ia/storagegateway/aws//modules/ec2-sgw"
  vpc_id                = var.vpc_id
  subnet_id             = var.subnet_id
  create_security_group = false 
  security_group_id     = "sg-12345678"
}

DevOps best practices

To scale your EC2 File Gateway module usage across your organization, consider these steps:

  1. Store infrastructure templates in a code repository and set up automated pipelines for testing and deployment.
  2. Automate Terraform Infrastructure as Code (IaC) workflows using tools like Terraform Cloud or AWS Developer Tools for collaborative, scalable, and governed IaC.
  3. Encourage module reuse by leveraging the EC2 File Gateway module and storing it in the Terraform Cloud Module Registry or a Git repository like AWS CodeCommit.
  4. Protect your Terraform state file integrity by using a backend like S3 and enable collaborative IaC with state file locking, as shown in the sketch after this list.
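As a minimal sketch of item 4, the backend block below stores state in an assumed S3 bucket and uses an assumed DynamoDB table for state locking; adjust the names and region for your environment:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "storage-gateway/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock" // enables state locking
    encrypt        = true
  }
}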

For additional resources, see CI/CD best practices from AWS and Terraform considerations from HashiCorp.

Gateway sizing and performance

For a small gateway deployment hosting one to ten shares per gateway, use an m5.xlarge EC2 instance with 4 vCPUs and 16 GiB of RAM, which is the default configuration in the Terraform EC2 Storage Gateway module. For higher performance to support more users and workloads in a medium or large deployment, consider m5.2xlarge or m5.4xlarge EC2 instances. You can find details on file shares and performance recommendations here.

The cache storage requirement ranges from 150 GB to 64 TB and is typically sized for the hot data set, following the 80/20 rule. You can adjust the instance type and cache size by modifying the instance_type and cache_block_device disk_size attributes in the ‘ec2-sgw’ module, in either the provided examples or your custom Terraform ‘main.tf’ file.

As an example:

module "ec2_sgw" {
  source                = "aws-ia/storagegateway/aws//modules/ec2-sgw"
  vpc_id                = var.vpc_id
  subnet_id             = var.subnet_id
  ingress_cidr_blocks   = var.ingress_cidr_blocks
  create_security_group = true 
  instance_type         = "m5.2xlarge"
  
  cache_block_device = {
    disk_size   = 150
  }
}

For more information on the Storage Gateway vCPU and RAM sizing and requirements, consult this documentation page. To learn more about cache sizing, refer to this documentation.

When transferring large amounts of data to the Storage Gateway, deploy the File Gateway EC2 instance in the same Availability Zone as your client or your SQL Server EC2 instances to minimize cross-Availability Zone network charges. You can adjust the availability_zone variable to match the desired zone during gateway creation.

module "ec2_sgw" {
  source                = "aws-ia/storagegateway/aws//modules/ec2-sgw"
  vpc_id                = var.vpc_id
  subnet_id             = var.subnet_id
  ingress_cidr_blocks   = var.ingress_cidr_blocks
  create_security_group = true 
  availability_zone     = "us-east-1a"
}

Data encryption using KMS

By default, Storage Gateway uses Amazon S3 managed encryption keys (SSE-S3) to server-side encrypt all data it stores in Amazon S3. You have the option to use the Storage Gateway API to configure your gateway to encrypt data stored in the cloud using server-side encryption with AWS Key Management Service (SSE-KMS) keys. For more information, refer to this link.

To encrypt the root and cache disk EBS volumes, append the cache_block_device and root_block_device blocks to the ec2-sgw module and supply the KMS key ARN to the kms_key_id attribute, as shown in the following example:

module "ec2_sgw" {
  source                = "aws-ia/storagegateway/aws//modules/ec2-sgw"
  vpc_id                = var.vpc_id
  subnet_id             = var.subnet_id
  ingress_cidr_blocks   = var.ingress_cidr_blocks
  create_security_group = true 
  availability_zone     = "us-east-1a"
  ssh_public_key_path   = var.ssh_public_key_path

  # Cache and Root Volume encryption key
  cache_block_device = {
    kms_key_id = "arn:aws:kms:us-west-2:111122223333:key/1234abcd"
  }

  root_block_device = {
    kms_key_id = "arn:aws:kms:us-west-2:111122223333:key/1234abcd"
  }
}

The s3-nfs-share and s3-smb-share submodules allow you to add KMS encryption for your file shares. To encrypt a file share, add the attribute kms_encrypted = true and supply the kms_key_arn to the submodule as shown in the following example:

module "nfs_share" {
  source        = "aws-ia/storagegateway/aws//modules/s3-nfs-share"
  share_name    = "nfs_share_name"
  gateway_arn   = module.sgw.storage_gateway.arn
  bucket_arn    = "s3bucketname:arn"
  role_arn      = "iamrole:arn"
  log_group_arn = "log-group-arn"
  client_list   = ["10.0.0.0/24","10.0.1.0/24"]
  kms_encrypted = true
  kms_key_arn   = "arn:aws:kms:us-west-2:111122223333:key/1234abcd"
}

Credentials management

Refer to this documentation by HashiCorp on setting AWS credentials for Terraform. We recommend setting AWS credentials using environment variables or a named profile to keep them out of the repository and the Terraform state file. When possible, using temporary security credentials from an AWS Identity and Access Management (IAM) role is preferred.
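As an illustrative example, credentials can be exported as environment variables before running Terraform. The values shown are placeholders, and AWS_SESSION_TOKEN applies only when using temporary credentials:

export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
export AWS_SESSION_TOKEN="<session-token>"
export AWS_REGION="us-east-1"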

Cleanup

To delete all resources associated with this example, configure AWS CLI credentials as done in Step 3 of this post, and change to the examples/s3-nfs-filegateway-ec2 directory. Run the terraform destroy command to delete all resources Terraform previously created. Note that any resources created outside of Terraform will need to be deleted manually.
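For reference, the cleanup commands look like the following:

cd examples/s3-nfs-filegateway-ec2
terraform destroy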

Conclusion

In this blog post we discussed how to provision an EC2-based Storage Gateway using Terraform by HashiCorp. We outlined steps to deploy an AWS Storage Gateway on Amazon EC2, activate your AWS Storage Gateway within AWS, create an Amazon S3 bucket, and create an NFS file share that is ready to be mounted by a client. The use of Infrastructure as Code increases consistency in deployments, speeds up deployment times, and increases operational efficiency, thereby accelerating migrations to the cloud. You can find the Storage Gateway Terraform module here in the Terraform Registry. You can customize the module to suit your organization’s needs and to help you scale your gateway deployments.

For more information and to learn more about AWS Storage Gateway, see the following:

Prabir Sekhri

Prabir Sekhri is a Senior Solutions Architect at AWS in the enterprise financial services sector. During his career, he has focused on digital transformation projects within large companies in industries as diverse as finance, multimedia, telecommunications as well as the energy and gas sectors. His background includes DevOps, security, and designing & architecting enterprise storage solutions such as those from Dell EMC, NetApp, and IBM. Besides technology, Prabir has always been passionate about playing music. He leads a jazz ensemble in Montreal as a pianist, composer and arranger.

Kawsar Kamal

Kawsar Kamal is a Senior Solutions Architect at AWS in the ISV services sector. Kawsar has been in the infrastructure automation and security space for 15+ years. During his career, he has focused on cloud migration, Infrastructure-as-Code and DevSecOps transformation projects across various industries including Software, global financial, healthcare and telecommunications. In his free time Kawsar enjoys running and hiking.

Harshi Surampalli

Harshi Surampalli is a Cloud Support Engineer with AWS, based out of Virginia. She focuses on supporting customers in using AWS Storage technologies, particularly AWS Storage Gateway, AWS DataSync, and Amazon Elastic File System. Harshi loves gardening and spending time with family whenever she can.