Managing Amazon ElastiCache with Terraform

Nic Jackson is Developer Advocate at HashiCorp.

Developers continue to pick Redis as their favorite NoSQL data store (see the Stack Overflow Developer Survey 2017). Amazon ElastiCache provides easy, fast, and highly available Redis on AWS. ElastiCache for Redis can be deployed via the AWS Management Console, AWS SDK, Amazon ElastiCache API, AWS CloudFormation, and through deployment tools like HashiCorp Terraform. In this post, we show how to easily deploy Amazon ElastiCache for Redis with Terraform.

Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. It’s often used to improve application performance by reading from a fast in-memory data store instead of a slower disk-based database. Currently, ElastiCache supports two different engines: Redis and Memcached.

The source code that accompanies this post is available in a GitHub repository. We reference the source files in the repository throughout the post. So before reading on, you might want to clone it now, so you have it handy.

What is Terraform?
Terraform is a tool for managing infrastructure that follows the principles of infrastructure as code. With infrastructure-as-code tools such as Terraform and AWS CloudFormation, your entire infrastructure setup is modeled in a declarative language. In Terraform, that language is the HashiCorp Configuration Language (HCL), whose stanzas describe the many resources that can be configured in cloud infrastructure. Defining your infrastructure in this way gives you a predictable and consistent method of re-creating any of the components in your application infrastructure. It has the added benefit that infrastructure defined as code can be versioned and shared easily with colleagues.

How Terraform works
Terraform is broken down into three main components:

Providers
Data sources
Resources

A provider defines resources and data sources for a particular infrastructure platform, such as AWS. Resources allow you to create and destroy infrastructure services like Amazon EC2 instances, virtual private clouds (VPCs), and in the case of our example, ElastiCache clusters. Data sources allow you to query the state of existing resources. They enable you to perform tasks such as retrieving the Availability Zones for a given AWS Region or returning the details of an existing server or infrastructure component.

Terraform configuration
Let’s look at how these components work together in a Terraform configuration. We will also see how we can use variables to make our configuration dynamic, and how we can describe dependencies between our resources.

Providers
A provider is responsible for understanding the API interactions and exposing the resources for the chosen platform. The first section we are going to look at is the provider configuration for AWS. This provider allows you to configure Terraform with your credentials and set the AWS Region.

provider "aws" {
    access_key = "XXXXXXXXXXX"
    secret_key = "XXXXXXXXXXX"
    region     = "us-west-1"
}

Blocks in Terraform typically follow this pattern. HCL is not JSON, although it is interoperable with JSON; it is designed to strike a balance between a machine-readable and a human-readable format. The AWS provider accepts a number of different arguments. However, as a bare minimum, we must set access_key, secret_key, and region.

access_key: The AWS access key. This is a required argument, but it can also be provided by setting the AWS_ACCESS_KEY_ID environment variable.
secret_key: The AWS secret key. This is a required argument, but it can also be provided by setting the AWS_SECRET_ACCESS_KEY environment variable.
region: The AWS Region. This is a required argument, but it can also be provided by setting the AWS_DEFAULT_REGION environment variable.

As noted above, all of the required arguments can be supplied through environment variables instead, which keeps credentials out of your configuration files. Protecting your secrets is very important. We recommend that you spend five minutes and install git-secrets from AWSLabs, which can help protect against mistakes.

When we use environment variables, we can also securely inject these into our continuous integration (CI) service. Looking at the provider block in provider.tf, we can see that it doesn’t contain any of the previously mentioned settings. When we run Terraform, we set the environment variables that correspond to these arguments. We can also choose to pass them as flags to the Terraform command.
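
A provider block that relies entirely on environment variables can be as short as the following. This is a minimal sketch; the exact contents of provider.tf in the repository may differ.

```hcl
# provider.tf (sketch): no credentials are hard-coded here.
# The AWS provider reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# and AWS_DEFAULT_REGION from the environment when Terraform runs.
provider "aws" {}
```

Because the block holds no secrets, the file can be committed to version control safely.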

Variables
In addition to the environment variables that we can use in our provider, Terraform allows us to explicitly declare variables, which we can use to make our config dynamic. If you look at the file variables.tf, you can see that we are declaring six variables that are used in our config.

The syntax for a variable is as follows:

variable "[variable name]" {
    default     = "[optional]"
    description = "[optional]"
    type        = "string|list|map"
}

default: Allows you to specify a default value for a variable.
description: Allows you to add a human-readable description that describes the purpose of the variable.
type: Defines the type of the variable.
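
As an illustration, two of the variables used in this post could be declared as follows. The variable names and the cidr_blocks value come from this post; the descriptions and the choice of which variable carries a default are assumptions, so the repository's variables.tf may look different. The quoted type names match the Terraform syntax used throughout this post; newer Terraform releases use a different type syntax.

```hcl
# Declared without a default, so a value must be supplied,
# for example from terraform.tfvars.
variable "namespace" {
  description = "Prefix used to name and tag the resources in this example"
  type        = "string"
}

# A list variable with a default value.
variable "cidr_blocks" {
  description = "CIDR blocks for the subnets, one per subnet"
  type        = "list"
  default     = ["10.1.1.0/24", "10.1.2.0/24"]
}
```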

Assigning variables
There are multiple ways to assign variables: using command-line flags, from a file such as the terraform.tfvars file in our example repository, and using environment variables.

Using command-line flags
Variables can be set from the command line by using the -var flag. When we run Terraform, we can set a variable using the following syntax:

$ terraform plan -var 'myvariable=myvalue'

From a file
In our example repository, we are defining our variables inside the terraform.tfvars file. If you take a look at this file, you see the following:

namespace = "elasticache-tutorial"

The format of this is simply [key] = [value]. Terraform loads all files that match terraform.tfvars or *.auto.tfvars present in the current directory.

From environment variables
Lastly, we can also use environment variables in the form of TF_VAR_name. In our example variables file, we have the variable namespace. We could set this using an environment variable in the form TF_VAR_namespace=myvalue.

To learn more about variables and other Terraform features, see Input Variables in the Terraform documentation.

Data sources
Data sources allow data to be fetched or computed for use elsewhere in a Terraform configuration. If you look at the file network.tf, you can see how we are using the data source aws_availability_zones to retrieve the Availability Zones for the Region that we defined in the provider:

data "aws_availability_zones" "available" {}

The syntax for this configuration is similar to the other elements that we already looked at:

data "[TYPE]" "[NAME]" {}

The data source aws_availability_zones returns the Availability Zones for our configured Region as an array. Let’s look at how we are using this data source inside the aws_subnet resource. It’s also a great segue into introducing resources in general.

Resources
A resource is a component of your infrastructure. It might be a low-level component, such as a physical server, virtual machine, or container. Or it can be a higher-level component, such as an email provider, DNS record, or database provider.

If we look at the file network.tf from the example code, we can see that we are defining a resource for a subnet:

resource "aws_subnet" "default" {
    count                   = "${length(var.cidr_blocks)}"
    vpc_id                  = "${aws_vpc.default.id}"
    availability_zone       = "${data.aws_availability_zones.available.names[count.index]}"
    cidr_block              = "${var.cidr_blocks[count.index]}"
    map_public_ip_on_launch = true

    tags {
        "Name" = "${var.namespace}"
    }
}

Again we are following this syntax:

resource "[TYPE]" "[NAME]" {
    [ATTRIBUTE] = "[VALUE]"
}

Most of the parameters are specific to the resource, but count is a special parameter, available on every resource, that allows you to create n copies of it. When we use the count parameter, we can also access the variable ${count.index}, the zero-based index of the copy being created. This allows us to pick a particular element from an array of items for each copy.

In the count parameter for our subnet, we have the value of ${length(var.cidr_blocks)}. The variable cidr_blocks is an array that has the value ["10.1.1.0/24", "10.1.2.0/24"].

The length function is one of Terraform’s built-in interpolation functions. Here it returns the length of the array, 2, so Terraform creates two subnets. Let’s look at interpolation syntax in greater depth.

Interpolation syntax
Terraform allows you to interpolate values within the parameter values for your configuration. These interpolations are wrapped in ${}, such as ${var.namespace} from our subnet resource. In addition to simple variables, we can also reference the outputs of other resources and call functions.

Let’s take a closer look at some of the interpolations in our subnet resource.

Variables
The simplest interpolation is a variable replacement. In our tags attribute for the subnet, we set the value of “Name” to be ${var.namespace}. When Terraform executes, it interpolates the value inside the ${} block and replaces it with the value of the namespace variable.

Resources
In addition to the variables that we have configured, we can also use the output of other resources, such as ${aws_vpc.default.id}, which we are using as a value for the vpc_id parameter. In addition to interpolating the value, Terraform also ensures that the VPC resource is created before the subnet. Terraform uses these references to build up a dependency graph of all your resources. This way it can understand the order in which resources must be created, and it can also parallelize requests to make the creation of your infrastructure quicker.

Functions
Terraform ships with a number of built-in functions, such as the length function that we use in our count parameter. Functions are called with the syntax name(arg, arg2, ...). The function length that we use returns the number of members in a given list or map, or the number of characters in a given string.

For a full overview of interpolation syntax and the available functions in Terraform, see Interpolation Syntax in the Terraform documentation.
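
To make the function syntax concrete, here is a small sketch that applies length and a second built-in function, element, to the cidr_blocks list from this post. The output names subnet_count and first_cidr_block are hypothetical and are not part of the example repository.

```hcl
variable "cidr_blocks" {
  type    = "list"
  default = ["10.1.1.0/24", "10.1.2.0/24"]
}

# length() returns the number of elements in the list: 2 here.
output "subnet_count" {
  value = "${length(var.cidr_blocks)}"
}

# element() returns the item at the given index: "10.1.1.0/24" here.
output "first_cidr_block" {
  value = "${element(var.cidr_blocks, 0)}"
}
```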

Now that we understand how Terraform works and how it connects our resources, let’s walk through the rest of our configuration.

Creating the cluster
To create our cluster, we need to create the following AWS resources:

1 x VPC
1 x Internet Gateway
1 x AWS Route
n x Subnet (where n is the number of Availability Zones)
1 x Security Group
1 x ElastiCache Subnet Group
1 x ElastiCache Replication Group (6 nodes)
1 x Instance (SSH Bastion Host)
1 x Key Pair (for SSH Bastion Host)

This is quite a number of different resources, and they all have interlinking dependencies. Terraform is graph-based: the interpolation syntax lets you define links, or dependencies, between the various resources, and when you run Terraform, it maps the resources into a graph. Terraform uses this graph to work out which resources it must create in a particular order, and which resources it can create in parallel to speed up the process.

The aws_elasticache_subnet_group resource stanza is used to create the cluster subnet group. This resource is different from aws_subnet: it only groups VPC subnets for use by ElastiCache, so the subnets themselves must be created separately, or you can reference existing subnets.

resource "aws_elasticache_subnet_group" "default" {
  name       = "${var.namespace}-cache-subnet"
  subnet_ids = ["${aws_subnet.default.*.id}"]
}

The name attribute is the name of the subnet group. Using the namespace variable helps to ensure that this value is unique. The subnet_ids attribute is a list of VPC subnet IDs for the cache subnet group. For more information, see aws_elasticache_subnet_group in the Terraform documentation.

We then define the aws_elasticache_replication_group resource stanza, which creates our cluster:

resource "aws_elasticache_replication_group" "default" {
  replication_group_id          = "${var.cluster_id}"
  replication_group_description = "Redis cluster for Hashicorp ElastiCache example"

  node_type            = "cache.m4.large"
  port                 = 6379
  parameter_group_name = "default.redis3.2.cluster.on"

  snapshot_retention_limit = 5
  snapshot_window          = "00:00-05:00"

  subnet_group_name = "${aws_elasticache_subnet_group.default.name}"

  automatic_failover_enabled = true

  cluster_mode {
    replicas_per_node_group = 1
    num_node_groups         = "${var.node_groups}"
  }
}

replication_group_id: A required attribute that is the unique identifier for the cluster.
replication_group_description: A required attribute that is a user-created description for the group.
node_type: The type of node to create in the node group. For information about available node types, see Choosing Your Redis Node Size in the Amazon ElastiCache User Guide.
port: The port number through which each node will accept connections. We are using the default Redis port 6379.
parameter_group_name: The name of the parameter group that defines the runtime properties of your nodes and clusters. For details about the default parameter groups, see ElastiCache Parameter Groups in the AWS documentation. You can also configure a custom parameter group using the aws_elasticache_parameter_group resource. For more information, see aws_elasticache_parameter_group in the Terraform documentation.
snapshot_retention_limit: Allows us to configure a daily backup of the cluster state. We are setting the retention period to five days for this backup. To disable backups, we can either omit this attribute from the config, or set the value to 0.
snapshot_window: The time range (in UTC) during which ElastiCache begins taking a daily snapshot of your cluster.
subnet_group_name: The name of the subnet group. We are referencing the name attribute of the aws_elasticache_subnet_group resource that we created earlier.
automatic_failover_enabled: A parameter that defines whether the replica nodes are automatically promoted to primary when the existing primary node fails.
replicas_per_node_group: The number of replica nodes in each node group. Replica nodes are distributed across the Availability Zones for redundancy.
num_node_groups: The number of shards for the Redis replication group. Changing this variable forces a re-creation of the cluster.
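
If the default parameter group does not fit your needs, a custom one can be sketched along the following lines. The resource name custom and the -params suffix are hypothetical, and this post's example does not use a custom group, so treat the block as an illustration rather than part of the repository. It assumes the namespace variable from this post is declared.

```hcl
# A custom parameter group for Redis 3.2 with cluster mode enabled.
resource "aws_elasticache_parameter_group" "custom" {
  name   = "${var.namespace}-params"
  family = "redis3.2"

  # Required for a cluster-mode-enabled replication group.
  parameter {
    name  = "cluster-enabled"
    value = "yes"
  }
}
```

The replication group would then reference it by setting parameter_group_name to ${aws_elasticache_parameter_group.custom.name} in place of the default group name.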

In this example, snapshots are enabled with a five-day retention period. The ElastiCache console shows a list of these backups.

Running Terraform
If you’re not already in the folder, change to the folder where you cloned the example repository.

Before running terraform plan and terraform apply, set a few environment variables with your AWS account details. For more information about using Terraform with AWS, take a look at the post Terraform: Beyond the Basics with AWS on the AWS Partner Network (APN) Blog.

export AWS_ACCESS_KEY_ID=[AWS ACCESS KEY ID]
export AWS_SECRET_ACCESS_KEY=[AWS SECRET ACCESS KEY]
export AWS_REGION=[AWS REGION, e.g. eu-west-1]

Run terraform plan and terraform apply in your terminal.

$ terraform plan
# ...
Plan: 10 to add, 0 to change, 0 to destroy.

You should see output similar to the following in your terminal:

$ terraform apply
# ...
Apply complete! Resources: 10 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
 
State path:
 
Outputs:
 
configuration_endpoint_address = tfrediscluster.ua5mrp.clustercfg.euw1.cache.amazonaws.com
ssh_host = 52.30.43.172
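
The two values listed under Outputs come from output blocks in the configuration. A sketch of how they could be declared is shown here; the output names match the terminal output above, but the instance resource name bastion is an assumption, so check the repository for the actual name.

```hcl
# Configuration endpoint of the cluster-mode-enabled replication group.
output "configuration_endpoint_address" {
  value = "${aws_elasticache_replication_group.default.configuration_endpoint_address}"
}

# Public IP address of the SSH bastion host.
# "bastion" is a hypothetical resource name.
output "ssh_host" {
  value = "${aws_instance.bastion.public_ip}"
}
```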

If we also look at the AWS console, we can see that the nodes have been created and are ready for use.

To test the cluster, use SSH to connect to the instance that’s listed in the Terraform output, with the user name ubuntu:

ssh ubuntu@52.30.43.172

To connect to an ElastiCache cluster, use the configuration endpoint that’s provided by AWS in the Terraform output. This returns a list of active nodes. We can use these nodes to interact with the cluster. To see this in operation, run the following command in your SSH session, replacing the parameter value for -h with your cluster’s configuration endpoint. Then execute the CLUSTER NODES command to show the cluster details. A full list of commands is available in the Redis documentation.

$ redis-cli -h tfrediscluster.ua5mrp.clustercfg.euw1.cache.amazonaws.com -p 6379
tfrediscluster.ua5mrp.clustercfg.euw1.cache.amazonaws.com:6379> CLUSTER NODES

You should see output similar to the following:

2d5db9ee5ac9dc34c1798ee1122b48e9094a71ea 10.1.1.132:6379 master - 0 1495477718562 1 connected 0-5461
e8da4bf07ed69d44fe5a2c648148e049705838a1 10.1.2.147:6379 master - 0 1495477719570 0 connected 5462-10922
cabaa3e60d7ac0b4ab861d25d721bb579c58005c 10.1.1.53:6379 master - 0 1495477716549 2 connected 10923-16383
95a5796c4bb46a2fc8c9203d42dcbd0abc15dc2f 10.1.1.105:6379 myself,slave e8da4bf07ed69d44fe5a2c648148e049705838a1 0 0 1 connected
8d7b98815198e3eb861abf4538a1c367edcc012d 10.1.2.103:6379 slave cabaa3e60d7ac0b4ab861d25d721bb579c58005c 0 1495477717556 2 connected
9f4fc8c777be63df257423267fad86894b5e9e2d 10.1.2.166:6379 slave 2d5db9ee5ac9dc34c1798ee1122b48e9094a71ea 0 1495477715542 1 connected

AWS launches the nodes into multiple Availability Zones, ensuring that each primary node and its replica are always in different zones. This way, if a zone is lost, the cluster can fail over, promoting the replicas in the other zone to primary nodes.

We can now connect to one of the nodes and execute commands. Most of the client libraries take a list of addresses and automatically manage load balancing. For this demonstration, we can just select the first primary node in the list.

ubuntu@ip-10-1-1-93:~$ redis-cli -c -h 10.1.1.132 -p 6379
10.1.1.132:6379> set foo bar
-> Redirected to slot [12182] located at 10.1.1.53:6379
OK
10.1.1.53:6379> get bar
-> Redirected to slot [5061] located at 10.1.1.132:6379
(nil)
10.1.1.132:6379> get foo
-> Redirected to slot [12182] located at 10.1.1.53:6379
"bar"

Looking at the output, you can see that when we write the value to the key foo, we are redirected to a different server. Redis Cluster shards the keys across the nodes by hash slot. When you connect to a node and read or write a key that it does not hold, the node replies with a redirection to the node that owns the key’s slot, and a cluster-aware client (such as redis-cli started with the -c flag) follows that redirection automatically.

Destroying the cluster
You would almost never need to destroy your cluster in production. However, a running cluster incurs costs, and if you’re testing this configuration and not creating a production cluster, don’t forget to destroy it! Destroy the cluster by running terraform destroy in the terminal.

$ terraform destroy
# ...
Destroy complete! Resources: 10 destroyed.

Summary
In this post, we provided a brief introduction to the power of Terraform and how you can use it to manage Amazon ElastiCache. The AWS provider for Terraform not only enables you to manage ElastiCache resources, it also lets you model your complete application infrastructure as code.

For more information about all the features, see the Terraform ElastiCache documentation. For more information in general about Terraform and the AWS provider, take a look at the Terraform documentation.

AWS Database Blog
