AWS Database Blog

Building secure Amazon ElastiCache for Valkey deployments with Terraform

Infrastructure as Code (IaC) provides a systematic approach to managing infrastructure, enabling version control, peer review, and automated deployments. For ElastiCache for Valkey, IaC ensures that critical security configurations such as encryption keys, network isolation, and authentication tokens are applied consistently across development, staging, and production environments. This approach turns infrastructure provisioning into a reliable, repeatable pattern that scales with your organization's needs.

In this post, we show you how to build a secure Amazon ElastiCache for Valkey cluster using Terraform, implementing best practices and comprehensive security controls, including encryption, authentication, and network isolation.

Solution overview

Amazon ElastiCache for Valkey is a fully managed caching service that delivers microsecond latency. It offers a serverless option, which provides automatic scaling and pay-per-use pricing with setup times under five minutes, and a traditional node-based option, which provides more granular control over the infrastructure. The serverless option eliminates the complexity of capacity planning and cluster management while offering a 33% lower price compared to ElastiCache Serverless for Redis OSS. The node-based option offers 20% lower pricing compared to node-based ElastiCache for Redis OSS and is ideal for workloads with predictable usage patterns that require specific instance types and configurations.

With the serverless option, you don’t need to worry about capacity planning or managing cache nodes and clusters. While you still need to provide the Amazon Virtual Private Cloud (VPC) and subnet configurations for network isolation, ElastiCache automatically handles the underlying infrastructure scaling, node management, and availability zone distribution to provide a secure and highly available configuration.

The following diagram shows the serverless deployment option for ElastiCache for Valkey spread across two Availability Zones and subnets inside a VPC.


Figure 1: Architecture diagram of ElastiCache for Valkey serverless deployment

The following diagram shows the node-based cluster deployment option for ElastiCache for Valkey spread across two Availability Zones and subnets inside a VPC.


Figure 2: Architecture diagram of ElastiCache for Valkey node-based deployment

This post demonstrates both serverless and node-based deployment options, with each solution implementing multiple layers of security to protect the ElastiCache for Valkey deployments:

  • Network isolation: Deploying the ElastiCache cluster in private subnets of a VPC with no direct internet access
  • Encryption in transit: Enabling TLS for all communications with the cache
  • Encryption at rest: Using AWS Key Management Service (AWS KMS) to encrypt all stored data
  • Access control: Implementing security groups to restrict traffic
  • Authentication: Using token-based authentication for node-based deployments
  • Logging: Configuring Amazon CloudWatch Logs for monitoring and auditing

The Terraform code that we share in this post is organized into reusable modules: a network module that creates the VPC, subnets, and security groups, and deployment-specific modules for serverless and node-based configurations.
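To illustrate how these modules fit together, the following sketch shows how a deployment configuration might instantiate the network module and consume its outputs. The module source path, variable values, and CIDR ranges are assumptions for illustration; refer to the repository for the actual interface.

module "vpc" {
  source              = "../network"
  vpc_name            = "elasticache-valkey-demo"
  vpc_cidr            = "10.0.0.0/16"
  subnet_cidr_private = ["10.0.1.0/24", "10.0.2.0/24"]
  tags                = { Project = "elasticache-valkey" }
}

# The cache configurations then consume the module's outputs, for example:
#   subnet_ids         = module.vpc.private_subnets[*].id
#   security_group_ids = [module.vpc.security_group.id]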

Walkthrough

In this walkthrough, we deploy both serverless and node-based ElastiCache for Valkey clusters using Terraform. The walkthrough includes the following steps:

  1. Clone the repository and review the network module
  2. Deploy the serverless ElastiCache for Valkey cluster (typically takes 2-3 minutes)
  3. Deploy the node-based ElastiCache for Valkey cluster (typically takes 8-10 minutes)
  4. Review security implementations and best practices
  5. Clean up resources

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account with sufficient permissions to create VPC, ElastiCache, AWS KMS, and AWS Secrets Manager resources
  • Terraform (v1.0.0 or later) installed on your local machine
  • AWS CLI configured with your credentials and appropriate AWS Region
  • Basic understanding of Terraform and AWS networking concepts
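If you want to make these version requirements explicit in code, you can pin them in your root module. The following is a minimal sketch; the AWS provider constraint is illustrative (the serverless cache resource requires a recent 5.x release).

terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # illustrative; use a recent release that supports ElastiCache Serverless
    }
  }
}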

Step 1: Clone the repository and review the network module

Start by cloning the GitHub repository containing the Terraform code and reviewing the foundational network infrastructure.

To clone the repository and examine the network module

  1. Open your terminal and clone the repository:
    git clone https://github.com/aws-samples/amazon-elasticache-samples.git
    cd amazon-elasticache-samples/blogs/elasticache-valkey/terraform
  2. Review the network module structure:
    ls -la network/
  3. Examine the main network configuration that creates the foundational infrastructure:
    # VPC with private subnets
    resource "aws_vpc" "this" {
      cidr_block = var.vpc_cidr
      tags       = merge(var.tags, { "Name" = var.vpc_name })
    }
    
    # Private subnets across multiple AZs
    resource "aws_subnet" "private" {
      count             = length(var.subnet_cidr_private)
      vpc_id            = aws_vpc.this.id
      cidr_block        = var.subnet_cidr_private[count.index]
      availability_zone = data.aws_availability_zones.available.names[count.index % 3]
      tags              = merge(var.tags, { "Name" = "${var.vpc_name}-private-${count.index + 1}" })
    }
    
    # Security group with restrictive rules
    resource "aws_security_group" "custom_sg" {
      name        = "${var.vpc_name}_allow_inbound_access"
      description = "manage traffic for ${var.vpc_name}"
      vpc_id      = aws_vpc.this.id
    }

The network module creates a secure foundation with private subnets across multiple Availability Zones and restrictive security groups to ensure network isolation.
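As an example of what restrictive rules look like in practice, the following sketch allows traffic on the default Valkey port only from sources inside the VPC. The port and CIDR source are assumptions for illustration; the repository's actual rules may differ.

resource "aws_security_group_rule" "valkey_from_vpc" {
  type              = "ingress"
  security_group_id = aws_security_group.custom_sg.id
  description       = "Allow Valkey traffic only from inside the VPC"
  protocol          = "tcp"
  from_port         = 6379 # default Valkey port; adjust if you override the port variable
  to_port           = 6379
  cidr_blocks       = [var.vpc_cidr]
}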

Step 2: Deploy the serverless ElastiCache for Valkey cluster

Navigate to the serverless directory and deploy the serverless cache configuration.

Note: If you prefer the node-based cluster option, skip ahead to Step 3.

To deploy the serverless option

  1. Navigate to the serverless directory and initialize Terraform:
    cd serverless
    terraform init
  2. Review the serverless configuration and understand the key security and performance settings:
    resource "aws_elasticache_serverless_cache" "serverless_cache" {
      engine = "valkey"
      name   = var.name
      cache_usage_limits {
        data_storage {
          maximum = var.data_storage
          unit    = "GB"
        }
        ecpu_per_second {
          maximum = var.ecpu_per_second
        }
      }
      daily_snapshot_time      = var.daily_snapshot_time
      description              = "ElastiCache cluster for ${var.name}"
      major_engine_version     = var.major_engine_version
      snapshot_retention_limit = var.snapshot_retention_limit
      security_group_ids       = [module.vpc.security_group.id]
      subnet_ids               = module.vpc.private_subnets[*].id
      kms_key_id               = aws_kms_key.encrypt_cache.arn
    }

    This configuration includes several critical elements for a secure serverless deployment:

    • Resource limits: The `cache_usage_limits` block prevents unexpected costs by capping both storage (in GB) and compute capacity (ECPU per second)
    • Network isolation: The cache is deployed in private subnets with restrictive security groups to prevent unauthorized access
    • Encryption: Uses a custom KMS key rather than AWS-managed encryption for enhanced security control
    • Backup strategy: Automated daily snapshots with configurable retention periods ensure data recovery capabilities

    These settings ensure both security and cost predictability for your serverless cache deployment.

  3. Deploy the serverless cache:
    terraform apply

The deployment typically takes 2-3 minutes to complete. You’ll see output similar to:

Apply complete! Resources: 12 added, 0 changed, 0 destroyed.

Verify the serverless deployment

Once the resources are provisioned, verify your deployment using the AWS CLI:

aws elasticache describe-serverless-caches \
  --serverless-cache-name your-cache-name \
  --region your-region \
  --query 'ServerlessCaches[0].{Name:ServerlessCacheName,Status:Status,Engine:Engine,DataStorage:CacheUsageLimits.DataStorage.Maximum,ECPULimit:CacheUsageLimits.ECPUPerSecond.Maximum,KMSKeyId:KmsKeyId,SecurityGroups:SecurityGroupIds,Subnets:SubnetIds}'

You should see output like:

{
    "Name": "elasticache-valkey-serverless-demo",
    "Status": "available",
    "Engine": "valkey",
    "DataStorage": 10,
    "ECPULimit": 5000,
    "KMSKeyId": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012",
    "SecurityGroups": [
        "sg-0123456789abcdef0"
    ],
    "Subnets": [
        "subnet-0123456789abcdef0",
        "subnet-0fedcba9876543210"
    ]
}

Verify the following key properties in the CLI output:

  • Status: “available” (cache is ready for use)
  • Engine: “valkey”
  • DataStorage and ECPULimit: Match your configured usage limits
  • KMSKeyId: Shows your custom KMS key ARN (not AWS-managed)
  • SecurityGroups and Subnets: Confirm deployment in private subnets with security groups

This CLI approach provides programmatic verification and can be easily integrated into automation workflows.
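If you prefer to surface the connection details from Terraform itself, you can add output values to the serverless configuration. The following is a sketch (the output names are our own); the endpoint attribute contains the DNS address and port that applications use to connect over TLS.

output "serverless_cache_endpoint" {
  description = "Endpoint (address and port) applications use to reach the serverless cache over TLS"
  value       = aws_elasticache_serverless_cache.serverless_cache.endpoint
}

output "serverless_cache_reader_endpoint" {
  description = "Reader endpoint for read-heavy workloads"
  value       = aws_elasticache_serverless_cache.serverless_cache.reader_endpoint
}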

Step 3: Deploy the node-based ElastiCache for Valkey cluster

Navigate to the node-based directory and deploy the node-based cache configuration with enhanced security features.

To deploy the node-based cluster option

  1. Navigate to the node-based directory and initialize Terraform:
    cd ../node-based
    terraform init
  2. Review the node-based configuration with comprehensive security settings. This configuration provides enterprise-level security and high availability features:
    • High availability: automatic_failover_enabled and multi_az_enabled ensure the cluster remains available during node failures or AZ outages
    • Scalability: num_node_groups and replicas_per_node_group allow horizontal scaling across multiple nodes and read replicas
    • Comprehensive encryption: Both at_rest_encryption_enabled and transit_encryption_enabled protect data using your custom KMS key
    • Authentication: auth_token from AWS Secrets Manager provides secure access control beyond network-level security
    • Monitoring: Dual log_delivery_configuration blocks capture both slow queries and engine logs in CloudWatch for operational visibility
    • Network security: Deployed in private subnets with restrictive security groups to limit access

    The node-based approach offers more granular control over performance, scaling, and monitoring compared to the serverless option.

    resource "aws_elasticache_replication_group" "node_based" {
      automatic_failover_enabled = true
      subnet_group_name          = aws_elasticache_subnet_group.elasticache_subnet.name
      replication_group_id       = var.name
      description                = "ElastiCache cluster for ${var.name}"
      node_type                  = var.node_type
      engine                     = "valkey"
      engine_version             = var.engine_version
      parameter_group_name       = var.parameter_group_name
      port                       = var.port
      multi_az_enabled           = true
      num_node_groups            = var.num_node_groups
      replicas_per_node_group    = var.replicas_per_node_group
      at_rest_encryption_enabled = true
      kms_key_id                 = aws_kms_key.encrypt_cache.id
      transit_encryption_enabled = true
      auth_token                 = aws_secretsmanager_secret_version.auth.secret_string
      security_group_ids         = [module.vpc.security_group.id]
      log_delivery_configuration {
        destination      = aws_cloudwatch_log_group.slow_log.name
        destination_type = "cloudwatch-logs"
        log_format       = "json"
        log_type         = "slow-log"
      }
      log_delivery_configuration {
        destination      = aws_cloudwatch_log_group.engine_log.name
        destination_type = "cloudwatch-logs"
        log_format       = "json"
        log_type         = "engine-log"
      }
    }
  3. Deploy the node-based ElastiCache cluster:
    terraform apply

The deployment typically takes 8-10 minutes to complete due to the multi-AZ configuration and replication setup. You’ll see output similar to:

Apply complete! Resources: 15 added, 0 changed, 0 destroyed.

Verify the node-based deployment

Once the resources are provisioned, verify your deployment using the AWS CLI:

aws elasticache describe-replication-groups \
  --replication-group-id your-cluster-name \
  --region your-region \
  --query 'ReplicationGroups[0].{Name:ReplicationGroupId,Status:Status,Engine:Engine,NodeType:CacheNodeType,MultiAZ:MultiAZEnabled,AtRestEncryption:AtRestEncryptionEnabled,TransitEncryption:TransitEncryptionEnabled,AuthToken:AuthTokenEnabled}'

You should see output similar to:

{
    "Name": "elasticache-valkey-node-based-demo",
    "Status": "available",
    "Engine": "valkey",
    "NodeType": "cache.t2.small",
    "MultiAZ": true,
    "AtRestEncryption": true,
    "TransitEncryption": true,
    "AuthToken": true
}

Verify the following key properties in the CLI output:

  • Status: “available” (indicating all nodes are healthy)
  • Engine: “valkey” with your specified version
  • MultiAZ: true (enabled across multiple Availability Zones)
  • AtRestEncryption: true (encryption at rest enabled)
  • TransitEncryption: true (encryption in transit enabled)
  • AuthToken: true (token-based authentication configured)

This CLI approach provides programmatic verification and can be easily integrated into automation scripts or CI/CD pipelines.
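Similarly, you can expose the cluster endpoint from Terraform. The sketch below assumes a cluster-mode-enabled parameter group (because the configuration sets num_node_groups); with cluster mode disabled, use primary_endpoint_address instead.

output "node_based_cluster_endpoint" {
  description = "Configuration endpoint clients use to connect to the node-based Valkey cluster"
  value       = aws_elasticache_replication_group.node_based.configuration_endpoint_address
}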

Monitoring deployment progress

During deployment, you can monitor the progress by:

  • Terraform output: Watch the real-time resource creation status
  • AWS Console: Navigate to ElastiCache to see cluster status updates
  • AWS CLI: Check cluster status with aws elasticache describe-replication-groups --replication-group-id your-cluster-name
  • CloudWatch: Monitor logs for any deployment issues
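The log_delivery_configuration blocks in Step 3 write to CloudWatch log groups defined in the same configuration. For reference, a minimal sketch of such log group definitions looks like the following; the name pattern and retention period are assumptions, so align them with your own retention policy.

resource "aws_cloudwatch_log_group" "slow_log" {
  name              = "/elasticache/${var.name}/slow-log"
  retention_in_days = 30 # assumed retention; set per your compliance requirements
}

resource "aws_cloudwatch_log_group" "engine_log" {
  name              = "/elasticache/${var.name}/engine-log"
  retention_in_days = 30
}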

Step 4: Review security implementations

This solution implements multiple layers of security through custom AWS KMS encryption and authentication mechanisms.

KMS encryption implementation

Both deployments use custom AWS KMS keys instead of AWS-managed keys to provide greater security control. Custom KMS keys offer several advantages over the default AWS-managed encryption:

  1. Granular access control: You define exactly which services and roles can use the key through custom key policies
  2. Audit trail: All key usage is logged in CloudTrail, providing complete visibility into who accessed your encrypted data and when
  3. Key rotation control: You can enable or disable automatic key rotation based on your compliance requirements
  4. Cross-account access: Custom keys can be shared across AWS accounts with precise permission controls
  5. Compliance requirements: Many regulatory frameworks require customer-managed encryption keys for sensitive data

Here’s how we implement the custom KMS key with restrictive policies:

resource "aws_kms_key" "encrypt_cache" {
  description             = "KMS key to encrypt cache for ${var.name}"
  deletion_window_in_days = 7
  enable_key_rotation     = true
}

resource "aws_kms_key_policy" "encrypt_policy" {
  key_id = aws_kms_key.encrypt_cache.id
  policy = jsonencode({
    Id = "${var.name}-encrypt-cache"
    Statement = [
      {
        Action = "kms:*"
        Effect = "Allow"
        Principal = {
          AWS = "${local.principal_root_arn}"
        }
        Resource = "*"
        Sid      = "Enable IAM User Permissions"
      },
      {
        Sid    = "Allow ElastiCache Service"
        Effect = "Allow"
        Principal = {
          Service = "elasticache.amazonaws.com"
        }
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey",
          "kms:CreateGrant",
          "kms:ReEncrypt*",
          "kms:DescribeKey"
        ]
        Resource = "*"
        Condition = {
          StringEquals = {
            "kms:ViaService" = "elasticache.${var.region}.amazonaws.com"
          }
        }
      }
    ]
    Version = "2012-10-17"
  })
}

When implementing AWS KMS encryption for ElastiCache, consider the following best practices:

  1. Key rotation: Enable automatic key rotation to enhance security over time
  2. Key policies: Use least-privilege policies that grant access only to necessary services
  3. Cross-Region considerations: Plan for multi-Region deployments and key availability
  4. Monitoring: Set up CloudWatch alarms for key usage and access patterns
  5. Backup and recovery: Include AWS KMS key policies in your disaster recovery planning
  6. Compliance: Ensure key management aligns with regulatory requirements (FIPS, SOC, etc.)

For detailed KMS security guidance, refer to the AWS KMS Best Practices documentation.

Authentication for node-based deployments

The node-based deployment implements token-based authentication using AWS Secrets Manager:

resource "random_password" "auth" {
  length           = 128
  special          = true
  override_special = "!&#$^<>-"
}

resource "aws_secretsmanager_secret" "elasticache_auth" {
  name                    = var.name
  recovery_window_in_days = 0
  kms_key_id              = aws_kms_key.encrypt_secret.id
}

resource "aws_secretsmanager_secret_version" "auth" {
  secret_id     = aws_secretsmanager_secret.elasticache_auth.id
  secret_string = random_password.auth.result
}

This approach provides secure generation, storage, access control, and auditability for authentication tokens. When implementing and managing authentication tokens, consider the following best practices:

  1. Token rotation: Implement a policy to regularly rotate authentication tokens. AWS Secrets Manager can automate this process.
  2. Least privilege access: Ensure that only necessary services and roles have access to retrieve the token from Secrets Manager.
  3. Monitoring and alerting: Set up Amazon CloudWatch alarms to monitor for unusual access patterns or failed authentication attempts.
  4. Encryption in transit: Always use HTTPS/TLS when transmitting tokens between services.
  5. Application-level security: Implement proper error handling in your applications to avoid leaking token information in error messages.
  6. Auditing: Regularly review AWS CloudTrail logs for Secrets Manager and ElastiCache to track token usage and detect any unauthorized access attempts.
  7. Disaster recovery: Include the authentication token in your backup and disaster recovery processes to ensure business continuity.
  8. Token revocation: Implement a process for immediate token revocation in case of a security breach.

When reviewing your authentication setup, look for these elements and ensure they align with your organization’s security policies and compliance requirements. Regular security audits should include verification of these practices. For comprehensive guidance on ElastiCache security best practices, refer to the ElastiCache Security Best Practices documentation and the ElastiCache Authentication and Access Control guide.
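As a concrete example of the least-privilege point above, the following sketch grants an application role read access to only this auth token and permission to decrypt it with the secret's KMS key. The policy and role wiring are illustrative assumptions; adapt them to your organization's IAM structure.

data "aws_iam_policy_document" "read_auth_token" {
  statement {
    sid       = "ReadValkeyAuthToken"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.elasticache_auth.arn]
  }

  statement {
    sid       = "DecryptAuthTokenSecret"
    actions   = ["kms:Decrypt"]
    resources = [aws_kms_key.encrypt_secret.arn]
  }
}

resource "aws_iam_policy" "read_auth_token" {
  name   = "${var.name}-read-auth-token"
  policy = data.aws_iam_policy_document.read_auth_token.json
}

# Attach aws_iam_policy.read_auth_token to the application role that connects
# to the cluster, rather than granting broad Secrets Manager access.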

Cleaning up

To avoid incurring future charges, delete the resources by running the following commands:

# In the node-based directory
terraform destroy

# In the serverless directory
cd ../serverless
terraform destroy

Confirm the deletion when prompted by typing yes for each destroy operation.

Conclusion

In this post, you learned how to use Terraform to deploy secure Amazon ElastiCache for Valkey clusters using both serverless and node-based approaches. With IaC, you can ensure consistent application of security best practices, including network isolation, encryption, and access controls.

For next steps, clone the repository and customize the Terraform configurations to match your specific requirements. Modify the variables in `variables.tf` to adjust instance types, storage limits, or network settings for your environment. You can also adapt the security configurations and AWS KMS key policies to align with your organization’s compliance requirements.
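For example, a small variable file for the serverless configuration might look like the following sketch. The variable names come from the configuration shown in Step 2, but the values are illustrative, so confirm them against the repository's variables.tf.

# example.tfvars (illustrative values)
name                     = "my-valkey-cache"
major_engine_version     = "8"
data_storage             = 10      # GB cap for the serverless cache
ecpu_per_second          = 5000    # ECPU cap for the serverless cache
daily_snapshot_time      = "03:00"
snapshot_retention_limit = 7

Apply it with terraform apply -var-file=example.tfvars.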


About the authors

Sourav Kundu

Sourav is a DevOps Consultant at AWS. He helps organizations securely migrate to and efficiently build on the AWS cloud using modern software engineering practices. He believes that democratizing cloud knowledge is essential for driving innovation and is committed to helping others succeed in their cloud journey.

Yamini Pradeepika Rathamsetty

Yamini orchestrates digital transformations and architects robust cloud solutions, guiding organizations through their AWS journey as a Senior Delivery Consultant at Professional Services. Drawing from battle-tested strategies across several enterprise migrations, she has mastered the art of turning executive visions into technical reality.