Implementing Persistent Storage for AWS Fargate and Amazon EBS

Users love the simplicity of AWS Fargate for running containerized workloads without managing servers, scaling infrastructure, or worrying about underlying compute capacity. As users expand their Fargate adoption, they increasingly want to run applications that require block data storage – content management systems like WordPress that retain uploaded media and plugin configurations, retail applications that need persistent product catalogs and customer data, and data processing pipelines. Amazon Elastic Block Store (EBS) integration with Fargate bridges this gap by providing high-performance, block storage that attaches directly to Fargate tasks.

In this blog post, we demonstrate how to implement block storage for your Fargate workloads using native EBS volume integration, enabling you to run applications with the same operational ease you expect from Fargate.

The Challenge of Container Ephemerality

As organizations mature their containerized application portfolios, they want to leverage Fargate’s serverless benefits for increasingly sophisticated workloads.

Customers seek to run:

Content management systems that need reliable storage for user-generated content, media files, and configuration data
Data processing applications that require intermediate data persistence across processing stages and job restarts
Machine learning workloads that want to maintain model checkpoints, training data, and inference results
Database applications that need consistent, high-performance storage with predictable I/O characteristics

Native EBS integration with Fargate eliminates this complexity by providing dedicated block storage that seamlessly integrates with the container lifecycle.

Architecture Overview

The solution implements a multi-tier architecture that combines serverless containers with block storage across multiple Availability Zones (AZ) for high availability.

Each task has a dedicated EBS volume providing io2 or gp3 storage with consistent performance and encryption at rest. Security groups control traffic flow while Amazon CloudWatch logs provide capabilities for debugging purposes.

Important: EBS Zonal Characteristics

EBS is a zonal service – each EBS volume exists within a single AZ and can only attach to EC2 instances or Fargate tasks in that same zone. This design makes EBS integration particularly well-suited for zone-isolated applications such as development and testing environments where workloads don’t require cross-zone redundancy, batch processing jobs that operate on localized datasets within a specific zone, content management systems serving region-specific content, and data analytics workloads that process zone-resident data.

For applications requiring cross-AZ data availability:

Leverage enterprise and open source Multi-AZ solutions such as Veritas Alta for enterprise backup/recovery, Distributed Replicated Block Device (DRBD) for real-time block device replication, or legacy solutions like Sun Network Data Replicator (SNDR) for existing infrastructure integration
Use Amazon EFS (Elastic File System) for shared storage accessible across multiple AZs
Design stateless applications with external data stores (RDS, DynamoDB) for shared state
Consider EBS snapshot-based backup and restore procedures for disaster recovery. Additionally, you can also consider recently launched feature of EBS such as time-based copy and Provisioned Rate for Volume Initialization.

For zone-isolated applications:

Each Fargate task receives its own EBS volume within its deployment AZ
Data remains highly available within the zone but requires backup strategies for cross-zone recovery

Prerequisites

Before beginning, you need to have the following:

Development Tools

AWS CDK version 2.0+ – Setup Guide
Node.js 14.x+ – Download
Docker Engine – Installation Guide
Python 3.8+ – Download
AWS account with appropriate permissions

Environment Setup

Install the AWS CDK globally and configure your AWS credentials:

# Install AWS CDK globally
npm install -g aws-cdk

# Set default account for CDK
export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)

# Verify CDK installation
cdk --version

Building the Infrastructure

Establishing the Network Foundation

We recommend setting up a network foundation using an Amazon Virtual Private Cloud (VPC) configured with a multi-Availability Zone architecture that has at least two AZs, preferably three, incorporating both public and private subnets with /24 CIDR blocks to provide 256 IP addresses per subnet for scalability.

Configuring ECS with EBS Volumes and Persistence Management

The ECS cluster and task definition incorporate EBS volume configuration with specific considerations for maintaining data persistence during service interruptions, updates, and scaling events:

// Create ECS cluster with Container Insights enabled
const cluster = new ecs.Cluster(this, 'Cluster', {
  vpc,
  containerInsights: true // Enable CloudWatch Container Insights
});


// Create Fargate task definition with EBS volume
const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
  memoryLimitMiB: 512, // 512MB memory allocation
  cpu: 256, // 0.25 vCPU allocation
  volumes: [volumeConfiguration] // Attach EBS volume configuration
});

The container configuration mounts the EBS volume at a specific path, making block storage available to the application:

// Add container to task definition
const container = taskDefinition.addContainer('app', {
  image: ecs.ContainerImage.fromAsset('./app'), // Build from local Dockerfile
  logging: ecs.LogDrivers.awsLogs({ 
    streamPrefix: 'ecs-ebs',
    logRetention: logs.RetentionDays.ONE_WEEK // Log retention policy
  }),
  environment: {
    STORAGE_PATH: '/data' // Environment variable for storage path
  },
  healthCheck: {
    command: ['CMD-SHELL', 'curl -f http://localhost:5000/health || exit 1'],
    interval: Duration.seconds(30),
    timeout: Duration.seconds(5),
    retries: 3
  }
});

// Mount EBS volume to container filesystem
container.addMountPoints({
  sourceVolume: 'ebs-volume', // Reference to volume configuration
  containerPath: '/data', // Mount point in container
  readOnly: false // Allow read/write operations
});

// Expose container port for load balancer
container.addPortMappings({
  containerPort: 5000,
  protocol: ecs.Protocol.TCP
});

Configuring Load Balancing

The Application Load Balancer configuration includes health checks and HTTPS listeners for secure communication:

// Create Application Load Balancer
const loadBalancer = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
  vpc: vpc,
  internetFacing: true, // Public-facing load balancer
  securityGroup: albSecurityGroup,
  loadBalancerName: 'fargate-ebs-alb'
});

// Create target group for Fargate tasks
const targetGroup = new elbv2.ApplicationTargetGroup(this, 'TargetGroup', {
  vpc: vpc,
  port: 5000, // Container port
  protocol: elbv2.ApplicationProtocol.HTTP,
  targetType: elbv2.TargetType.IP, // Required for Fargate
  healthCheck: {
    path: '/health', // Health check endpoint
    healthyHttpCodes: '200',
    interval: Duration.seconds(30),
    timeout: Duration.seconds(5),
    healthyThresholdCount: 2,
    unhealthyThresholdCount: 3
  },
  deregistrationDelay: Duration.seconds(30) // Fast deregistration for development
});

// HTTPS listener with SSL certificate
const httpsListener = loadBalancer.addListener('HttpsListener', {
  port: 443,
  certificates: [certificate], // SSL certificate from ACM
  defaultAction: elbv2.ListenerAction.forward([targetGroup])
});

// Redirect HTTP to HTTPS
loadBalancer.addListener('HttpListener', {
  port: 80,
  defaultAction: elbv2.ListenerAction.redirect({
    protocol: 'HTTPS',
    port: '443',
    permanent: true
  })
});

Deploying the Fargate Service

The Fargate service configuration brings together all components with auto-scaling capabilities:

// Create Fargate service
const fargateService = new ecs.FargateService(this, 'FargateService', {
  cluster: cluster,
  taskDefinition: taskDefinition,
  desiredCount: 2, // Start with 2 tasks for high availability
  assignPublicIp: false, // Tasks run in private subnets
  securityGroups: [ecsSecurityGroup],
  vpcSubnets: { 
    subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS // Deploy in private subnets
  },
  healthCheckGracePeriod: Duration.seconds(60), // Allow time for startup
  serviceName: 'fargate-ebs-service'
});

// Attach service to target group
fargateService.attachToApplicationTargetGroup(targetGroup);

// Configure auto-scaling
const scaling = fargateService.autoScaleTaskCount({
  minCapacity: 2, // Minimum tasks for availability
  maxCapacity: 10 // Maximum tasks for cost control
});

// Scale based on CPU utilization
scaling.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 70, // Scale out at 70% CPU
  scaleInCooldown: Duration.minutes(5), // Wait 5 minutes before scaling in
  scaleOutCooldown: Duration.minutes(2) // Wait 2 minutes before scaling out
});

// Scale based on memory utilization
scaling.scaleOnMemoryUtilization('MemoryScaling', {
  targetUtilizationPercent: 80, // Scale out at 80% memory
  scaleInCooldown: Duration.minutes(5),
  scaleOutCooldown: Duration.minutes(2)
});

Application Implementation

In this flask application, we are demonstrating a non production example of file upload. For best practices, we recommend adding a health check in the application to be invoked from the Application Load Balancer:

from flask import Flask, jsonify, request, send_file
import os
import logging
from datetime import datetime
import boto3

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__)


# Get storage path from environment variable
STORAGE_PATH = os.environ.get('STORAGE_PATH', '/data')

@app.route('/health')
def health():
    """Basic health check endpoint for load balancer"""
    return jsonify({'status': 'healthy', 'timestamp': datetime.utcnow().isoformat()})

@app.route('/files', methods=['GET'])
def list_files():
    """List all files in the block storage"""
    try:
        if not os.path.exists(STORAGE_PATH):
            return jsonify({'error': 'Storage path does not exist'}), 404
            
        files = []
        for filename in os.listdir(STORAGE_PATH):
            filepath = os.path.join(STORAGE_PATH, filename)
            if os.path.isfile(filepath):
                stat = os.stat(filepath)
                files.append({
                    'name': filename,
                    'size': stat.st_size,
                    'modified': datetime.fromtimestamp(stat.st_mtime).isoformat()
                })
        
        return jsonify({
            'files': files,
            'count': len(files),
            'storage_path': STORAGE_PATH
        })
        
    except Exception as e:
        logger.error(f"Failed to list files: {str(e)}")
        return jsonify({'error': str(e)}), 500

@app.route('/upload', methods=['POST'])
def upload_file():
    """Upload a file to block storage"""
    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400
        
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400
        
    try:
        # Ensure storage directory exists
        os.makedirs(STORAGE_PATH, exist_ok=True)
        
        # Save file to block storage
        filepath = os.path.join(STORAGE_PATH, file.filename)
        file.save(filepath)
        
        # Get file information
        stat = os.stat(filepath)
        
        logger.info(f"File uploaded successfully: {file.filename}")
        
        return jsonify({
            'message': 'File uploaded successfully',
            'filename': file.filename,
            'size': stat.st_size,
            'path': filepath
        })
        
    except Exception as e:
        logger.error(f"Failed to upload file: {str(e)}")
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    # Create storage directory if it doesn't exist
    os.makedirs(STORAGE_PATH, exist_ok=True)
    
    # Start Flask application
    app.run(host='0.0.0.0', port=5000, debug=False)

Handling Task Interruptions and Updates

When running applications on Fargate with EBS volumes, one of the most critical operational challenges is maintaining data continuity during service lifecycle events.

Fargate tasks can be replaced due to various scenarios:

Service updates when deploying new application versions
Infrastructure maintenance performed by AWS on the underlying compute
Spot interruptions when using Fargate Spot pricing
Auto-scaling events that terminate and create new tasks
Health check failures that trigger task replacement

When a task terminates, its attached EBS volume becomes orphaned — and the replacement task still needs access to that data. You can address this challenge by creating a snapshot of the orphaned volume and restoring it to a new volume for the replacement task. We recommend implementing event-driven volume management: when ECS service events occur (updates, scaling, or interruptions), Amazon CloudWatch Events trigger a AWS Lambda function that orchestrates the full volume persistence lifecycle.

To prevent an infinite loop — where AWS Lambda launching a replacement task generates new ECS events that re-trigger the function — the solution must include loop-prevention safeguards. When AWS Lambda starts a replacement task, it tags it with a marker (e.g., managed-by: ebs-lifecycle-lambda). The function checks for this tag at the start of every invocation and skips processing if it is present. Additionally, a DynamoDB idempotency check ensures that a volume/task combination is processed at most once, even under retries or concurrent invocations. The CloudWatch Events rule is also scoped to specific clusters, services, or task definition families to avoid triggering on unrelated task state changes.

Service Update Scenario

ECS initiates a rolling deployment of the new task definition
CloudWatch detects task state changes (TASK_STOPPING or TASK_STOPPED) and triggers the AWS Lambda function
AWS Lambda checks for the managed-by: ebs-lifecycle-lambda tag — if present, processing is skipped to prevent re-entry
AWS Lambda checks DynamoDB to confirm the volume/task combination has not already been processed (idempotency check)
AWS Lambda identifies the EBS volumes attached to terminating tasks
AWS Lambda retrieves volume details and creates a snapshot
The snapshot reference is recorded in an Amazon DynamoDB table
AWS Lambda creates individual task definitions tailored to each required volume configuration, tagging all replacement tasks with managed-by: ebs-lifecycle-lambda

For services requiring DesiredCount=N, the architecture supports either N separate services each with DesiredCount=1, or a transition to ECS Standalone Tasks instead of ECS Services.

This volume persistence strategy provides operational advantages that enhance the reliability and manageability of applications on Fargate. Application deployments proceed without data loss, enabling continuous delivery practices for applications designed as single-instance services. This pattern works best for single-task services or applications that can be decomposed into multiple single-task services, rather than traditional multi-replica service deployments. We recommend implementing proper snapshot lifecycle management rather than leaving cleanup as an afterthought. Orphaned snapshots without lifecycle policies create unnecessary costs and operational overhead.

Important Note on EBS Snapshots for Distributed Systems: When using EBS snapshots for backup and restore in distributed environments, be aware that ad-hoc snapshots do not automatically maintain application or crash consistency across multiple volumes, requiring manual coordination to ensure data integrity. Additionally, snapshot restoration is a best effort operation with indeterminate time. To optimize performance, consider leveraging AWS’s new provisioned volume hydration rate feature to accelerate restoration and minimize recovery point objectives while balancing storage costs and operational complexity.

Testing and Validation

Before running your tests, ensure your EBS volumes are attached to your Fargate tasks. You can verify this in the AWS Console by navigating to ECS > Clusters > Tasks > Volumes tab.

Step 1: Retrieve Load Balancer URL

After deployment, validate the implementation by testing various endpoints and operations. First, retrieve the Application Load Balancer URL from the CloudFormation outputs:

# Get ALB DNS name from CloudFormation outputs
ALB_URL=$(aws cloudformation describe-stacks \
  --stack-name EcsEbsStack \
  --query 'Stacks[0].Outputs[?OutputKey==`LoadBalancerDNS`].OutputValue' \
  --output text)

echo "Application Load Balancer URL: https://${ALB_URL}"

# Verify ALB is responding
curl -I https://${ALB_URL}/health

Step 2: Test File Operations Upload a test file to verify write operations:

# Create a test file
echo "This is a test file for EBS persistence validation" > test.txt
echo "Created at: $(date)" >> test.txt

# Upload test file
echo "Uploading test file..."
curl -X POST -F "file=@test.txt" https://${ALB_URL}/upload

# List files to confirm upload
echo "Listing files in block storage..."
curl -s https://${ALB_URL}/files | jq '.'

# Download the file to verify content
echo "Downloading uploaded file..."
curl -o downloaded-test.txt https://${ALB_URL}/download/test.txt

# Verify file content
echo "Verifying downloaded file content..."
cat downloaded-test.txt

This testing sequence validates both the EBS volume attachment and the application’s ability to perform read and write operations on the block storage.

Cost Analysis

The following table breaks down monthly costs for a typical deployment in the us-east-1 region with two tasks running continuously:

Category	Monthly Cost
Compute (Fargate)	$35.55
Storage (EBS)	$1.60
Total Estimated Cost	$37

Clean Up

Remove all resources to avoid ongoing charges with proper verification:

# Step 1: Delete the CDK stack
echo "Initiating stack deletion..."
cdk destroy --force

# Step 2: Verify deletion progress
echo "Monitoring deletion progress..."
aws cloudformation describe-stacks \
  --stack-name EcsEbsStack \
  --query 'Stacks[0].StackStatus' \
  --output text

# Step 3: Check for any remaining resources
echo "Checking for orphaned resources..."

# List any remaining EBS volumes
aws ec2 describe-volumes \
  --filters "Name=tag:aws:cloudformation:stack-name,Values=EcsEbsStack" \
  --query 'Volumes[*].[VolumeId,State]' \
  --output table

# List any remaining security groups
aws ec2 describe-security-groups \
  --filters "Name=tag:aws:cloudformation:stack-name,Values=EcsEbsStack" \
  --query 'SecurityGroups[*].[GroupId,GroupName]' \
  --output table

Production Considerations

For production deployments, implement these additional considerations to ensure reliability, security, and operational excellence:

A multi-region configuration which includes cross-region deployment for disaster recovery, enabling automatic failover, and data replication across geographically distributed AWS regions.
A backup and recovery strategy implementation to include automated EBS snapshot creation and deletion, cross-region backup replication, and point-in-time recovery capabilities.
Disaster recovery procedures with defined recovery time objectives (RTO) and recovery point objectives (RPO) to ensure business continuity and data protection against various failure scenarios.

Conclusion

Native EBS volume integration with AWS Fargate provides a robust solution for block storage requirements in containerized applications. This implementation demonstrates how to combine serverless container execution with dedicated block storage while maintaining security and operational standards.

Ready to modernize your containerized storage strategy? Start integrating EBS volumes with your AWS Fargate workloads today. Explore the AWS documentation to get started, or reach out to your AWS solutions architect to design a storage architecture tailored to your application’s needs.

AWS Storage Blog