Announcing additional Linux controls for Amazon ECS tasks on AWS Fargate

Introduction

An Amazon Elastic Container Service (Amazon ECS) task is a set of co-located containers that are scheduled onto AWS Fargate or an Amazon EC2 container instance. Containers use Linux namespaces to provide workload isolation. With namespaces, even though containers are scheduled together in an Amazon ECS task, they’re still isolated from each other and from the host.

Today we’re excited to announce that customers can now tune Linux kernel parameters in ECS tasks on AWS Fargate. Tuning Linux kernel parameters can help customers optimize their network throughput when running containerized network proxies, or achieve higher levels of workload resilience by terminating stale connections. This launch provides parity between ECS tasks launched on AWS Fargate and those launched on Amazon EC2 container instances.

Additionally, with today’s launch, the process ID (PID) namespace can now be shared by all containers in the same ECS task on AWS Fargate. Sharing the PID namespace between containers in the same ECS task unlocks additional workload observability on AWS Fargate. Observability tools, such as container runtime security tools, can now run as a sidecar container and observe an application’s processes in the shared PID namespace. The PID namespace joins the network namespace, which is used with the awsvpc networking mode, in the list of namespaces that can be shared by all containers in an ECS task on AWS Fargate.

In this post, we’ll dive into System Controls and PID namespace sharing on AWS Fargate.

System controls

Within a Linux system, the parameters of the kernel can be tuned with the command line utility sysctl. When starting containers locally, for example with the Docker or Finch command line interface, you can pass in the --sysctl flag to change the kernel parameters. Within an ECS task, parameters can be defined with the systemControls key in a Task Definition.
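
As a rough illustration of how sysctl names relate to what a container sees, the sketch below maps a dotted parameter name to its file under /proc/sys and prints the current value. The helper names sysctl_path and show_sysctl are made up for this example, and it assumes a Linux environment with /proc mounted.

```shell
# Map a dotted sysctl name to its path under /proc/sys.
# Helper names are illustrative. Names whose components themselves
# contain dots (for example some VLAN interface names) are not handled.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print "name = value", or a note if the parameter is not visible here.
show_sysctl() {
    path="$(sysctl_path "$1")"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$path")"
    else
        printf '%s is not readable in this environment\n' "$1"
    fi
}

show_sysctl net.core.somaxconn
show_sysctl net.ipv4.ip_local_port_range
```

Running the same two show_sysctl calls inside a container is one way to confirm that a value passed with --sysctl, or set in a Task Definition, actually took effect.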

Customers running containerized network proxies on AWS Fargate have told us they often need to tune net.* kernel parameters to allow their workloads to meet higher throughput demands. Frequently requested kernel parameters include the maximum number of queued connections with net.core.somaxconn and the range of ephemeral ports with net.ipv4.ip_local_port_range.

When architecting a workload for resiliency, customers have also told us that they would like to customize an ECS task’s TCP keepalive parameters on AWS Fargate. Configuring a short TCP keepalive timeout allows an application to detect network failures quickly and close failed connections. Examples of when to tune the TCP keepalive include when a containerized workload is communicating with an Amazon Aurora PostgreSQL cluster and when troubleshooting an Amazon VPC NAT Gateway.
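
To make the effect of the keepalive settings concrete, here is a small sketch of the arithmetic. On Linux, an idle connection to a dead peer is detected after roughly the idle time (net.ipv4.tcp_keepalive_time) plus the probe interval (net.ipv4.tcp_keepalive_intvl) multiplied by the number of unanswered probes (net.ipv4.tcp_keepalive_probes). The function name and the tuned values are assumptions for illustration only. Which of these parameters can be set on AWS Fargate should be checked against the Amazon ECS documentation.

```shell
# Approximate worst-case seconds before an idle, dead peer is detected:
# idle time + (probe interval * number of unanswered probes).
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# Linux defaults: tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9
keepalive_detect_seconds 7200 75 9    # prints 7875

# Hypothetical tuned values, chosen only for illustration
keepalive_detect_seconds 100 10 5     # prints 150
```

With the defaults, a failed connection can linger for more than two hours, which is why a shorter keepalive time is often requested.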

Up until this launch, the systemControls key was only available to customers running ECS tasks on EC2 container instances. It is now also available for ECS tasks on AWS Fargate. An excerpt from an example Amazon ECS Task Definition where two parameters are tuned is shown below:

{
    ...
    "containerDefinitions": [
        {
            "name": "myproxy",
            "image": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/myproxy:latest",
            "essential": true,
            "systemControls": [
                {
                    "namespace": "net.core.somaxconn",
                    "value": "6000"
                },
                {
                    "namespace": "net.ipv4.ip_local_port_range",
                    "value": "1024    65000"
                }
            ]
        }
    ]
}

The full list of parameters available to ECS tasks on AWS Fargate and Amazon EC2 container instances can be found in the Amazon ECS documentation.

If you are using the kernel parameters in the IPC namespace, then you could set unique values for each container in the task as the IPC namespace is not shared. However, if you are using the parameters in the network namespace, then setting a parameter for one container changes the parameter for all containers in the task as this is a shared namespace. To expand on this:

If net.ipv4.tcp_keepalive_time=100 is set in container one, then this change is also reflected in container two.
If net.ipv4.tcp_keepalive_time=100 is set in container one and net.ipv4.tcp_keepalive_time=200 is set in container two, then the parameter for the namespace takes the value set by whichever container starts last in the task.
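
One way to observe the behavior described above is to print a namespace identifier and the parameter value from inside each container and compare them. The sketch below assumes a Linux environment with /proc mounted, and the helper name ns_id is made up for this example. Two containers that share a namespace report the same identifier for it.

```shell
# Print the identifier of a namespace for the current process,
# for example: ns_id net or ns_id ipc. The helper name is illustrative.
ns_id() {
    readlink "/proc/self/ns/$1"
}

echo "network namespace: $(ns_id net)"
echo "ipc namespace:     $(ns_id ipc)"

# Show the keepalive time as seen from this container, if it is visible.
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    echo "tcp_keepalive_time: $(cat /proc/sys/net/ipv4/tcp_keepalive_time)"
fi
```

If both containers print the same network namespace identifier, they also print the same tcp_keepalive_time, regardless of which container definition set it.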

Sharing the process id namespace

The process ID namespace restricts what a process in a container can see. By default, a containerized process can only see processes in the same container, not the processes of the other co-located containers or the underlying host. A common use case for sharing a process ID namespace is observability tooling. Container runtime security tools often run in a sidecar container and need to observe the processes in the application container. In this pattern, a process in the sidecar container monitors the processes in the application container to see if they start to make suspicious system calls.

With today’s launch, customers can now share a process id namespace among containers in the same ECS task by passing in the pidMode key with the value task in a Task Definition.

Walkthrough

In this walkthrough we will start an ECS task with a shared process id namespace. The task contains two containers, an application container (nginx) and a sidecar container (a sleep process used as a demonstration). We will then show how a process in the sidecar container can interact with processes in the application container.

Prerequisites

An existing Amazon ECS cluster and Amazon VPC. If you need to create these in your AWS Account, see the Amazon ECS getting started guide.
Ensure the ECS exec prerequisites are met in your AWS account and on the workstation from which you plan to run the commands.

Run an ECS task on AWS Fargate

1. Create a Task Definition with two containers and pidMode enabled. You will need to replace the IAM roles (executionRoleArn and taskRoleArn) within the Task Definition with IAM roles created in the prerequisites.

$ cat <<EOF > taskdef.json 
{
    "family": "fargatepidsharing",
    "executionRoleArn": "arn:aws:iam::111222333444:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::111222333444:role/ecsTaskExecRole",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "public.ecr.aws/nginx/nginx:1.25-perl",
            "essential": true
        },
        {
            "name": "sleeper",
            "image": "public.ecr.aws/amazonlinux/amazonlinux:2",
            "essential": true,
            "command": [
                "sleep",
                "infinity"
            ],
            "linuxParameters": {
                "initProcessEnabled": true
            }
        }
    ],
    "cpu": "256",
    "memory": "512",
    "pidMode": "task"
}
EOF
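
Before registering, a quick sanity check of the file can catch a missing key early. The sketch below is a plain text match with grep, not a JSON parser or an AWS CLI feature, and the function name check_taskdef is made up for this example.

```shell
# Hypothetical pre-flight check for the Task Definition file.
# This is a text match only; it does not validate the JSON itself.
check_taskdef() {
    file="$1"
    # pidMode must be "task" for containers to share one PID namespace.
    grep -Eq '"pidMode"[[:space:]]*:[[:space:]]*"task"' "$file" || {
        echo "pidMode is not set to task in $file" >&2
        return 1
    }
    # The walkthrough Task Definition uses the awsvpc network mode.
    grep -Eq '"networkMode"[[:space:]]*:[[:space:]]*"awsvpc"' "$file" || {
        echo "networkMode is not awsvpc in $file" >&2
        return 1
    }
    echo "$file looks ready to register"
}

# Only run the check if the file from step 1 exists in this directory.
if [ -f taskdef.json ]; then
    check_taskdef taskdef.json
fi
```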

2. Register the Task Definition with the aws ecs register-task-definition command.

$ aws ecs register-task-definition \
    --cli-input-json file://taskdef.json

3. Run the Amazon ECS task on AWS Fargate with the aws ecs run-task command. In the example below, replace the ECS cluster name, the VPC subnet and the security group values. As this task will not be externally accessed, a private VPC subnet and a security group with no ingress rules (such as the default VPC security group) will suffice.

$ aws ecs \
    run-task \
    --count 1 \
    --launch-type FARGATE \
    --task-definition fargatepidsharing \
    --cluster mycluster \
    --enable-execute-command \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-07bd4d10ea848a008],securityGroups=[sg-061b33f4ed6b97c34],assignPublicIp=DISABLED}"

4. Once the ECS task is running, use the aws ecs execute-command command to create a terminal session in the sidecar container within the ECS task. If you receive an error when running this command, you can use the amazon-ecs-exec-checker script to ensure all of the prerequisites have been met.

# Retrieve the ECS task ID
$ aws ecs list-tasks \
     --cluster mycluster
{
    "taskArns": [
        "arn:aws:ecs:us-west-2:111222333444:task/mycluster/5ce56f226dd4477a9f57918a98fc852f"
    ]
}

# Exec into the running ECS task
$ aws ecs execute-command \
    --cluster mycluster \
    --task 5ce56f226dd4477a9f57918a98fc852f \
    --container sleeper \
    --interactive \
    --command "/bin/bash"
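
The task ID used above is the final path segment of the task ARN. As a small convenience, it can be extracted with shell parameter expansion instead of being copied by hand. The function name task_id_from_arn is made up for this example.

```shell
# Strip everything up to and including the last "/" of a task ARN,
# leaving only the task ID. The function name is illustrative.
task_id_from_arn() {
    printf '%s\n' "${1##*/}"
}

# Example ARN in the shape returned by aws ecs list-tasks
task_id_from_arn "arn:aws:ecs:us-west-2:111222333444:task/mycluster/5ce56f226dd4477a9f57918a98fc852f"
# prints 5ce56f226dd4477a9f57918a98fc852f
```

This can be combined with the AWS CLI's global --query and --output options, for example by passing the result of aws ecs list-tasks --cluster mycluster --query 'taskArns[0]' --output text to the function.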

5. Within the ECS exec terminal session we can now explore the shared PID namespace. To do so, we need to install some diagnostics tools inside of the sidecar container.

$ yum install procps strace -y

6. Using the ps command (included in the procps package installed above) we can see all of the running processes in the shared PID namespace. The output shows the processes from the sidecar container as well as the nginx processes from the application container. The AWS Systems Manager Session Manager processes used to provide the ECS exec terminal are also shown.

$ ps -aef --forest
UID        PID  PPID  C STIME TTY          TIME CMD
root        38     0  0 09:53 ?        00:00:00 /managed-agents/execute-command/amazon-ssm-agent
root        72    38  0 09:53 ?        00:00:00  \_ /managed-agents/execute-command/ssm-agent-worker
root        34     0  0 09:53 ?        00:00:00 /managed-agents/execute-command/amazon-ssm-agent
root        73    34  0 09:53 ?        00:00:00  \_ /managed-agents/execute-command/ssm-agent-worker
root       266    73  0 09:58 ?        00:00:00      \_ /managed-agents/execute-command/ssm-session-worker ecs-execute-command-0147ec3fd84d94d24
root       276   266  0 09:58 pts/1    00:00:00          \_ /bin/bash
root       286   276  0 09:59 pts/1    00:00:00              \_ ps -aef --forest
root         8     0  0 09:53 ?        00:00:00 /dev/init -- sleep infinity
root        19     8  0 09:53 ?        00:00:00  \_ sleep infinity
root         7     0  0 09:53 ?        00:00:00 nginx: master process nginx -g daemon off;
101         56     7  0 09:53 ?        00:00:00  \_ nginx: worker process
101        285     7  0 09:59 ?        00:00:00  \_ nginx: worker process
root         1     0  0 09:53 ?        00:00:00 /pause
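
If installing extra packages in the sidecar is not an option, a similar view can be approximated by reading /proc directly. This is a minimal sketch, assuming /proc is mounted in the container. The function name list_procs is made up for this example, and the output is less detailed than that of ps.

```shell
# List PID and command name for every process visible in this PID
# namespace by reading /proc directly. The function name is illustrative.
list_procs() {
    for dir in /proc/[0-9]*; do
        # A process may exit between the glob and the read, so skip failures.
        if [ -r "$dir/comm" ]; then
            name="$(cat "$dir/comm" 2>/dev/null)" || continue
            printf '%s %s\n' "${dir#/proc/}" "$name"
        fi
    done
}

list_procs | sort -n
```

In a task with a shared PID namespace, the list includes processes from every container in the task, not only those of the container where the command runs.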

7. Using the shared PID namespace, we can monitor the system calls made by a process in the application container. We will use the strace package installed in step five to monitor the main nginx process. To generate system calls, we will forcefully stop an nginx worker process with the kill command. In my case the main nginx process is process ID 7 and a worker process is process ID 56. These will be different in your environment and need to be replaced in the commands below.

# Start the process monitoring
$ strace -p 7 -o straceoutput.txt &
# Stop an nginx worker process
$ kill -9 56
# Show the process monitoring logs
$ cat straceoutput.txt
rt_sigsuspend([], 8)                    = ? ERESTARTNOHAND (To be restarted if no handler)
<snipped>

In this walkthrough we have shown that by sharing a process ID namespace, a process in a sidecar container can now interact with and observe all running processes in all containers in the ECS task. In step seven we used a sidecar container to monitor and forcefully stop processes in the application container.

Cleanup

To clean up this walkthrough:

1. Exit the ECS exec terminal by running the exit command in the terminal window with the open session.

2. Stop the ECS task with the aws ecs stop-task command.

$ aws ecs stop-task \
    --cluster mycluster \
    --task 5ce56f226dd4477a9f57918a98fc852f

3. Deregister the ECS task definition with the aws ecs deregister-task-definition command.

$ aws ecs deregister-task-definition \
    --task-definition fargatepidsharing:1

Caveats of sharing a process id namespace

While a process ID namespace can now be shared by containers in the same ECS task on AWS Fargate, there are a few things to be aware of. We’ll walk through those caveats in the context of the application and sidecar ECS task defined previously:

A process in the sidecar container can observe, stop, or restart a process in the application container.
A process in the sidecar container can view the filesystem of the application container. For example, if the application is running as process ID 7, then within the sidecar container you can access the application container’s filesystem at /proc/7/root/. The only protection for the application container’s filesystem is Unix file permissions.
When sharing a process ID namespace in an ECS task, a new pause process runs as PID 1 for the whole task.
The SYS_PTRACE Linux capability may need to be added to the ECS task to provide full traceability of the processes running in the application container.
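
For the last caveat, a sidecar can check whether it currently holds the capability before attempting to trace another process. The sketch below reads the effective capability mask from /proc and tests bit 19, which is the number assigned to CAP_SYS_PTRACE in the Linux kernel headers. The function name has_cap and the sample masks in the comment are illustrative. In a Task Definition, the capability is added per container under linuxParameters.capabilities.add.

```shell
# Return success if the given bit is set in a hexadecimal capability mask.
# CAP_SYS_PTRACE is capability number 19. The function name is illustrative.
# Example: has_cap 00000000a80c25fb 19 succeeds, has_cap 00000000a80425fb 19 fails.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Effective capability mask as seen by a child process of this shell.
cap_eff="$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)"

if [ -n "$cap_eff" ] && has_cap "$cap_eff" 19; then
    echo "CAP_SYS_PTRACE is in the effective set"
else
    echo "CAP_SYS_PTRACE is not in the effective set"
fi
```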

Conclusion

With this launch we are excited to unblock more workloads on AWS Fargate. Sharing a process id namespace and tuning kernel parameters are both features that were requested through the AWS Containers Roadmap for AWS Fargate. We value your feedback, and welcome you to submit any additional feature requests or improvements as GitHub issues on the roadmap. For more information on the AWS Fargate security architecture see the AWS Fargate Security Whitepaper.
