How do I troubleshoot OutOfMemory errors in Amazon ECS?

Last updated: 2022-09-16

I want to troubleshoot memory usage issues in my Amazon Elastic Container Service (Amazon ECS) task.

-or-

The containers in my Amazon ECS task are exiting due to an OutOfMemory error.

Short description

By default, a container has no resource constraints and can use as much of a given resource as the host's kernel scheduler allows. With Docker, you can control how much memory a container uses. Don't allow a running container to consume most of the host machine's memory. On Linux hosts, when the kernel detects that there isn't enough memory to perform important system functions, it throws an OutOfMemory exception and starts ending processes to free up memory.

With Docker, you can use either of the following:

  • Hard memory limits that allow the container to use no more than a certain amount of user or system memory
  • Soft limits that allow the container to use as much memory as required unless certain conditions, such as low memory or contention on the host machine, occur

When an Amazon ECS task is ended because of OutOfMemory issues, you might receive the following error message:

OutOfMemoryError: Container killed due to memory usage

You get this error when a container in your task exits because the processes in the container consume more memory than the amount that was allocated in the task definition.

Resolution

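To troubleshoot OutOfMemory errors in your Amazon ECS task, check how much memory each container uses compared with the amount that it reserves. The following query, written in Amazon CloudWatch Logs Insights syntax, returns the maximum utilized and reserved memory for a container in 5-minute intervals, newest first. Replace example-container-name and example-task-id with your own values:

stats max(MemoryUtilized) as mem, max(MemoryReserved) as memreserved by bin(5m) as period, TaskId, ContainerName
| sort period desc | filter ContainerName like "example-container-name" | filter TaskId = "example-task-id"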
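
As an illustration of what the query computes, the following Python sketch applies the same aggregation to a few hypothetical records. The record layout, the SAMPLE data, the assumption that both fields are in MiB, and the bins_near_reservation helper with its 90% threshold are assumptions made for this example. They aren't part of CloudWatch.

```python
from collections import defaultdict

BIN_SECONDS = 5 * 60  # mirrors bin(5m) in the query


def peak_memory_by_bin(records, container_name, task_id):
    """Return {bin_start: (max MemoryUtilized, max MemoryReserved)}, newest bin first."""
    peaks = defaultdict(lambda: (0, 0))
    for rec in records:
        # "like" with a quoted string is a substring match; "=" is an exact match.
        if container_name not in rec["ContainerName"] or rec["TaskId"] != task_id:
            continue
        bin_start = rec["Timestamp"] - rec["Timestamp"] % BIN_SECONDS
        mem, reserved = peaks[bin_start]
        peaks[bin_start] = (
            max(mem, rec["MemoryUtilized"]),
            max(reserved, rec["MemoryReserved"]),
        )
    # Equivalent of "sort period desc".
    return dict(sorted(peaks.items(), reverse=True))


def bins_near_reservation(peaks, threshold=0.9):
    """Hypothetical helper: bins where peak usage reached the threshold share of the reservation."""
    return [
        start
        for start, (mem, reserved) in peaks.items()
        if reserved and mem / reserved >= threshold
    ]


# Hypothetical records; Timestamp is in seconds, memory values are assumed to be MiB.
SAMPLE = [
    {"Timestamp": 0, "TaskId": "example-task-id",
     "ContainerName": "example-container-name",
     "MemoryUtilized": 180, "MemoryReserved": 512},
    {"Timestamp": 120, "TaskId": "example-task-id",
     "ContainerName": "example-container-name",
     "MemoryUtilized": 240, "MemoryReserved": 512},
    {"Timestamp": 300, "TaskId": "example-task-id",
     "ContainerName": "example-container-name",
     "MemoryUtilized": 495, "MemoryReserved": 512},
    {"Timestamp": 360, "TaskId": "other-task-id",
     "ContainerName": "example-container-name",
     "MemoryUtilized": 100, "MemoryReserved": 512},
]

if __name__ == "__main__":
    peaks = peak_memory_by_bin(SAMPLE, "example-container-name", "example-task-id")
    for start, (mem, reserved) in peaks.items():
        print(f"bin starting at {start}s: mem={mem} memreserved={reserved}")
    print("bins near reservation:", bins_near_reservation(peaks))
```

A bin where mem approaches memreserved suggests that the container is close to its allocation and might need a higher limit or a fix for a memory leak.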

To mitigate the risk of task instability due to OutOfMemory issues, do the following:

  • Perform tests to understand the memory requirements of your application before placing the application in production. You can perform a load test on the container within a host or server. Then, you can check the memory usage of the containers using docker stats.
  • Be sure that your application runs only on hosts with adequate resources.
  • Limit the amount of memory that your container can use by setting appropriate hard and soft limits. In the container definition, Amazon ECS uses memoryReservation for the soft limit and memory for the hard limit. When you specify these values, they're subtracted from the available memory resources of the container instance where the container is placed.
    Note: The parameter memoryReservation isn't supported for Windows containers.
  • Turn on swap for containers with high transient memory demands. Doing so reduces the chance of OutOfMemory errors when the container is under high load.
    Note: If you're using tasks that use the AWS Fargate launch type, then parameters maxSwap and sharedMemorySize aren't supported.
    Important: Be careful when you configure swap on your Docker hosts. Turning on swap can slow down your application. However, swap helps keep your application from running out of system memory.
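
To illustrate the memory parameters in the preceding list, the following Python sketch builds the memory-related part of a container definition and prints it as JSON. The function build_container_definition, the container name, the image, and the numeric values are hypothetical. The checks reflect two rules as I understand them from the Amazon ECS documentation: memory must be greater than memoryReservation when both are set, and swappiness requires maxSwap. Verify both against the current documentation before you rely on them.

```python
import json


def build_container_definition(name, image, soft_limit_mib, hard_limit_mib,
                               max_swap_mib=None, swappiness=None):
    """Hypothetical helper that returns a container definition fragment as a dict."""
    if soft_limit_mib <= 0 or hard_limit_mib <= 0:
        raise ValueError("memory limits must be positive integers (MiB)")
    if hard_limit_mib <= soft_limit_mib:
        raise ValueError("memory (hard limit) must be greater than "
                         "memoryReservation (soft limit)")
    if swappiness is not None and max_swap_mib is None:
        raise ValueError("swappiness requires maxSwap to be set")
    if swappiness is not None and not 0 <= swappiness <= 100:
        raise ValueError("swappiness must be between 0 and 100")

    definition = {
        "name": name,
        "image": image,
        "memoryReservation": soft_limit_mib,  # soft limit; not supported for Windows containers
        "memory": hard_limit_mib,             # hard limit; exceeding it ends the container
    }
    if max_swap_mib is not None:
        # Swap settings aren't supported for tasks that use the Fargate launch type.
        linux_parameters = {"maxSwap": max_swap_mib}
        if swappiness is not None:
            linux_parameters["swappiness"] = swappiness
        definition["linuxParameters"] = linux_parameters
    return definition


if __name__ == "__main__":
    fragment = build_container_definition(
        name="example-container-name",
        image="example-image:latest",  # placeholder image
        soft_limit_mib=256,
        hard_limit_mib=512,
        max_swap_mib=1024,
        swappiness=60,
    )
    print(json.dumps(fragment, indent=2))
```

The printed fragment can serve as a starting point for one entry in the containerDefinitions list of a task definition. Choose the limits based on the load test results for your application.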

To detect Amazon ECS tasks that were ended because of OutOfMemory events, use the following AWS CloudFormation template. The template creates an Amazon EventBridge rule, an Amazon Simple Notification Service (Amazon SNS) topic, and an Amazon SNS topic policy. When you create a stack from the template, you're prompted for an email address, a topic name, and a flag that turns monitoring on or off.

AWSTemplateFormatVersion: 2010-09-09
Description: Monitor OOM stopped tasks with EventBridge rules with AWS CloudFormation.

Parameters:
  EmailList:
    Type: String
    Description: "Email address to notify."
    AllowedPattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+'
    Default: "mail@example.com"

  SNSTopicName:
    Type: String
    Description: "Name for the notification topic."
    AllowedPattern: '[a-zA-Z0-9_-]+'
    Default: "oom-monitoring-topic"

  MonitorStatus:
    Type: String
    Description: "Enable / Disable monitor."
    AllowedValues:
      - ENABLED
      - DISABLED
    Default: ENABLED

Resources:
  SNSMonitoringTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !Ref EmailList
          Protocol: email
      TopicName: !Sub ${AWS::StackName}-${SNSTopicName}
      
  SNSMonitoringTopicTopicPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties:
      Topics:
        - !Ref SNSMonitoringTopic
      PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: SnsOOMTopicPolicy
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: [  'sns:Publish' ]
            Resource: !Ref SNSMonitoringTopic
          - Sid: AllowAccessToTopicOwner
            Effect: Allow
            Principal:
              AWS: '*'
            Action: [  'sns:GetTopicAttributes',
                       'sns:SetTopicAttributes',
                       'sns:AddPermission',
                       'sns:RemovePermission',
                       'sns:DeleteTopic',
                       'sns:Subscribe',
                       'sns:ListSubscriptionsByTopic',
                       'sns:Publish',
                       'sns:Receive' ]
            Resource: !Ref SNSMonitoringTopic
            Condition:
              StringEquals:
                'AWS:SourceOwner': !Ref 'AWS::AccountId'
          
  EventRule:
    Type: AWS::Events::Rule
    Properties:
      Name: ECSStoppedTasksEvent
      Description: Triggered when an Amazon ECS Task is stopped
      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - ECS Task State Change
        detail:
          desiredStatus:
            - STOPPED
          lastStatus:
            - STOPPED
          containers:
            reason:
              - prefix: "OutOfMemory"
      State: !Ref MonitorStatus
      Targets:
        - Arn: !Ref SNSMonitoringTopic
          Id: ECSOOMStoppedTasks
          InputTransformer:
            InputPathsMap:
              taskArn: $.detail.taskArn
            InputTemplate: >
                "Task '<taskArn>' was stopped due to OutOfMemory."

After you create the CloudFormation stack, confirm the subscription from the email that Amazon SNS sends to the address that you provided. Then, when a task is ended due to an OutOfMemory issue, you get an email with a message similar to the following:

"Task 'arn:aws:ecs:eu-west-1:555555555555:task/ECSFargate/0123456789abcdef0123456789abcdef' was stopped due to OutOfMemory."

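To make the rule's matching logic concrete, the following Python sketch checks a sample event against the same conditions as the EventPattern in the template and renders the same message as the InputTemplate. It's a simplified illustration, not EventBridge's actual matching engine. The function names and the SAMPLE_EVENT payload are hypothetical, and the payload includes only the fields that the rule reads.

```python
OOM_PREFIX = "OutOfMemory"


def is_oom_stopped_task(event):
    """Return True if the event satisfies the conditions in the template's EventPattern."""
    if event.get("source") != "aws.ecs":
        return False
    if event.get("detail-type") != "ECS Task State Change":
        return False
    detail = event.get("detail", {})
    if detail.get("desiredStatus") != "STOPPED":
        return False
    if detail.get("lastStatus") != "STOPPED":
        return False
    # The pattern matches if any container has a reason that starts with the prefix.
    return any(
        str(container.get("reason", "")).startswith(OOM_PREFIX)
        for container in detail.get("containers", [])
    )


def render_notification(event):
    """Mimic the InputTransformer: substitute taskArn into the InputTemplate text."""
    task_arn = event["detail"]["taskArn"]
    return "\"Task '{}' was stopped due to OutOfMemory.\"".format(task_arn)


# Hypothetical, trimmed-down event for illustration.
SAMPLE_EVENT = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {
        "taskArn": "arn:aws:ecs:eu-west-1:555555555555:task/ECSFargate/"
                   "0123456789abcdef0123456789abcdef",
        "desiredStatus": "STOPPED",
        "lastStatus": "STOPPED",
        "containers": [
            {"name": "example-container-name",
             "reason": "OutOfMemoryError: Container killed due to memory usage"},
        ],
    },
}

if __name__ == "__main__":
    if is_oom_stopped_task(SAMPLE_EVENT):
        print(render_notification(SAMPLE_EVENT))
```

Because the pattern uses a prefix match on the container reason, tasks that stop for other reasons don't trigger a notification.
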