AWS Compute Blog

How to create a custom scheduler for Amazon ECS

My colleague Daniele Stroppa sent a nice guest post that shows how to create a custom scheduler for Amazon ECS.

Amazon EC2 Container Service (ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS takes care of two key functions required when running modern distributed applications: reliable state management and flexible scheduling. As Werner explained in a recent blog post, ECS exposes the cluster state through a set of simple APIs that give you details about all the instances in your cluster and all the containers running on those instances. In this post, we explain how to use the ECS API to create a custom scheduler.

Scheduling

A scheduler understands the needs and requirements of the system (for example, a container that needs 200 MB of RAM and port 80) and tries to satisfy them efficiently. The scheduler then submits a request to the cluster state manager to acquire the required resources.
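To make the idea concrete, here is a minimal sketch of the check a scheduler might run for the example above (200 MB of RAM and port 80). It assumes each instance is a dictionary shaped like one entry of the containerInstances list returned by DescribeContainerInstances, where remainingResources holds a MEMORY entry (an integer, in MiB) and a PORTS entry listing host ports that are already reserved. The function name fits and the sample data are hypothetical.

```python
def fits(instance, memory_mib, host_port):
    """Return True if the instance has enough free memory and the port is free.

    Assumption: `instance` mirrors one entry of the `containerInstances` list
    returned by DescribeContainerInstances.
    """
    remaining = {r['name']: r for r in instance['remainingResources']}

    # MEMORY is an integer resource holding the memory that is still unreserved
    free_memory = remaining['MEMORY']['integerValue']

    # PORTS is a string set listing host ports that are already reserved
    reserved_ports = set(remaining['PORTS']['stringSetValue'])

    return free_memory >= memory_mib and str(host_port) not in reserved_ports


# Hypothetical sample data for two instances
instance_a = {
    'remainingResources': [
        {'name': 'MEMORY', 'type': 'INTEGER', 'integerValue': 512},
        {'name': 'PORTS', 'type': 'STRINGSET',
         'stringSetValue': ['22', '2375', '2376', '51678']},
    ]
}
instance_b = {
    'remainingResources': [
        {'name': 'MEMORY', 'type': 'INTEGER', 'integerValue': 128},
        {'name': 'PORTS', 'type': 'STRINGSET', 'stringSetValue': ['22', '80']},
    ]
}

print(fits(instance_a, 200, 80))  # True: 512 >= 200 and port 80 is free
print(fits(instance_b, 200, 80))  # False: only 128 free and port 80 is taken
```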

ECS provides optimistic concurrency control, so multiple schedulers can operate at the same time; the cluster manager confirms that a resource is available and commits it to the requesting scheduler. The scheduler can listen for events from the cluster manager and take action, such as maintaining the availability of your applications, or interact with other resources like Elastic Load Balancing load balancers.
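From the scheduler's side, optimistic concurrency means a placement request can be rejected if another scheduler claimed the resources first, so it helps to have a fallback. The sketch below shows one way to handle that, not how ECS resolves conflicts internally. The names place_with_retry and fake_start_task are hypothetical, and the stand-in returns a dictionary with tasks and failures lists, mirroring the shape of a StartTask response.

```python
def place_with_retry(start_task, candidates, max_attempts=3):
    """Try candidate instances in order until the cluster manager accepts one.

    `start_task` is a hypothetical callable that takes an instance ARN and
    returns a dict with 'tasks' and 'failures' lists, mirroring the shape of
    the StartTask response.
    """
    for arn in candidates[:max_attempts]:
        response = start_task(arn)
        if response.get('tasks') and not response.get('failures'):
            # The cluster manager committed the resources to this scheduler
            return arn
        # The request was rejected (for example, another scheduler got there
        # first), so fall through and try the next candidate
    return None


# A stand-in for the cluster manager: the first instance is already full
def fake_start_task(arn):
    if arn == 'instance-1':
        return {'tasks': [], 'failures': [{'arn': arn, 'reason': 'RESOURCE:MEMORY'}]}
    return {'tasks': [{'containerInstanceArn': arn}], 'failures': []}


print(place_with_retry(fake_start_task, ['instance-1', 'instance-2']))  # instance-2
```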

ECS currently offers two schedulers to find the optimal instance placement based on your resource needs and availability requirements: a task scheduler and a service scheduler. Some customers may find that they have requirements that are not satisfied by either of the current schedulers. For example, if a customer wanted to register tasks with Route 53 instead of Elastic Load Balancing, a custom scheduler could create an SRV record when a task is scheduled and remove the record when the task stops.
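As a rough illustration of the Route 53 idea, the following sketch builds the change batch for an SRV record. The helper name srv_change_batch, the record and host names, the port, and the fixed priority and weight of 1 are all assumptions made for the example.

```python
def srv_change_batch(action, record_name, host, port, ttl=60):
    """Build a Route 53 ChangeBatch dict for a single SRV record.

    `action` would be 'UPSERT' when a task starts and 'DELETE' when it stops.
    An SRV value has the form "priority weight port target"; priority and
    weight are fixed at 1 here for simplicity.
    """
    return {
        'Changes': [{
            'Action': action,
            'ResourceRecordSet': {
                'Name': record_name,
                'Type': 'SRV',
                'TTL': ttl,
                'ResourceRecords': [
                    {'Value': '1 1 {} {}'.format(port, host)}
                ],
            },
        }]
    }


# Hypothetical names and port, for illustration only
batch = srv_change_batch(
    'UPSERT',
    '_web._tcp.example.internal.',
    'ip-10-0-0-12.example.internal.',
    32768,
)
print(batch['Changes'][0]['ResourceRecordSet']['ResourceRecords'][0]['Value'])
# 1 1 32768 ip-10-0-0-12.example.internal.
```

The resulting dictionary could be passed as the ChangeBatch argument of change_resource_record_sets on a boto3 Route 53 client, together with the ID of your own hosted zone. A DELETE must match the existing record exactly, so the scheduler would need to remember the values it used when the task started.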

In this post, we illustrate the process of creating a custom scheduler with an example that starts tasks on the instance with the fewest running tasks.

Implementing a custom scheduler

A custom scheduler makes use of the ECS List* and Describe* API operations to determine the current state of the cluster. It then selects one (or more) container instances according to the logic implemented in the scheduler and uses StartTask to start a task on the selected container instance. For more details about API operations, see the Amazon ECS API Reference.

As an example, let’s say you want to start tasks on the instance with the fewest running tasks. Here’s how to build a custom scheduler with this logic. Start by getting a list of all container instances in a cluster:

def getInstanceArns(clusterName):
    containerInstancesArns = []
    # Get instances in the cluster
    response = ecs.list_container_instances(cluster=clusterName)
    containerInstancesArns.extend(response['containerInstanceArns'])
    # If there are more instances, keep retrieving them
    while response.get('nextToken', None) is not None:
        response = ecs.list_container_instances(
            cluster=clusterName, 
            nextToken=response['nextToken']
        )
        containerInstancesArns.extend(response['containerInstanceArns'])

    return containerInstancesArns
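As an alternative to following nextToken by hand, boto3 clients provide paginators that handle the token for you. The sketch below is a variation on the function above, not part of the original scheduler. It takes the client as a parameter (an assumption made so it can be tried without AWS credentials), and FakeEcsClient and FakePaginator are hypothetical stand-ins for a real boto3 ECS client.

```python
def get_instance_arns(ecs_client, cluster_name):
    """Collect all container instance ARNs in a cluster using a paginator."""
    paginator = ecs_client.get_paginator('list_container_instances')
    arns = []
    # Each page is one ListContainerInstances response
    for page in paginator.paginate(cluster=cluster_name):
        arns.extend(page['containerInstanceArns'])
    return arns


# Hypothetical stand-ins that return two pages of results
class FakePaginator:
    def paginate(self, **kwargs):
        yield {'containerInstanceArns': ['arn-1', 'arn-2']}
        yield {'containerInstanceArns': ['arn-3']}


class FakeEcsClient:
    def get_paginator(self, operation_name):
        assert operation_name == 'list_container_instances'
        return FakePaginator()


print(get_instance_arns(FakeEcsClient(), 'default'))  # ['arn-1', 'arn-2', 'arn-3']
```

With a real client, you would pass boto3.client('ecs') as the first argument.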

For each instance in the cluster, you can then find out how many tasks are in the RUNNING state and start a task on the instance with the fewest running tasks:

def startTask(clusterName, taskDefinition):
    # List all instances in the ECS cluster
    containerInstancesArns = getInstanceArns(clusterName)
    if not containerInstancesArns:
        logging.error('No container instances found in cluster %s', clusterName)
        return

    # DescribeContainerInstances accepts at most 100 instances per call,
    # so describe them in batches
    containerInstances = []
    for i in range(0, len(containerInstancesArns), 100):
        response = ecs.describe_container_instances(
            cluster=clusterName,
            containerInstances=containerInstancesArns[i:i + 100]
        )
        containerInstances.extend(response['containerInstances'])

    # Get the instance with the least number of running tasks
    leastLoaded = min(
        containerInstances,
        key=lambda instance: instance['runningTasksCount']
    )
    startOn = [leastLoaded['containerInstanceArn']]
    logging.info('Starting task on instance %s...', startOn)

    # Start a new task
    response = ecs.start_task(
        cluster=clusterName,
        taskDefinition=taskDefinition,
        containerInstances=startOn,
        startedBy='LeastTasksScheduler'
    )

    # StartTask reports placement problems in 'failures' rather than raising
    for failure in response.get('failures', []):
        logging.error('Failed to start task on %s: %s',
                      failure.get('arn'), failure.get('reason'))
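One refinement worth considering, which the script in this post does not include, is to skip instances that cannot accept work before choosing one. The sketch below assumes each dictionary mirrors an entry of the containerInstances list, including its status and agentConnected fields. The function name pick_least_loaded and the sample data are hypothetical.

```python
def pick_least_loaded(container_instances):
    """Return the ARN of the eligible instance with the fewest running tasks.

    Instances that are not ACTIVE, or whose agent is disconnected, are
    skipped. Returns None when no instance is eligible.
    """
    eligible = [
        ci for ci in container_instances
        if ci.get('status') == 'ACTIVE' and ci.get('agentConnected')
    ]
    if not eligible:
        return None
    best = min(eligible, key=lambda ci: ci['runningTasksCount'])
    return best['containerInstanceArn']


# Hypothetical sample data
instances = [
    {'containerInstanceArn': 'arn-1', 'status': 'ACTIVE',
     'agentConnected': True, 'runningTasksCount': 3},
    {'containerInstanceArn': 'arn-2', 'status': 'ACTIVE',
     'agentConnected': False, 'runningTasksCount': 0},
    {'containerInstanceArn': 'arn-3', 'status': 'ACTIVE',
     'agentConnected': True, 'runningTasksCount': 1},
]

print(pick_least_loaded(instances))  # arn-3 (arn-2 is idle but disconnected)
```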

After you string all the pieces together, the custom scheduler looks like this:

#!/usr/bin/env python
import boto3
import argparse
import logging

# Set up logging (the functions below log through the root logger)
logging.basicConfig(format='%(asctime)s - %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', level=logging.INFO)

# Set up ECS boto client
ecs = boto3.client('ecs')

def getInstanceArns(clusterName):
    containerInstancesArns = []
    # Get instances in the cluster
    response = ecs.list_container_instances(cluster=clusterName)
    containerInstancesArns.extend(response['containerInstanceArns'])
    # If there are more instances, keep retrieving them
    while response.get('nextToken', None) is not None:
        response = ecs.list_container_instances(
            cluster=clusterName, 
            nextToken=response['nextToken']
        )
        containerInstancesArns.extend(response['containerInstanceArns'])

    return containerInstancesArns

def startTask(clusterName, taskDefinition):
    # List all instances in the ECS cluster
    containerInstancesArns = getInstanceArns(clusterName)
    if not containerInstancesArns:
        logging.error('No container instances found in cluster %s', clusterName)
        return

    # DescribeContainerInstances accepts at most 100 instances per call,
    # so describe them in batches
    containerInstances = []
    for i in range(0, len(containerInstancesArns), 100):
        response = ecs.describe_container_instances(
            cluster=clusterName,
            containerInstances=containerInstancesArns[i:i + 100]
        )
        containerInstances.extend(response['containerInstances'])

    # Get the instance with the least number of running tasks
    leastLoaded = min(
        containerInstances,
        key=lambda instance: instance['runningTasksCount']
    )
    startOn = [leastLoaded['containerInstanceArn']]
    logging.info('Starting task on instance %s...', startOn)

    # Start a new task
    response = ecs.start_task(
        cluster=clusterName,
        taskDefinition=taskDefinition,
        containerInstances=startOn,
        startedBy='LeastTasksScheduler'
    )

    # StartTask reports placement problems in 'failures' rather than raising
    for failure in response.get('failures', []):
        logging.error('Failed to start task on %s: %s',
                      failure.get('arn'), failure.get('reason'))

# 
# LeastTasks ECS Scheduler
#
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='ECS Custom Scheduler to start a task on the instance with the least number of running tasks.'
    )
    parser.add_argument('-c', '--cluster', 
        nargs='?', 
        default='default', 
        help='The short name or full Amazon Resource Name (ARN) of the cluster that you want to start your task on. If you do not specify a cluster, the default cluster is assumed.'
    )
    parser.add_argument('-d', '--task-definition', 
        required=True, 
        help='The family and revision (family:revision) or full Amazon Resource Name (ARN) of the task definition that you want to start.'
    )
    args = parser.parse_args()  

    logging.info('Starting task %s on cluster %s...', args.task_definition, args.cluster)
    startTask(args.cluster, args.task_definition)

Conclusion

This is a very simple example, but it gives an idea of how to use the powerful Amazon ECS API to create a custom scheduler.

A community member recently created ecs_state, which “is a small Go library that uses the ECS List and Describe API operations to store information about running tasks and available resources in memory in SQLite. There are a set of API operations that control when to refresh state, as well as an operation to search for machines with the resources available to accept the task. Further logic and filtering can then be applied in memory before finally calling StartTask or StopTask. This library can reduce the API actions your scheduler needs to perform and simplify finding resources using SQL.”
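The approach described in that quote (cache cluster state in an in-memory SQLite database, then search it with SQL) can be sketched in a few lines. The example below is written in Python to match the rest of this post and does not use or reproduce the ecs_state API. The table layout, function names, and sample rows are all assumptions made for illustration.

```python
import sqlite3


def load_state(rows):
    """Store (arn, free_cpu, free_memory, running_tasks) rows in memory."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE instances ('
        'arn TEXT PRIMARY KEY, free_cpu INTEGER, '
        'free_memory INTEGER, running_tasks INTEGER)'
    )
    conn.executemany('INSERT INTO instances VALUES (?, ?, ?, ?)', rows)
    return conn


def find_candidates(conn, cpu, memory):
    """Return ARNs with enough free CPU and memory, least loaded first."""
    cursor = conn.execute(
        'SELECT arn FROM instances '
        'WHERE free_cpu >= ? AND free_memory >= ? '
        'ORDER BY running_tasks, arn',
        (cpu, memory),
    )
    return [row[0] for row in cursor]


# Hypothetical cluster state, as it might look after a refresh
state = load_state([
    ('instance-1', 1024, 256, 4),
    ('instance-2', 512, 2048, 1),
    ('instance-3', 2048, 4096, 3),
])

print(find_candidates(state, cpu=256, memory=512))  # ['instance-2', 'instance-3']
```

In a real scheduler, the rows would come from the List and Describe API operations, and the first candidate would be passed to StartTask.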

We look forward to your feedback on enhancements you want for the existing ECS schedulers, and learning about any new schedulers built by the community.