Deploy Deep Learning Models on Amazon ECS

by Asif Khan

Artificial intelligence (AI) is the computer science field dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, problem solving, and pattern recognition. Machine learning (ML) and deep learning (DL) are computer science fields derived from the AI discipline.

Most ML and DL systems have two distinct parts: training (or learning) and prediction (or classifying). DL systems are usually developed by data scientists, who are good at mathematics and computer science. Those skills are essential for learning. But to deploy models to the field, you need the DevOps mindset and tools.

The power of DL stems from the learning system’s ability to identify more relationships than humans can code in software, or relationships that humans might not even be able to perceive consciously. After sufficient training, the network of algorithms can begin to make predictions on, or interpretations of, very complex data.

In this post, I show you how to connect the workflow between the data scientists and DevOps. Using a number of AWS services, I take the output of a model’s training and deploy it to perform predictions in real time with low latency and high availability. In particular, I illustrate the ease of deploying DL predict functions using Apache MXNet (a deep learning library), Amazon ECS, Amazon S3, Amazon ECR, the AWS developer tools, and AWS CloudFormation.

How DL models are developed

The key stages of ML are training/learning and prediction/classifying:

Training/Learning – In this stage, data scientists develop the model and program it to learn with training data. They optimize the algorithms, define the layout of the model’s network and its hyperparameters, and program the model to learn.

The advent of notebooks, such as Jupyter, makes this much easier. Notebooks allow interactive development and collaboration. To try using a Jupyter notebook, see Run Jupyter Notebook and JupyterHub on Amazon EMR.

Prediction/Classification – After the DevOps team deploys the model on a scalable, highly available, and low-latency infrastructure, the trained model predicts the outcome on new observations.

DevOps deploys the models onto the infrastructure, usually with an API, to allow application developers to use the APIs to leverage AI capabilities in their applications. The data scientists continue to improve their models and release new versions all the time, just like other software. Updated models should integrate with the company’s usual CI/CD practices.

Deploying an ML predict function at scale is an evolving science.

Why use Amazon ECS, MXNet, and the AWS developer tools?

Containers accommodate the variance in the resource footprint of, and the many dependencies required by, application profiles. ECS provides a cluster management solution for scheduling many containers in a secure, scalable, and reliable environment.

When you deploy ML algorithms, memory, CPU, scale, and latency are important considerations. Many ML models serve use cases with high and dynamic transactions per second (TPS), and their hosting requirements for RAM or GPU can also dramatically affect how they are used. Both ECS and MXNet provide the flexibility to optimize the prediction infrastructure cost-effectively. For example, to save on the cost of your ECS fleet, you can train with a GPU but predict with a CPU, or use Spot Instances. You can also dramatically reduce a model’s memory footprint. For more information, see

To illustrate, we have implemented the image classification tutorial from MXNet, which you can find at:

The following architecture diagram shows how a data scientist trains the model and uploads it to S3, and then how the application developer uses the provided code to run predictions.

The automated development workflow uses AWS CodePipeline and AWS CodeCommit to build the Docker image on AWS CodeBuild. The workflow also uses an Application Load Balancer and automatic scaling to ensure high availability. To ensure that performance goals are met, CloudWatch provides logging and alarms.

When the application developer pushes new code to the code repository, CodeBuild rebuilds the code and updates the service. Similarly, the application can ensure that the latest model is picked up from S3.
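As a rough illustration of how an application might notice a new model in S3, the sketch below compares the object's ETag with the one it last downloaded and fetches the file again only when it changes. This is not the code from the example repository. The class name, bucket, key, and local path are hypothetical, and the client is assumed to behave like `boto3.client("s3")`, whose `head_object` and `download_file` calls are used here.

```python
import os


class ModelRefresher:
    """Re-download a model file from S3 only when the object has changed."""

    def __init__(self, s3_client, bucket, key, local_path):
        self.s3 = s3_client            # assumed to behave like boto3.client("s3")
        self.bucket = bucket           # hypothetical bucket name
        self.key = key                 # hypothetical object key
        self.local_path = local_path   # where the model file is stored locally
        self._etag = None              # ETag of the copy currently on disk

    def refresh(self):
        """Return True if a new version was downloaded, otherwise False."""
        head = self.s3.head_object(Bucket=self.bucket, Key=self.key)
        etag = head["ETag"]
        if etag == self._etag and os.path.exists(self.local_path):
            return False
        self.s3.download_file(self.bucket, self.key, self.local_path)
        self._etag = etag
        return True
```

A service could call `refresh()` on a timer or before each batch of requests and reload the model only when it returns True.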

After the data scientist creates and trains the model, an application developer develops code that relies on the model in S3 and the DevOps team deploys the application. The following code shows how the application loads the model:

import mxnet as mx

def load_model(s_fname, p_fname):
    """
    Load model checkpoint from file.

    :param s_fname: path to the symbol (network definition) JSON file
    :param p_fname: path to the parameter file
    :return: (symbol, arg_params, aux_params)
    symbol : Symbol
        The network definition.
    arg_params : dict of str to NDArray
        Model parameter, dict of name to NDArray of net's weights.
    aux_params : dict of str to NDArray
        Model parameter, dict of name to NDArray of net's auxiliary states.
    """
    symbol = mx.symbol.load(s_fname)
    save_dict = mx.nd.load(p_fname)
    arg_params = {}
    aux_params = {}
    # Keys are stored as "arg:<name>" or "aux:<name>"
    for k, v in save_dict.items():
        tp, name = k.split(':', 1)
        if tp == 'arg':
            arg_params[name] = v
        if tp == 'aux':
            aux_params[name] = v
    return symbol, arg_params, aux_params
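The key-splitting step in the loader can be tried without MXNet installed. The following sketch applies the same "arg:" and "aux:" prefix convention to a plain dictionary. The parameter names and values are made up for illustration; in the real file the values are NDArrays.

```python
def split_params(save_dict):
    """Separate saved entries into weights ("arg:") and auxiliary states ("aux:")."""
    arg_params, aux_params = {}, {}
    for key, value in save_dict.items():
        kind, name = key.split(":", 1)
        if kind == "arg":
            arg_params[name] = value
        elif kind == "aux":
            aux_params[name] = value
    return arg_params, aux_params


# Hypothetical entries; plain lists stand in for NDArrays
saved = {"arg:fc1_weight": [0.1, 0.2], "aux:bn0_moving_mean": [0.0]}
arg_params, aux_params = split_params(saved)
print(arg_params)  # {'fc1_weight': [0.1, 0.2]}
print(aux_params)  # {'bn0_moving_mean': [0.0]}
```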

The predict function uses the model to perform the prediction. It takes an image as input and classifies it using the pretrained model:

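from collections import namedtuple
from urllib.request import urlopen  # urllib2.urlopen on Python 2

import cv2
import mxnet as mx
import numpy as np

# Minimal batch wrapper exposing the .data attribute that the module's forward() reads
Batch = namedtuple('Batch', ['data'])

def predict(url, mod, synsets):
    # Load the input image
    req = urlopen(url)
    arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
    cv2_img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
    if cv2_img is None:
        # The downloaded bytes could not be decoded as an image
        return None
    img = cv2.cvtColor(cv2_img, cv2.COLOR_BGR2RGB)

    # Reshape the input image to fit the model input: (1, 3, 224, 224)
    img = cv2.resize(img, (224, 224))
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img[np.newaxis, :]

    # Run a forward pass, then read the class probabilities
    mod.forward(Batch([mx.nd.array(img)]))
    prob = mod.get_outputs()[0].asnumpy()
    prob = np.squeeze(prob)

    # Report the five most probable classes
    a = np.argsort(prob)[::-1]
    out = []
    for i in a[0:5]:
        out.append('probability=%f, class=%s' % (prob[i], synsets[i]))
    return out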
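The final step of the predict function, ranking the class probabilities and formatting the top results, can be sketched in plain Python without NumPy. The probabilities and labels below are invented for illustration and do not come from the model.

```python
def top_k(prob, synsets, k=5):
    """Return formatted strings for the k most probable classes, highest first."""
    order = sorted(range(len(prob)), key=lambda i: prob[i], reverse=True)
    return ['probability=%f, class=%s' % (prob[i], synsets[i]) for i in order[:k]]


# Made-up scores and labels, for illustration only
prob = [0.05, 0.62, 0.03, 0.30]
synsets = ["pier", "suspension bridge", "steel arch bridge", "viaduct"]
result = top_k(prob, synsets, k=2)
print(result)
# ['probability=0.620000, class=suspension bridge', 'probability=0.300000, class=viaduct']
```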

The application is compiled as it goes through the code pipeline, and is then deployed to ECS. The container image has the logic for the API. The application also stores the collected data in S3 and reports metrics to Amazon CloudWatch. This creates a feedback loop for the training data set.
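One way the metric reporting could look is sketched below: a helper times a call and publishes the elapsed time as a custom CloudWatch metric. The namespace and metric name are hypothetical, and the client is assumed to behave like `boto3.client("cloudwatch")`, whose `put_metric_data` call is used. The example application may report different metrics.

```python
import time


def timed_call(cloudwatch, fn, *args,
               namespace="ImageClassification",    # hypothetical namespace
               metric_name="PredictionLatency"):   # hypothetical metric name
    """Run fn(*args), publish its latency in milliseconds, and return its result."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[{
            "MetricName": metric_name,
            "Value": elapsed_ms,
            "Unit": "Milliseconds",
        }],
    )
    return result
```

Wrapping the prediction call this way keeps the metric code out of the prediction logic itself.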

At runtime, the application fetches the latest model from S3. The following is a snippet from the ML model’s JSON symbol file that was uploaded to S3:

    {
      "op": "Convolution",
      "param": {
        "cudnn_off": "False",
        "cudnn_tune": "limited_workspace",
        "dilate": "(1,1)",
        "kernel": "(3,3)",
        "no_bias": "True",
        "num_filter": "64",
        "num_group": "1",
        "pad": "(1,1)",
        "stride": "(1,1)",
        "workspace": "512"
      },
      "name": "stage1_unit1_conv1",
      "inputs": [[14, 0], [15, 0]],
      "backward_source_id": -1
    }
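Because the symbol file is plain JSON, it can be inspected with standard tools. The sketch below counts the operators in a symbol graph. It assumes the usual MXNet layout, in which the graph is stored under a top-level "nodes" list and input variables carry the op "null". The sample graph is a hand-written stand-in, not the uploaded model.

```python
import json
from collections import Counter


def count_operators(symbol_json):
    """Count how often each operator appears, skipping variable ("null") nodes."""
    graph = json.loads(symbol_json)
    return Counter(node["op"] for node in graph["nodes"] if node["op"] != "null")


# Hand-written sample graph, for illustration only
sample = json.dumps({"nodes": [
    {"op": "null", "name": "data", "inputs": []},
    {"op": "null", "name": "stage1_unit1_conv1_weight", "inputs": []},
    {"op": "Convolution", "name": "stage1_unit1_conv1", "inputs": [[0, 0], [1, 0]]},
]})
counts = count_operators(sample)
print(counts)  # Counter({'Convolution': 1})
```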

This seamlessly integrated development workflow between ML model training, application development, and application deployment provides the agility to give users smart ML applications fast. 


To deploy the application that uses the ML model, upload the predict function code to CodeCommit, and then build a deployment pipeline to build the Docker image and deploy it to ECS. To simplify deployment, you use an AWS CloudFormation template. Then you test the API by running a prediction.

Step 1: Create a CodeCommit repository and connect to it

  1. Clone the GitHub repository:
    git clone
  2. On the AWS Management Console, choose CodeCommit.
  3. For the Region, choose US East (N. Virginia).
  4. Choose Create repository.
  5. Type a repository name that is unique across your account (for this exercise, use image-classification-predict) and a description, and then choose Create repository. You will get a URL to your CodeCommit repository similar to
  6. Use HTTPS or SSH to connect to your CodeCommit repository. For instructions, see

Step 2: Commit the source code and configuration files to your CodeCommit repository

  1. On your laptop or EC2 instance, clone a copy of the CodeCommit repository:
    cd ~
    git clone ssh://

    This creates a folder with the same name as the one in the path you used to execute the git clone command.

  2. Copy the contents of the image-classification-predict directory into this folder with the following command:
    cp -r ecs-mxnet-example/image-classification-predict/. image-classification-predict/
  3. Change the buildspec.yml file to include your AWS account number.
    <your account number>
  4. Commit all of the copied contents to your CodeCommit repository:
    git add --all
    git commit -m "Initial Commit"
    git push origin master

    Tip: In the .git/config file, verify that the remote is “origin” and the branch is “master”.

Step 3: Deploy the deployment pipeline and the prediction function to ECS

The provided AWS CloudFormation template creates an ECS cluster in a VPC and deploys the application code with the needed dependencies. It also adds CloudWatch for logging and performance monitoring, and automatic scaling with an Application Load Balancer.

  1. Deploy the AWS CloudFormation template.
    • Sign in to the AWS Management Console.
    • Choose AWS Services.
    • Choose AWS CloudFormation.
    • Choose Create a new stack.
    • In Specify an Amazon S3 template URL, type the following URL:
    • For Stack Name, type testMX.
    • Choose Next.
    • Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
    • Choose Create. It takes about 10 minutes to deploy the stack.
  2. If you want to change the autoscaling behavior, change the scaling policy in ecs-template.yaml, which is in the template folder, as follows:
        Type: AWS::ApplicationAutoScaling::ScalingPolicy
        Properties:
          PolicyName: AStepPolicy
          PolicyType: StepScaling
          ScalingTargetId: !Ref 'ServiceScalingTarget'
          StepScalingPolicyConfiguration:
            AdjustmentType: PercentChangeInCapacity
            Cooldown: 60
            MetricAggregationType: Average
            StepAdjustments:
            - MetricIntervalLowerBound: 0
              ScalingAdjustment: 200

    The development workflow is preconfigured with the CI/CD pipeline to ensure the integration of application development and model training.
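To see what the scaling policy in step 2 does, it helps to work through the arithmetic. With PercentChangeInCapacity, a ScalingAdjustment of 200 adds 200 percent of the current capacity, so the task count triples each time the step fires. The sketch below is a simplified model for positive adjustments only. The minimum and maximum capacity values are hypothetical stand-ins for the limits on the scalable target, and the service's own rounding rules are not modeled.

```python
def apply_percent_change(current, percent, min_capacity=1, max_capacity=100):
    """Simplified PercentChangeInCapacity step for a positive percent adjustment."""
    new_capacity = current + (current * percent) // 100
    # Clamp to the limits of the scalable target (hypothetical defaults)
    return max(min_capacity, min(max_capacity, new_capacity))


for tasks in (1, 2, 5):
    print(tasks, "->", apply_percent_change(tasks, 200))
# 1 -> 3
# 2 -> 6
# 5 -> 15
```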

Step 4: Test using a REST client, such as Postman

  1. Get the URL for the Application Load Balancer from the AWS CloudFormation output for the master stack.
  2. Use a REST client such as Postman to test the API with the ML Service URL from step 1. The GET request looks similar to this:
    GET http://<MLServiceUrl>/image?image=,_San_Francisco,_California_LCCN2013633353.tif/lossy-page1-450px-Golden_Gate_Bridge,_San_Francisco,_California_LCCN2013633353.tif.jpg

    The API predicts that the image is a suspension bridge with 62% probability.
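The same request can be made from a script instead of a REST client. The following sketch builds the GET URL from the /image path and image query parameter shown above, URL-encoding the image address. The function names are hypothetical, and because the response format is not described here, the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_predict_url(service_url, image_url):
    """Build the GET URL for the prediction API, URL-encoding the image address."""
    return "%s/image?%s" % (service_url.rstrip("/"), urlencode({"image": image_url}))


def classify(service_url, image_url, timeout=10):
    """Call the prediction API and return the response body as text."""
    with urlopen(build_predict_url(service_url, image_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `classify("http://<MLServiceUrl>", "<image URL>")` would send the request once the placeholders are replaced with the values from your stack.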


With AWS, you can continuously deploy a secure and scalable container-based DL model and prediction API. Data scientists can develop the best models, and application developers can easily consume the DL APIs on ECS with an efficient development workflow. This helps build a bridge between the data science team and the DevOps team that improves knowledge flow and systems (the example image wasn’t random).

You can use Amazon API Gateway to extend the API for traffic management, authorization and access control, monitoring, and API version management.

We encourage you to adapt the code, which is available on GitHub, and extend it to address your use cases.

Further reading

For a good introduction to using MXNet on AWS, see AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale.

To run Jupyter notebooks on the AWS Deep Learning AMI using high-performance GPU instances, see The AWS Deep Learning AMI, Now with Ubuntu on the AWS AI Blog.

For MXNet tutorials, see


About the Author

Asif Khan is a Solutions Architect with Amazon Web Services. He provides technical guidance, design advice, and thought leadership to some of the largest and most successful AWS customers and partners on the planet. His deepest expertise spans application architecture, containers, DevOps, security, machine learning, and SaaS business applications. Over the last 12 years, he’s brought an intense customer focus to challenging and deeply technical roles in multiple industries. He has a number of patents and has successfully led product development, architecture, and customer engagements.