AWS Compute Blog

  • Cluster Management with Amazon ECS

    by Deepak Singh | on | in Amazon ECS |

    In previous blog posts, we have talked a lot about Amazon EC2 Container Service (Amazon ECS) as a way to run Docker containers in the cloud on AWS, but not as much has been said about the cluster management options exposed through the ECS API.

    Today we want to talk a bit about what cluster management is, how ECS enables it, and — for those familiar with other cluster management systems like Apache Mesos — an example of how existing workloads may take advantage of the ECS API. At the end of this post, you’ll find links to some example code we are open sourcing today; we think it will empower you and your business to create large-scale distributed systems with ECS as your cluster management solution.

    Cluster management is becoming an important task as developers and businesses increasingly develop and deploy distributed applications in the cloud. Cluster management systems schedule work and manage the state of each cluster resource. A common example of developers interacting with a cluster management system is when you run a MapReduce job via Apache Hadoop or Apache Spark. Both of these systems typically manage a coordinated cluster of machines working together to perform a large task. In the case of Hadoop or Spark, these tasks are most often data analysis jobs or machine learning.

    Cluster management systems have two challenges. First, there is a lot of overhead in managing the state of the cluster. For example, software like Hadoop and Spark typically has a Leader, the part of the software that runs in one place and is in charge of coordination. It then has many Followers, often hundreds or even thousands, each a part of the software that receives commands from the Leader, executes them, and reports the state of its sub-task.

    When machines fail, the Leader must detect these failures, replace machines, and restart the Followers that receive commands. This can be a significant portion of the code written for applications that need access to a large pool of resources. The second challenge is that each of these applications typically assumes full ownership of the machines where its tasks run. You will often end up with multiple clusters of machines, each dedicated fully to the management system in use. This can lead to inefficient distribution of resources, and jobs taking longer to run than if a shared pool of resources could be used.

    ECS provides a simple solution to cluster state management: the management of followers (using the ECS Agent), dispatching of sub-tasks to the proper location, and state inspection of the cluster are all exposed through the API. Rather than Spark or Hadoop having to manage a set of machines directly, ECS manages your instances. If you need to find out if a sub-task is still running, or what instances are available, you can simply call the ECS List* and Describe* API actions. This allows distributed systems to cut down on the amount of code needed to go from idea to implementation. Much of the undifferentiated heavy lifting and housekeeping has been abstracted behind a set of APIs. The ability to run multiple tasks on a shared pool of resources can also lead to higher utilization and faster task completion than if compute resources are statically partitioned.
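    As a minimal sketch of what this looks like in code (the helper name is ours, and the client is passed in for illustration; in real use you would pass an `AWS.ECS` client created from the aws-sdk nodejs package):

```javascript
// Sketch: inspecting ECS cluster state through the List* actions.
// ListContainerInstances is a real ECS API action; the client object is
// injected so the same logic can be exercised against a stub.
function listClusterInstances(ecs, cluster, callback) {
  ecs.listContainerInstances({ cluster: cluster }, function (err, data) {
    if (err) return callback(err);
    // data.containerInstanceArns holds the instances registered to the cluster
    callback(null, data.containerInstanceArns);
  });
}
```

    A scheduler can combine calls like this with the Describe* actions to decide where to place its next sub-task, without maintaining any cluster state of its own.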

    One of the core principles behind the design of ECS is the separation of the scheduling logic from the state management. This allows you to use the ECS schedulers, write your own schedulers, or integrate with third party schedulers. A common solution for use cases such as data analysis, batch jobs, and machine learning is the open-source cluster management system Apache Mesos, which “provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments.”

    As an initial proof of concept of how we could start integrating Apache Mesos with ECS, we built an Apache Mesos scheduler driver called ECSSchedulerDriver. This driver allows the Mesos cluster management “start task” commands to be sent directly to ECS. This demonstrates how we can quickly extend ECS based on customer feedback, in some cases, to co-exist and collaborate with existing open source tools such as Mesos and Marathon. You can also write your own schedulers for ECS if you have specific needs.

    If this kind of integration is of interest, or you are interested in integration with other cluster management frameworks or schedulers, please give us feedback via the ECS forum.

  • Dive into Amazon EC2 Container Service (video)

    by Deepak Singh | on | in Amazon ECS |

    At AWS re:Invent, we announced Amazon EC2 Container Service (Amazon ECS), a highly scalable, high performance container management service that supports Docker containers and allows you to easily run distributed applications on a managed cluster of Amazon EC2 instances. The service is currently available in preview to all AWS customers without any need to sign up for a wait list.

    Last week we recorded a presentation that walks you through why we developed Amazon ECS and key features and components of the service. The video also includes a demo of how you can use the AWS command line interface to get detailed information about your Amazon ECS clusters, list and view Task Definitions that define the containers you want to run together, and launch one or more Task Definitions to schedule Docker containers on your cluster’s Container Instances. In a future post we will go deeper into how you can use third party schedulers like Marathon and Chronos with Amazon ECS.

    We have seen a tremendous amount of interest in the service and are very interested in your feedback. Please visit our forum to tell us about your experience with Amazon ECS and how you think the service could be improved.

    — Deepak

  • AWS Lambda adds CORS support and removes preview signup requirement

    by Tim Wagner | on |

    Tim Wagner, AWS Lambda

    CORS and Browser JavaScript Support for AWS Lambda

    AWS Lambda now supports CORS, making it easier to call Lambda functions from browser JavaScript code. The default hosted package of the AWS SDK for JavaScript in the Browser now also includes the AWS Lambda APIs by default — check out Working with Services in the Browser.
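    As a rough sketch of what this enables (the function name is a placeholder, and the client is passed in for illustration; in a real page you would create it with `new AWS.Lambda()` after configuring AWS.config with credentials):

```javascript
// Sketch: invoking a Lambda function from browser JavaScript.
// invokeAsync is the Lambda invocation API at the time of writing; a 202
// Status in the response indicates the event was accepted.
function invokeMyFunction(lambda, payload, callback) {
  var params = {
    FunctionName: "myBrowserBackedFunction", // hypothetical function name
    InvokeArgs: JSON.stringify(payload)
  };
  lambda.invokeAsync(params, function (err, data) {
    if (err) return callback(err);
    callback(null, data.Status);
  });
}
```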

    No more preview signups!

    AWS Lambda has dropped the signup requirement – the preview is now open to anyone with an AWS account. You can go immediately to the AWS Lambda console and begin using AWS Lambda. We hope this makes it even easier for customers to try out the service, including using the interactive editor and test functionality to write and run Lambda functions directly from a browser.

  • Using Packages and Native nodejs Modules in AWS Lambda

    by Tim Wagner | on | in AWS Lambda |

    Tim Wagner, AWS Lambda


    Bryan Liston, AWS Solutions Architect


    In this post we take a look at how to use custom nodejs packages with AWS Lambda, including building and packaging native nodejs modules for use in your Lambda functions. To follow the steps below, you’ll need an EC2 instance or a similar machine running Amazon Linux with nodejs installed.

    Using Existing JavaScript Packages with Lambda

    Using npm packages and custom modules/packages with Lambda is easy. We’ll start by including prebuilt modules then move on to native ones.

    Step 1: Create a new directory to hold your Lambda function and its modules.

    $ mkdir lambdaTestFunction
    $ cd lambdaTestFunction

    Step 2: Install an npm package

    For this example, we’ll keep things simple and install the AWS SDK. This is just for illustration; the current version of the AWS SDK is pre-installed in Lambda, but you could use this technique to load other pre-built JavaScript packages or if you actually needed an earlier version of the AWS SDK for compatibility reasons.

    $ npm install --prefix=~/lambdaTestFunction aws-sdk
    aws-sdk@2.0.27 node_modules/aws-sdk
    ├── xmlbuilder@0.4.2
    └── xml2js@0.2.6 (sax@0.4.2)
    $ ls node_modules
    aws-sdk

    At this point we’ve installed the ‘aws-sdk’ module into the node_modules directory of our lambdaTestFunction folder.

    Step 3: Verification

    Next we’ll run nodejs locally to make sure we have a valid configuration before proceeding. We’ll create a trivial test.js file that uses the AWS SDK to print some EC2 config data. It’s not interesting in its own right, but it serves to show that the SDK has been installed.

    $ echo 'var AWS = require("aws-sdk");console.log(AWS.EC2.apiVersions)' > test.js
    $ node test.js
    [ '2013-06-15*','2013-10-15*','2014-02-01*','2014-05-01*','2014-06-15*','2014-09-01*','2014-10-01' ]
    

    Step 4: Create Your Lambda Function!

    At this point we’ve successfully created a directory containing one or more npm-installed packages and verified that the packages can load and execute by running a test script locally. You can now delete the test script and continue by creating a real Lambda function that takes advantage of the modules that you’ve just installed, testing it the same way. To deploy the resulting function and modules to Lambda, just zip up the entire lambdaTestFunction directory and use Lambda’s createFunction API, CLI, or the console UI to deploy it.

    Custom JavaScript Modules

    Everything you learned above for npm-installing existing modules also applies to custom modules that you create yourself. We’ll illustrate with a trivial example that just logs to the console. Create a directory under node_modules that contains your custom written packages, and add them to your function, just as you did previously for npm-installed packages:

    $ mkdir -p node_modules/bryan
    $ echo 'console.log("This is a demo")' > node_modules/bryan/index.js
    $ echo 'require("bryan")' > test.js
    $ node test.js
    This is a demo
    

    Native Modules

    Native modules are similarly installed and deployed, but you’ll need to build them against the Amazon Linux libraries. You’ll need to either ensure that the libraries and their transitive dependencies are statically compiled or use rpath-style linking; we’ll do it the static way in this post, and demonstrate use of rpath in a subsequent post. (Note that many, but not all, libraries can be statically linked this way.)

    The following steps will walk you through updating your system, installing the required development libraries and tools, downloading nodejs and our sample library, OpenCV, and finally installing and testing the OpenCV module we create by running some basic facial detection code on a well-known face.

    Step 1: Update your Amazon Linux machine and install required libraries and tools

    $ sudo yum update
    $ sudo yum install gcc44 gcc-c++ libgcc44 cmake -y
    $ wget http://nodejs.org/dist/v0.10.33/node-v0.10.33.tar.gz
    $ tar -zxvf node-v0.10.33.tar.gz
    $ cd node-v0.10.33 && ./configure && make
    $ sudo make install
    

    Step 2: Download your native module

    $ wget http://softlayer-dal.dl.sourceforge.net/project/opencvlibrary/opencv-unix/2.4.9/opencv-2.4.9.zip
    $ mkdir opencv_install
    $ mkdir opencv_example
    $ unzip opencv-2.4.9.zip -d opencv_install/ && cd opencv_install
    

    Step 3: Configure your module for static libraries

    $ cmake -D CMAKE_BUILD_TYPE=RELEASE -D BUILD_SHARED_LIBS=NO -D CMAKE_INSTALL_PREFIX=~/opencv opencv-2.4.9/
    

    Note that we’re disabling shared libraries as we build OpenCV; you can see this in the output:

    --   C/C++:
    --   Built as dynamic libs?:     NO
    --   Install path:               /home/ec2-user/opencv
    

    Step 4: Build and install your module

    $ make && make install
    

    Step 5: Install the package using npm

    Now that we’ve built the OpenCV library to a local folder, we’ll install the npm module to our local lambda example folder, opencv_example.

    $ PKG_CONFIG_PATH=~/opencv/lib/pkgconfig/  npm install --prefix=~/opencv_example opencv
    

    We set the PKG_CONFIG_PATH environment variable before running npm install so that the build can locate our newly built static libraries. npm will then compile the OpenCV module against the statically compiled version of the OpenCV library we built above.

    Step 6: Testing our newly compiled Native Module

    The NodeJS OpenCV module includes some sample facial detection code that we can use to validate that the module has been built correctly:

    $ cd ~/opencv_example
    $ cd node_modules/opencv/examples/
    $ mkdir tmp
    $ node face-detection.js
    Image saved to ./tmp/face-detection.png
    $ ls tmp
    face-detection.png
    

    Now if we look at the newly created file we should see a red circle placed around Mona Lisa’s head:

    Mona Lisa, after facial recognition analysis

    Step 7: Deploying to Lambda

    You’ve now successfully compiled a native NodeJS module and tested it using OpenCV’s test code. You can remove any test files and test output, write a real Lambda function, ZIP up your directory as before and deploy it to Lambda. If you’ve been following along with our actual example above, you could try this out in Lambda by using OpenCV to do facial recognition on images added to an Amazon S3 bucket.

    -Bryan and Tim

  • Understanding Container Reuse in AWS Lambda

    by Tim Wagner | on | in AWS Lambda |

    Tim Wagner, AWS Lambda

    AWS Lambda functions execute in a container (sandbox) that isolates them from other functions and provides the resources, such as memory, specified in the function’s configuration. In this article we discuss how Lambda creates and reuses these sandboxes, and the impact of those policies on the programming model.

    Startup

    The first time a function executes after being created or having its code or resource configuration updated, a new container with the appropriate resources will be created to execute it, and the code for the function will be loaded into the container. In nodejs, initialization code is executed once per container creation, before the handler is called for the first time.

    In nodejs, a Lambda function can complete in one of three ways:

    1. Timeout. The user-specified duration has been reached. Execution will be summarily halted regardless of what the code is currently doing.
    2. Controlled termination. One of the callbacks (which need not be the original handler entry point) invokes context.done() and then finishes its own execution. Execution will terminate regardless of what the other callbacks (if any) are doing.
    3. Default termination. If all callbacks have finished (even if none have called context.done()), the function will also end. If there is no call to context.done(), you’ll see the message “Process exited before completing request” in the log (in this case, it really means ‘exited without having called context.done()’).

    There’s also effectively a fourth way to exit – by crashing or calling process.exit(). For example, if you include a binary library with a bug and it segfaults, you’ll effectively terminate execution of that container.

    Since context.done plays an important role here, a quick reminder of how to use it: The first argument should be null to indicate a successful outcome of the function (undefined is treated similarly). Any other value will be interpreted as an error result. The stringified representation of non-null values is automatically logged to the AWS CloudWatch Log stream. An error result may trigger Lambda to retry the function; see the S3 bucket notification and registerEventSource documentation for more information on retry semantics and the checkpointing of ordered event sources, such as Amazon DynamoDB Streams. The second argument to done() is an optional message string; if present, it will be displayed in the console for test invocations below the log output. (The message argument can be used for both success and error cases.)

    For those encountering nodejs for the first time in Lambda, a common error is forgetting that callbacks execute asynchronously and calling context.done() in the original handler when you really meant to wait for another callback (such as an S3.PUT operation) to complete, forcing the function to terminate with its work incomplete. There are also some excellent nodejs packages that provide fine-grained control over callback patterns, including synchronization and ordering mechanisms, to make callback choreography easier, and we’ll explore using some of them in Lambda in a future article.
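    The pitfall and its fix can be sketched in a few lines (this is a local simulation: the context object is a mock we construct ourselves, and setTimeout stands in for an asynchronous operation like an S3 PUT; it is not the real Lambda context):

```javascript
// Mock of the Lambda context object, just enough to observe done().
function makeContext(onDone) {
  return { done: function (err, msg) { onDone(err, msg); } };
}

// Correct pattern: defer context.done() to the asynchronous callback.
// Calling context.done() directly in the handler body instead would end
// execution before the async work completes.
function handler(event, context) {
  setTimeout(function () {
    // ...work that must finish before the function ends...
    context.done(null, "finished after async work");
  }, 10);
  // Note: no context.done() here.
}
```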

    Round 2

    Let’s say your function finishes, and some time passes, then you call it again. Lambda may create a new container all over again, in which case the experience is just as described above. This will be the case for certain if you change your code.

    However, if you haven’t changed the code and not too much time has gone by, Lambda may reuse the previous container. This offers some performance advantages to both parties: Lambda gets to skip the nodejs language initialization, and you get to skip initialization in your code. Files that you wrote to /tmp last time around will still be there if the sandbox gets reused.
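    The initialization savings look like this in practice (a self-contained sketch: the counter and cached object stand in for real setup work such as opening connections or priming /tmp, and the mock context lets the handler run outside Lambda):

```javascript
// Code outside the handler runs once per container creation, so when a
// sandbox is reused, this setup cost is skipped on later invocations.
var initializations = 0;

var cachedConfig = (function initialize() {
  initializations += 1; // stand-in for expensive one-time setup
  return { greeting: "hello" };
})();

function handler(event, context) {
  // cachedConfig is already available; no per-invocation setup cost
  context.done(null, cachedConfig.greeting + ", " + event.name);
}
```

    Invoking the handler twice in the same "container" leaves the initialization counter at one; a fresh container would run the setup again, which is why your code must work correctly in both cases.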

    Remember, you can’t depend on a container being reused, since it’s Lambda’s prerogative to create a new one instead.

    The Freeze/Thaw Cycle

    We’ve talked about what happens in the original nodejs process that represents your Lambda function, but what if you spawned background threads or other processes? Outside of nodejs, Lambda doesn’t look at what else you might have done (or still be doing) to decide when to finish execution. If you need to wait for additional work to complete, you should represent that in nodejs with a callback (one that doesn’t call context.done() until the background job is finished).

    But let’s say you have a background process running when the function finishes – what happens to it if the container is reused? In this case, Lambda will actually “freeze” the process and thaw it out the next time you call the function (but *only* if the container is reused, which isn’t a guarantee). So in the reuse case, your background processes will still be there, but they won’t have been executing while you were away. This can be really convenient if you use them as companion processes, since it avoids the overhead of recreating them (in the same way that Lambda avoids the overhead of recreating the nodejs process itself when it reuses a sandbox). In the future we’ll extend the duration limit of Lambda functions beyond 60 seconds, allowing you to do long-running jobs when your intent really is to keep things running.

    Feedback

    Lambda’s still in preview, and one of our goals for preview is to get feedback from users on the programming model and APIs. Let us know how we’re doing and any ideas you have for making Lambda easier to use!

     

    -Tim

  • Compute content at re:Invent 2014

    by Tim Wagner | on | in Amazon EC2, Amazon ECS, AWS Lambda |

    Tim Wagner, AWS Lambda

    AWS re:Invent 2014 Recap

    The 2014 re:Invent conference was an exciting venue for us to launch new compute-related services and features, including the Amazon EC2 Container Service, which supports Docker containers and lets you easily run distributed applications on a managed cluster of EC2 instances, and AWS Lambda, a new compute service that runs your code in response to events and manages all the compute resources for you. With over 13,000 attendees, 400 speakers, and more than 250 sessions, it was tough to find time to sleep! If you missed the conference or couldn’t attend some of the compute-related sessions, here’s a list of compute-related videos and slideshare links to get caught up:

    Amazon EC2 Container Service (ECS) Launch Announcement, Werner Vogels keynote

    In this segment of the Day 2 keynote, Werner announces the new Amazon ECS service and describes how Docker and containers are changing how we think about the composition of distributed applications.

     

    Breakout session: Amazon EC2 Container Service in Action, Deepak Singh APP313

    Slides

    Deepak and Dan provide an overview of the new Amazon ECS service and its capabilities. Container technology, particularly Docker, is all the rage these days. At AWS, our customers have been running Linux containers at scale for several years, and we are increasingly seeing customers adopt Docker, especially as they build loosely coupled distributed applications. However, to do so they have to run their own cluster management solutions, deal with configuration management, and manage their containers and associated metadata. We believe that those capabilities should be a core building block technology, just like EC2. In this presentation, Deepak and Dan announce the preview of Amazon EC2 Container Service, a new AWS service that makes it easy to run and manage Docker-enabled distributed applications using powerful APIs that allow you to launch and stop containers, get complete cluster state information, and manage linked containers. They discuss the rationale for building the EC2 Container Service, some of the core concepts, and walk you through how you can use the service for your applications.

     

    AWS Lambda Launch Announcement, Werner Vogels keynote

    In this segment of the Day 2 keynote, Werner announces the new AWS Lambda service and discusses why AWS is embracing events and event-driven compute as a way to more rapidly construct distributed applications and enhance cloud computing.

     

    Breakout session: Getting Started with AWS Lambda, Tim Wagner MBL202

    Slides

    AWS Lambda is a new compute service that runs your code in response to events and automatically manages compute resources for you. In this session, we describe what you need to get started quickly, including a review of key features, a live demonstration, how to use AWS Lambda with Amazon S3 event notifications and Amazon DynamoDB streams, and tips on getting the most out of Lambda functions.

     

    Amazon EC2 Instances Deep Dive

    Slides

    Amazon Elastic Compute Cloud (Amazon EC2) provides a broad selection of instance types to accommodate a diverse mix of workloads. In this technical session, John and Anthony provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. They dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. They also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.

     

    State of the Union: Amazon Compute Services

    Slides

    In this spotlight talk, Peter De Santis, Vice President of Amazon Compute Services, and Matt Garman, Vice President of Amazon EC2, share a “behind the scenes” look at the evolution of compute at AWS. You’ll hear about the drivers behind the innovations we’ve introduced and learn how we’ve scaled our compute services to meet dramatic usage growth.

     

    Lots of exciting announcements, and more to come in 2015! Don’t forget to send us feedback on topics you’d like to hear more about.

    -Tim

  • Welcome to the AWS Compute Blog

    by Tim Wagner | on | in Amazon EC2, Amazon ECS, AWS Lambda |
    Tim Wagner, AWS Lambda GM and Deepak Singh, Amazon ECS Sr. Manager

     

    Welcome to the AWS Compute blog! This blog covers compute services offered by AWS, including Amazon EC2, Amazon ECS, and AWS Lambda, with a focus on new trends in cloud computing. In it you’ll find:

    • Announcements and discussions of new features
    • Deep dives on technical topics
    • Best practices for using and optimizing AWS compute services
    • Code samples, tips, and tricks

    You’ll see content from many of our team members, including developers from the EC2, ECS, and Lambda teams.

    We hope you’ll come back often to visit. If you’d like to see any specific topics covered, please let us know.

    Deepak and Tim