Category: Amazon EC2

New T2.Xlarge and T2.2Xlarge Instances

AWS customers love the cost-effective, burst-based model that they get when they use T2 instances. These customers use T2 instances to run general purpose workloads such as web servers, development environments, continuous integration servers, test environments, and small databases. These instances provide a generous amount of baseline performance and the ability to automatically and transparently scale up to full-core processing power on an as-needed basis (refer back to New Low Cost EC2 Instances with Burstable Performance if this is news to you).

Today we are adding two new larger T2 instance sizes – t2.xlarge with 16 GiB of memory and t2.2xlarge with 32 GiB of memory. These new sizes enable customers to benefit from price and performance of the T2 burst model for applications with larger resource requirements  (this is the third time that we have expanded the range of t2 instances; we added t2.large instances last June and t2.nano instances last December).

Here are the specs for all of the sizes of T2 instances (the prices reflect the most recent EC2 Price Reduction):

Name vCPUs Baseline Performance Platform Memory (GiB) CPU Credits / Hour Price / Hour
t2.nano 1 5% 32-bit or 64-bit 0.5 3  $0.0059
t2.micro 1 10% 32-bit or 64-bit 1 6  $0.012
t2.small 1 20% 32-bit or 64-bit 2 12  $0.023
t2.medium 2 40% 32-bit or 64-bit 4 24  $0.047
t2.large 2 60% 64-bit 8 36  $0.094
t2.xlarge 4 90% 64-bit 16 54  $0.188
t2.2xlarge 8 135% 64-bit 32 81  $0.376

Here are a couple of ways that you might be able to move existing workloads to the new instances:

  • t2.large workloads can scale up to t2.xlarge or t2.2xlarge in order to gain access to more memory.
  • Intermittent c4.2xlarge workloads can move to t2.xlarge at a significant cost reduction, with similar burst performance.
  • Intermittent m4.xlarge workloads can move to t2.xlarge at s slight cost reduction, and higher burst performance.

The new instances are available today as On-Demand & Reserved Instances in all AWS regions.


Update: We have a webinar coming up on January 19th, where you can learn more. Sign up here.

Human Longevity, Inc. – Changing Medicine Through Genomics Research

Human Longevity, Inc. (HLI) is at the forefront of genomics research and wants to build the world’s largest database of human genomes along with related phenotype and clinical data, all in support of preventive healthcare. In today’s guest post, Yaron Turpaz,  Bryan Coon, and Ashley Van Zeeland talk about how they are using AWS to store the massive amount of data that is being generated as part of this effort to revolutionize medicine.


When Human Longevity, Inc. launched in 2013, our founders recognized the challenges ahead. A genome contains all the information needed to build and maintain an organism; in humans, a copy of the entire genome, which contains more than three billion DNA base pairs, is contained in all cells that have a nucleus. Our goal is to sequence one million genomes and deliver that information—along with integrated health records and disease-risk models—to researchers and physicians. They, in turn, can interpret the data to provide targeted, personalized health plans and identify the optimal treatment for cancer and other serious health risks far earlier than has been possible in the past. The intent is to transform medicine by fostering preventive healthcare and risk prevention in place of the traditional “sick care” model, when people wind up seeing their doctors only after symptoms manifest.

Our work in developing and applying large-scale computing and machine learning to genomics research entails the collection, analysis, and storage of immense amounts of data from DNA-sequencing technology provided by companies like Illumina. Raw data from a single genome consumes about 100 gigabytes; that number increases as we align the genomic information with annotation and phenotype sources and analyze it for health insights.

From the beginning, we knew our choice of compute and storage technology would have a direct impact on the success of the company. Using the cloud was clearly the best option. We’re experts in genomics, and don’t want to spend resources building and maintaining an IT infrastructure. We chose to go all in on AWS for the breadth of the platform, the critical scalability we need, and the expertise AWS has developed in big data. We also saw that the pace of innovation at AWS—and its deliberate strategy of keeping costs as low as possible for customers—would be critical in enabling our vision.

Leveraging the Range of AWS Services

 Spectral karyotype analysis / Image courtesy of Human Longevity, Inc.

Spectral karyotype analysis / Image courtesy of Human Longevity, Inc.

Today, we’re using a broad range of AWS services for all kinds of compute and storage tasks. For example, the HLI Knowledgebase leverages a distributed system infrastructure comprised of Amazon S3 storage and a large number of Amazon EC2 nodes. This helps us achieve resource isolation, scalability, speed of provisioning, and near real-time response time for our petabyte-scale database queries and dynamic cohort builder. The flexibility of AWS services makes it possible for our customized Amazon Machine Images and pre-built, BTRFS-partitioned Amazon EBS volumes to achieve turn-up time in seconds instead of minutes. We use Amazon EMR for executing Spark queries against our data lake at the scale we need. AWS Lambda is a fantastic tool for hooking into Amazon S3 events and communicating with apps, allowing us to simply drop in code with the business logic already taken care of. We use Auto Scaling based on demand, and AWS OpsWorks for managing a Docker pipeline.

We also leverage the cost controls provided by Amazon EC2 Spot and Reserved Instance types. When we first started, we used on-demand instances, but the costs started to grow significantly. With Spot and Reserved Instances, we can allocate compute resources based on specific needs and workflows. The flexibility of AWS services enables us to make extensive use of dockerized containers through the resource-management services provided by Apache Mesos. Hundreds of dynamic Amazon EC2 nodes in both our persistent and spot abstraction layers are dynamically adjusted to scale up or down based on usage demand and the latest AWS pricing information. We achieve substantial savings by sharing this dynamically scaled compute cluster with our Knowledgebase service and the internal genomic and oncology computation pipelines. This flexibility gives us the compute power we need while keeping costs down. We estimate these choices have helped us reduce our compute costs by up to 50 percent from the on-demand model.

We’ve also worked with AWS Professional Services to address a particularly hard data-storage challenge. We have genomics data in hundreds of Amazon S3 buckets, many of them in the petabyte range and containing billions of objects. Within these collections are millions of objects that are unused, or used once or twice and never to be used again. It can be overwhelming to sift through these billions of objects in search of one in particular. It presents an additional challenge when trying to identify what files or file types are candidates for the Amazon S3-Infrequent Access storage class. Professional Services helped us with a solution for indexing Amazon S3 objects that saves us time and money.

Moving Faster at Lower Cost
Our decision to use AWS came at the right time, occurring at the inflection point of two significant technologies: gene sequencing and cloud computing. Not long ago, it took a full year and cost about $100 million to sequence a single genome. Today we can sequence a genome in about three days for a few thousand dollars. This dramatic improvement in speed and lower cost, along with rapidly advancing visualization and analytics tools, allows us to collect and analyze vast amounts of data in close to real time. Users can take that data and test a hypothesis on a disease in a matter of days or hours, compared to months or years. That ultimately benefits patients.

Our business includes HLI Health Nucleus, a genomics-powered clinical research program that uses whole-genome sequence analysis, advanced clinical imaging, machine learning, and curated personal health information to deliver the most complete picture of individual health. We believe this will dramatically enhance the practice of medicine as physicians identify, treat, and prevent diseases, allowing their patients to live longer, healthier lives.

— Yaron Turpaz (Chief Information Officer), Bryan Coon (Head of Enterprise Services), and Ashley Van Zeeland (Chief Technology Officer).

Learn More
Learn more about how AWS supports genomics in the cloud, and see how genomics innovator Illumina uses AWS for accelerated, cost-effective gene sequencing.


EC2 Price Reduction (C4, M4, and T2 Instances)

I am happy to be able to announce that an EC2 price reduction will go in to effect on December 1, 2016, just in time to make your holiday season just a little bit more cheerful! Our engineering investments, coupled with our scale and our time-tested ability to manage our capacity, allow us to identify and pass on the cost savings to you.

We are reducing the On-Demand, Reserved Instance (Standard and Convertible) and Dedicated Host prices for C4, M4, and T2 instances by up to 25%, depending on region and platform (Linux, RHEL, SUSE, Windows, and so forth). Price cuts apply across all AWS Regions. For example:

  • C4 – Reductions of up to 5% in US East (Northern Virginia) and EU (Ireland) and 20% in Asia Pacific (Mumbai) and Asia Pacific (Singapore).
  • M4  – Reductions of up to 10% in US East (Northern Virginia), EU (Ireland), and EU (Frankfurt) and 25% in Asia Pacific (Singapore).
  • T2 – Reductions of up to 10% in US East (Northern Virginia) and 25% in Asia Pacific (Singapore).

As always, you do not need to take any action in order to benefit from the reduction in On-Demand prices. If you are using billing alerts or our newly revised budget feature, you may want to consider revising your thresholds downward as appropriate.


PS – By my count, this is our 53rd price reduction.

New – Burst Balance Metric for EC2’s General Purpose SSD (gp2) Volumes

Many AWS customers are getting great results with the General Purpose SSD (gp2) EBS volumes that we launched in mid-2014 (see New SSD-Backed Elastic Block Storage for more information).  If you’re unsure of which volume type to use for your workload, gp2 volumes are the best default choice because they offer balanced price/performance for a wide variety of database, dev and test, and boot volume workloads. One of the more interesting aspects of this volume type is the burst feature.

We designed gp2‘s burst feature to suit the I/O patterns of real world workloads we observed across our customer base. Our data scientists found that volume I/O is extremely bursty, spiking for short periods, with plenty of idle time between bursts. This unpredictable and bursty nature of traffic is why we designed the gp2 burst-bucket to allow even the smallest of volumes to burst up to 3000 IOPS and to replenish their burst bucket during idle times or when performing low levels of I/O. The burst-bucket design allows us to provide consistent  and predictable performance for all gp2 users. In practice, very few gp2 volumes ever completely deplete their burst-bucket, and now customers can track their usage patterns and adjust accordingly.

We’ve written extensively about performance optimization across different volume types and the differences between benchmarking and real-world workloads (see I/O Characteristics for more information). As I described in my original post, burst credits accumulate at a rate of 3 per configured GB per second, and each one pays for one read or one write. Each volume can accumulate up to 5.4 million credits, and they can be spent at up to 3,000 per second per volume. To get started, you simply create gp2 volumes of the desired size, launch your application, and your I/O to the volume will proceed as rapidly and efficiently as possible.

New Metric
Effective today, we are making the Burst Balance metric available for each General Purpose (SSD) volume. You can observe this metric in the CloudWatch Console and you can set up an alarm that will be triggered if the balance becomes too low. The metric is expressed as percentage; 100% means that the volume has accumulated the maximum number of credits.

I launched a c4.8xlarge instance and attached a 100 GB volume to it:

Then I created an alarm to let me know if the volume’s burst balance went below 40% (in a real-world scenario you might want to set this considerably lower, but I was impatient and it takes a fair amount of time to drain the balance):

I confirmed my SNS subscription, and then ran fio to generate a load:

$ sudo fio --filename=/dev/sdb --rw=randread --bs=16k --runtime=2400 --time_based=1 \
  --iodepth=32 --ioengine=libaio --direct=1  --name=gp2-16kb-burst-bucket-test

Then I watched as the balance declined:

As expected, I received a notification email:

In a production scenario, I could choose to increase the size of the volume, fine-tune my application’s I/O behavior, or simply note for the record that I was making good use of the burst-bucket.

After the end of the test, I had lunch and watched the burst balance increase (I used the updated CloudWatch Console this time):

If you’re one of the few customers where your burst-bucket is depleting more than you’d like, you can either increase the size of your gp2 volume for more performance or transition to a Provisioned IOPS SSD (io1) volume, which delivers consistent, provisioned performance 99.9% of the time.

Available Now
This feature is available now and you can start using it today in all AWS Commercial Regions at no charge. The usual charges for CloudWatch Alarms will apply.



PS – If you are a developer, development manager, or a product manager and would like to build systems like this, check out the EBS Jobs page.

New Amazon Linux Container Image for Cloud and On-Premises Workloads

The Amazon Linux AMI is designed to provide a stable, secure, high performance execution environment for applications running on EC2. With limited remote access (no root login and mandatory SSH key pairs) and a very small number of non-critical packages installed, the AMI has a very respectable security profile.

Many of our customers have asked us to make this Linux image available for use on-premises, often as part of their development and testing workloads.

Today I am happy to announce that we are making the Amazon Linux Container Image available for cloud and on-premises use. The image is available from the EC2 Container Registry (read Pulling an Image to learn how to access it). It is built from the same source code and packages as the AMI and will give you a smooth path to container adoption. You can use it as-is or as the basis for your own images.

In order to test this out, I launched a fresh EC2 instance, installed Docker, and then pulled and ran the new image. Then I installed cowsay and lolcat (and the dependencies) and created the image above.

To learn more about this image, read Amazon Linux Container Image.

You can also use the image with Amazon EC2 Container Service. To learn more, read Using Amazon ECR Images with Amazon ECS.



New Utility – Opt-in to Longer Resource IDs Across All Regions

Early this year I announced that Longer EC2 Resource IDs are Now Available, and kicked off the start of a transition period that will last until early December 2016. During the transition period, you can opt in to the new resource format on a region-by-region, user-by-user basis. At the conclusion of the transition period, all newly created resources will be assigned 17-character identifiers. Here are some important dates for your calendar:

  • November – Beginning on November 1st, you can use the describe-id-format command to check on the cutover deadline for the regions that are of interest to you.
  • December – Between December 5th and December 16th, we will be setting individual AWS Regions to use 17-character identifiers by default.

In order to help you to ensure that your code and your tools can handle the new format, I’d like to personally encourage you to opt in as soon as possible!

We’ve launched a new longer-ID-converter tool that will allow you to opt in, opt out, or simply check the status. If you have already installed the AWS Command Line Interface (CLI), you can simply download, the script, make it executable, and then run it:

$ wget
$ chmod +x

Here are some of the things that you can do.

Check the status of your account:

$ ./ --status

Convert account, IAM Roles, and IAM Users to long IDs:

$ ./

Revert to short IDs:

$ ./ --revert

Convert the current User/Role:

$ ./ --convertself

For more information on this utility, check out the README file. For more information on the move to longer resource IDs, consult the EC2 FAQ.


Run Windows Server 2016 on Amazon EC2

You can now run Windows Server 2016 on Amazon Elastic Compute Cloud (EC2). This version of Windows Server is packed with new features including support for Docker and Windows containers.  We are making it available in all AWS regions today, in four distinct forms:

  • Windows Server 2016 Datacenter with Desktop Experience – The mainstream version of Windows Server, designed with security and scalability in mind, with support for both traditional and cloud-native applications. To learn a lot more about Windows Server 2016, download The Ultimate Guide to Windows Server 2016 (registration required).
  • Windows Server 2016 Nano Server -A cloud-native, minimal install that takes up a modest amount of disk space and boots more swiftly than the Datacenter version, while leaving more system resources (memory, storage, and CPU) available to run apps and services. You can read Moving to Nano Server to learn how to migrate your code and your applications. Nano Server does not include a desktop UI so you’ll need to administer it remotely using PowerShell or WMI. To learn how to do this, read Connecting to a Windows Server 2016 Nano Server Instance.
  • Windows Server 2016 with Containers – Windows Server 2016 with Windows containers and Docker already installed.
  • Windows Server 2016 with SQL Server 2016 – Windows Server 2016 with SQL Server 2016 already installed.

Here are a couple of things to keep in mind with respect to Windows Server 2016 on EC2:

  • Memory – Microsoft recommends a minimum of 2 GiB of memory for Windows Server. Review the EC2 Instance Types to find the type that is the best fit for your application.
  • Pricing – The standard Windows EC2 Pricing applies; you can launch On-Demand and Spot Instances, and you can purchase Reserved Instances.
  • Licensing – You can (subject to your licensing terms with Microsoft) bring your own license to AWS.
  • SSM Agent – An upgraded version of our SSM Agent is now used in place of EC2Config. Read the User Guide to learn more.

Containers in Action
I launched the Windows Server 2016 with Containers AMI and logged in to it in the usual way:

Then I opened up PowerShell and ran the command docker run microsoft/sample-dotnet . Docker downloaded the image, and launched it. Here’s what I saw:

We plan to add Windows container support to Amazon ECS by the end of 2016. You can register here to learn more.

Get Started Today
You can get started with Windows Server 2016 on EC2 today. Try it out and let me know what you think!


X1 Instance Update – X1.16xlarge + More Regions

Earlier this year we made the x1.32xlarge instance available. With nearly 2 TiB of memory, this instance type is a great fit for memory-intensive big data, caching, and analytics workloads. Our customers are running the SAP HANA in-memory database, large-scale Apache Spark and Hadoop jobs, and many types of high performance computing (HPC) workloads.

Today we are making two updates to the X1 instance type:

  • New Instance Size – The new x1.16xlarge instance size provides another option for running smaller-footprint workloads.
  • New Regions – The X1 instances are now available in three additional regions, bringing the total to ten.

New Instance Size
Here are the specifications for the new x1.16xlarge:

  • Processor: 2 x Intel™ Xeon E7 8880 v3 (Haswell) running at 2.3 GHz – 32 cores / 64 vCPUs.
  • Memory: 976 GiB with Single Device Data Correction (SDDC+1).
  • Instance Storage: 1,920 GB SSD.
  • Network Bandwidth: 10 Gbps.
  • Dedicated EBS Bandwidth: 5 Gbps (EBS Optimized at no additional cost).

Like the x1.32xlarge, this instance supports Turbo Boost 2.0 (up to 3.1 GHz), AVX 2.0, AES-NI, and TSX-NI.  They are available as On-Demand Instances, Spot Instances, or Dedicated Instances; you can also purchase Reserved Instances and Dedicated Host Reservations.

New Regions
Both sizes of X1 instances are available in the following regions:

  • US East (Northern Virginia)
  • US West (Oregon)
  • EU (Ireland)
  • EU (Frankfurt)
  • Asia Pacific (Tokyo)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Asia Pacific (Mumbai) – New
  • AWS GovCloud (US) – New
  • Asia Pacific (Seoul) – New



New – CloudWatch Plugin for collectd

You have had the power to store your own business, application, and system metrics in Amazon CloudWatch for quite some time (see New – Custom Metrics for Amazon CloudWatch to learn more).  As I wrote way back in 2011 when I introduced this feature, “You can view graphs, set alarms, and initiate automated actions based on these metrics, just as you can for the metrics that CloudWatch already stores for your AWS resources.”

Today we are simplifying the process of collecting statistics from your system and getting them in to CloudWatch with the introduction of a new CloudWatch plugin for collectd. By combining collectd‘s ability to gather many different types of statistics with the CloudWatch features for storage, display, alerting, and alarming, you can become better informed about the state and performance of your EC2 instances and your on-premises hardware and the applications running on them. The plugin is being released as an open source project and we are looking forward to your pull requests.

The collectd daemon is written in C for performance and portability. It supports over one hundred plugins, allowing you to collect statistics on Apache and Nginx web server performance, memory usage, uptime, and much more.

Installation and Configuration
I installed and configured collectd and the new plugin on an EC2 instance in order to see it in action.

To get started I created an IAM Policy with permission to write metrics data to CloudWatch:

Then I created an IAM Role that allows EC2 (and hence the collectd code running on my instance) to use my Policy:

If I was planning to use the plugin to collect statistics from my on-premises servers or if my EC2 instances were already running, I could have skipped these steps, and created an IAM user with the appropriate permissions instead. Had I done this, I would have had to put the user’s credentials on the servers or instances.

With the Policy and the Role in place, I launched an EC2 instance and selected the Role:

I logged in and installed collectd:

$ sudo yum -y install collectd

Then I fetched the plugin and the install script, made the script executable, and ran it:

$ chmod a+x
$ sudo ./

I answered a few questions and the setup ran without incident, starting up collectd after configuring it:

Installing dependencies ... OK
Installing python dependencies ... OK
Copying plugin tar file ... OK
Extracting plugin ... OK
Moving to collectd plugins directory ... OK
Copying CloudWatch plugin include file ... OK

Choose AWS region for published metrics:
  1. Automatic [us-east-1]
  2. Custom
Enter choice [1]: 1

Choose hostname for published metrics:
  1. EC2 instance id [i-057d2ed2260c3e251]
  2. Custom
Enter choice [1]: 1

Choose authentication method:
  1. IAM Role [Collectd_PutMetricData]
  2. IAM User
Enter choice [1]: 1

Choose how to install CloudWatch plugin in collectd:
  1. Do not modify existing collectd configuration
  2. Add plugin to the existing configuration
Enter choice [2]: 2
Plugin configuration written successfully.
Stopping collectd process ... NOT OK
Starting collectd process ... OK

With collectd running and the plugin installed and configured, the next step was to decide on the statistics of interest and configure the plugin to publish them to CloudWatch (note that there is a per-metric cost so this is an important step).

The file /opt/collectd-plugins/cloudwatch/config/blocked_metrics contains a list of metrics that have been collected but not published to CloudWatch:

$ cat /opt/collectd-plugins/cloudwatch/config/blocked_metrics
# This file is automatically generated - do not modify this file.
# Use this file to find metrics to be added to the whitelist file instead.

I was interested in memory consumption so I added one line to /opt/collectd-plugins/cloudwatch/config/whitelist.conf:


The collectd configuration file (/etc/collectd.conf) contains additional settings for collectd and the plugins. I did not need to make any changes to it.

I restarted collectd so that it would pick up the change:

$ sudo service collectd restart

I exercised my instance a bit in order to consume some memory, and then opened up the CloudWatch Console to locate and display my metrics:

This screenshot includes a preview of an upcoming enhancement to the CloudWatch Console; don’t worry if yours doesn’t look as cool (stay tuned for more information on this).

If I had been monitoring a production instance, I could have installed one or more of the collectd plugins. Here’s a list of what’s available on the Amazon Linux AMI:

$ sudo yum list | grep collectd
collectd.x86_64                        5.4.1-1.11.amzn1               @amzn-main
collectd-amqp.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-apache.x86_64                 5.4.1-1.11.amzn1               amzn-main
collectd-bind.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-curl.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-curl_xml.x86_64               5.4.1-1.11.amzn1               amzn-main
collectd-dbi.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-dns.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-email.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-generic-jmx.x86_64            5.4.1-1.11.amzn1               amzn-main
collectd-gmond.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-ipmi.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-iptables.x86_64               5.4.1-1.11.amzn1               amzn-main
collectd-ipvs.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-java.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-lvm.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-memcachec.x86_64              5.4.1-1.11.amzn1               amzn-main
collectd-mysql.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-netlink.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-nginx.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-notify_email.x86_64           5.4.1-1.11.amzn1               amzn-main
collectd-postgresql.x86_64             5.4.1-1.11.amzn1               amzn-main
collectd-rrdcached.x86_64              5.4.1-1.11.amzn1               amzn-main
collectd-rrdtool.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-snmp.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-varnish.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-web.x86_64                    5.4.1-1.11.amzn1               amzn-main

Things to Know
If you are using version 5.5 or newer of collectd, four metrics are now published by default:

  • df-root-percent_bytes-used – disk utilization
  • memory–percent-used – memory utilization
  • swap–percent-used – swap utilization
  • cpu–percent-active – cpu utilization

You can remove these from the whitelist.conf file if you don’t want them to be published.

The primary repositories for the Amazon Linux AMI, Ubuntu, RHEL, and CentOS currently provide older versions of collectd; please be aware of this change in the default behavior if you install from a custom repo or build from source.

Lots More
There’s quite a bit more than I had time to show you. You can  install more plugins and then configure whitelist.conf to publish even more metrics to CloudWatch. You can create CloudWatch Alarms, set up Custom Dashboards, and more.

To get started, visit AWS Labs on GitHub and download the CloudWatch plugin for collectd.



New P2 Instance Type for Amazon EC2 – Up to 16 GPUs

I like to watch long-term technology and business trends and watch as they shape the products and services that I get to use and to write about. As I was preparing to write today’s post, three such trends came to mind:

  • Moore’s Law – Coined in 1965, Moore’s Law postulates that the number of transistors on a chip doubles every year.
  • Mass Market / Mass Production – Because all of the technologies that we produce, use, and enjoy every day consume vast numbers of chips, there’s a huge market for them.
  • Specialization  – Due to the previous trend, even niche markets can be large enough to be addressed by purpose-built products.

As the industry pushes forward in accord with these trends, a couple of interesting challenges have surfaced over the past decade or so. Again, here’s a quick list (yes, I do think in bullet points):

  • Speed of Light – Even as transistor density increases, the speed of light imposes scaling limits (as computer pioneer Grace Hopper liked to point out, electricity can travel slightly less than 1 foot in a nanosecond).
  • Semiconductor Physics – Fundamental limits in the switching time (on/off) of a transistor ultimately determine the minimum achievable cycle time for a CPU.
  • Memory Bottlenecks – The well-known von Neumann Bottleneck imposes limits on the value of additional CPU power.

The GPU (Graphics Processing Unit) was born of these trends, and addresses many of the challenges! Processors have reached the upper bound on clock rates, but Moore’s Law gives designers more and more transistors to work with. Those transistors can be used to add more cache and more memory to a traditional architecture, but the von Neumann Bottleneck limits the value of doing so. On the other hand, we now have large markets for specialized hardware (gaming comes to mind as one of the early drivers for GPU consumption). Putting all of this together, the GPU scales out (more processors and parallel banks of memory) instead of up (faster processors and bottlenecked memory). Net-net: the GPU is an effective way to use lots of transistors to provide massive amounts of compute power!

With all of this as background, I would like to tell you about the newest EC2 instance type, the P2. These instances were designed to chew through tough, large-scale machine learning, deep learning, computational fluid dynamics (CFD), seismic analysis, molecular modeling, genomics, and computational finance workloads.

New P2 Instance Type
This new instance type incorporates up to 8 NVIDIA Tesla K80 Accelerators, each running a pair of NVIDIA GK210 GPUs. Each GPU provides 12 GiB of memory (accessible via 240 GB/second of memory bandwidth), and 2,496 parallel processing cores. They also include ECC memory protection, allowing them to fix single-bit errors and to detect double-bit errors. The combination of ECC memory protection and double precision floating point operations makes these instances a great fit for all of the workloads that I mentioned above.

Here are the instance specs:

Instance Name GPU Count vCPU Count Memory Parallel Processing Cores
GPU Memory
Network Performance
p2.xlarge 1 4 61 GiB 2,496 12 GiB High
p2.8xlarge 8 32 488 GiB 19,968 96 GiB 10 Gigabit
p2.16xlarge 16 64 732 GiB 39,936 192 GiB 20 Gigabit

All of the instances are powered by an AWS-Specific version of Intel’s Broadwell processor, running at 2.7 GHz. The p2.16xlarge gives you control over C-states and P-states, and can turbo boost up to 3.0 GHz when running on 1 or 2 cores.

The GPUs support CUDA 7.5 and above, OpenCL 1.2, and the GPU Compute APIs. The GPUs on the p2.8xlarge and the p2.16xlarge are connected via a common PCI fabric. This allows for low-latency, peer to peer GPU to GPU transfers.

All of the instances make use of our new Enhanced Network Adapter (ENA – read Elastic Network Adapter – High Performance Network Interface for Amazon EC2 to learn more) and can, per the table above, support up to 20 Gbps of low-latency networking when used within a Placement Group.

Having a powerful multi-vCPU processor and multiple, well-connected GPUs on a single instance, along with low-latency access to other instances with the same features creates a very impressive hierarchy for scale-out processing:

  • One vCPU
  • Multiple vCPUs
  • One GPU
  • Multiple GPUs in an instance
  • Multiple GPUs in multiple instances within a Placement Group

P2 instances are VPC only, require the use of 64-bit, HVM-style, EBS-backed AMIs, and you can launch them today in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) regions as On-Demand Instances, Spot Instances, Reserved Instances, or Dedicated Hosts.

Here’s how I installed the NVIDIA drivers and the CUDA toolkit on my P2 instance, after first creating, formatting, attaching, and mounting (to /ebs) an EBS volume that had enough room for the CUDA toolkit and the associated samples (10 GiB is more than enough):

$ cd /ebs
$ sudo yum update -y
$ sudo yum groupinstall -y "Development tools"
$ sudo yum install -y kernel-devel-`uname -r`
$ wget
$ wget
$ chmod +x
$ sudo ./
$ chmod +x
$ sudo ./   # Don't install driver, just install CUDA and sample
$ sudo nvidia-smi -pm 1
$ sudo nvidia-smi -acp 0
$ sudo nvidia-smi --auto-boost-permission=0
$ sudo nvidia-smi -ac 2505,875

Note that and are interactive programs; you need to accept the license agreements, choose some options, and enter some paths. Here’s how I set up the CUDA toolkit and the samples when I ran

P2 and OpenCL in Action
With everything set up, I took this Gist and compiled it on a p2.8xlarge instance:

[ec2-user@ip-10-0-0-242 ~]$ gcc test.c -I /usr/local/cuda/include/ -L /usr/local/cuda-7.5/lib64/ -lOpenCL -o test

Here’s what it reported:

[ec2-user@ip-10-0-0-242 ~]$ ./test
1. Device: Tesla K80
 1.1 Hardware version: OpenCL 1.2 CUDA
 1.2 Software version: 352.99
 1.3 OpenCL C version: OpenCL C 1.2
 1.4 Parallel compute units: 13
2. Device: Tesla K80
 2.1 Hardware version: OpenCL 1.2 CUDA
 2.2 Software version: 352.99
 2.3 OpenCL C version: OpenCL C 1.2
 2.4 Parallel compute units: 13
3. Device: Tesla K80
 3.1 Hardware version: OpenCL 1.2 CUDA
 3.2 Software version: 352.99
 3.3 OpenCL C version: OpenCL C 1.2
 3.4 Parallel compute units: 13
4. Device: Tesla K80
 4.1 Hardware version: OpenCL 1.2 CUDA
 4.2 Software version: 352.99
 4.3 OpenCL C version: OpenCL C 1.2
 4.4 Parallel compute units: 13
5. Device: Tesla K80
 5.1 Hardware version: OpenCL 1.2 CUDA
 5.2 Software version: 352.99
 5.3 OpenCL C version: OpenCL C 1.2
 5.4 Parallel compute units: 13
6. Device: Tesla K80
 6.1 Hardware version: OpenCL 1.2 CUDA
 6.2 Software version: 352.99
 6.3 OpenCL C version: OpenCL C 1.2
 6.4 Parallel compute units: 13
7. Device: Tesla K80
 7.1 Hardware version: OpenCL 1.2 CUDA
 7.2 Software version: 352.99
 7.3 OpenCL C version: OpenCL C 1.2
 7.4 Parallel compute units: 13
8. Device: Tesla K80
 8.1 Hardware version: OpenCL 1.2 CUDA
 8.2 Software version: 352.99
 8.3 OpenCL C version: OpenCL C 1.2
 8.4 Parallel compute units: 13

As you can see, I have a ridiculous amount of compute power available at my fingertips!

New Deep Learning AMI
As I said at the beginning, these instances are a great fit for machine learning, deep learning, computational fluid dynamics (CFD), seismic analysis, molecular modeling, genomics, and computational finance workloads.

In order to help you to make great use of one or more P2 instances, we are launching a  Deep Learning AMI today. Deep learning has the potential to generate predictions (also known as scores or inferences) that are more reliable than those produced by less sophisticated machine learning, at the cost of a most complex and more computationally intensive training process. Fortunately, the newest generations of deep learning tools are able to distribute the training work across multiple GPUs on a single instance as well as across multiple instances each containing multiple GPUs.

The new AMI contains the following frameworks, each installed, configured, and tested against the popular MNIST database:

MXNet – This is a flexible, portable, and efficient library for deep learning. It supports declarative and imperative programming models across a wide variety of programming languages including C++, Python, R, Scala, Julia, Matlab, and JavaScript.

Caffe – This deep learning framework was designed with  expression, speed, and modularity in mind. It was developed at the Berkeley Vision and Learning Center (BVLC) with assistance from many community contributors.

Theano – This Python library allows you define, optimize, and evaluate mathematical expressions that involve multi-dimensional arrays.

TensorFlow – This is an open source library for numerical calculation using data flow graphs (each node in the graph represents a mathematical operation; each edge represents multidimensional data communicated between them).

Torch – This is a GPU-oriented scientific computing framework with support for machine learning algorithms, all accessible via LuaJIT.

Consult the README file in ~ec2-user/src to learn more about these frameworks.

You may also find the following AMIs to be of interest: