

New Amazon EC2 Feature: Boot from Elastic Block Store

by Jeff Barr | in Amazon EC2

You can now launch Amazon EC2 instances from an AMI backed by Amazon EBS (Elastic Block Store). This new functionality enables you to launch an instance with an Amazon EBS volume that serves as the root device.

This new feature brings a number of important performance and operational benefits and also enables some really powerful new features:

  • Headroom – The root device (and hence the Amazon Machine Image or AMI) can now occupy up to 1 TB. You can now create more richly featured AMIs, installing additional tools, libraries, applications, and even reference data ahead of time.
  • Flexibility – Each AMI can now reference one or more EBS snapshots. An EBS volume will be created from each snapshot and then attached to the new instance before it begins to boot. For instance, you could attach reference data such as a Public Data Set to each new instance.
  • Performance – Instances will launch more quickly when they are booted from an EBS snapshot.
  • Control – Instances can be stopped and then restarted at a later time. The contents of the volume are, of course, preserved while the instance is stopped, so you get the benefits of a persistent root device without being tied down to a particular piece of compute hardware.

Let’s compare and contrast the original S3-based boot process and the new EBS-based process. Here’s what happens when you boot from an AMI that references an image residing in S3:

  1. EC2 locates and allocates a suitable piece of compute hardware (the instance).
  2. The S3 image is copied to the root device of the instance.
  3. The instance is booted.

Now, here’s what happens when you boot from an AMI that references an image residing in EBS:

  1. EC2 locates and allocates a suitable piece of compute hardware (the instance).
  2. An EBS volume is created for each EBS snapshot referenced by the AMI. The first snapshot is mandatory and denotes the root device; the others are optional and denote additional volumes to be created from other snapshots.
  3. The instance is booted. Unlike the S3-based instances, the ephemeral (local) disks are not mounted automatically. If you need to use these disks for temporary storage, you can request that they be mounted using an option to the RunInstances function. The usual charges for I/O requests apply to an EBS root device; you should consider using ephemeral storage volumes for applications that depend on the local file system for short-term storage. 
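As a concrete illustration of that RunInstances option, here’s a minimal sketch using the modern boto3 SDK (the 2009-era API and command-line tools exposed an equivalent block device mapping setting); the AMI ID, instance type, and device name are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch an instance from a hypothetical EBS-backed AMI, explicitly
    # requesting an ephemeral (local) disk, since EBS-backed instances
    # no longer attach them automatically.
    response = ec2.run_instances(
        ImageId="ami-12345678",   # placeholder EBS-backed AMI
        InstanceType="m1.small",
        MinCount=1,
        MaxCount=1,
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
        ],
    )
    print(response["Instances"][0]["InstanceId"])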

Up until this point the two processes are quite similar. However, the new model allows the instance to be stopped (shut down cleanly and the EBS volumes preserved) at any point and then rebooted later. Here’s the process:

  1. The shutdown process is initiated and the operating system takes itself out of service.
  2. The EBS volumes are detached from the compute hardware.
  3. The compute hardware associated with the instance is freed and returned to the resource pool.
  4. The state of the instance is set to “stopped.”

At this point the instance neither consumes nor possesses any compute hardware and is not accruing any compute hours. While the instance is stopped, the new ModifyInstanceAttribute function can be used to change instance attributes such as the instance type (small, medium, large, and so forth), the kernel, and the user data. The instance’s ID remains valid while the instance is stopped, and can be used as the target of a start request. Here’s what happens then:

  1. EC2 locates and allocates a suitable piece of compute hardware (the instance).
  2. The EBS volumes are attached to the instance.
  3. The instance is booted.
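Here’s a hedged sketch of the full stop / modify / start cycle using the modern boto3 SDK rather than the 2009-era tools; the instance ID is a placeholder, and waiters ensure each step completes before the next:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    instance_id = "i-12345678"  # placeholder EBS-backed instance

    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # While the instance is stopped, attributes such as the instance type
    # can be changed via ModifyInstanceAttribute.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "m1.large"},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])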

When the instance is finally terminated, the EBS volumes will be deleted unless the deleteOnTermination flag associated with the volume was cleared prior to the termination request.
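To keep a volume around after termination, clear the flag first; a small boto3 sketch (the instance ID and device name are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Clear deleteOnTermination on the root device so the volume
    # survives a subsequent terminate request.
    ec2.modify_instance_attribute(
        InstanceId="i-12345678",
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sda1", "Ebs": {"DeleteOnTermination": False}},
        ],
    )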

We made a number of other additions and improvements along the way including a disableApiTermination flag on each instance to protect your instances from accidental shutdowns, a new Description field for each AMI, and a simplified AMI creation process (one that works for both Linux and Windows) based on the new CreateImage function.
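Both additions are easy to exercise; here’s a hedged boto3 sketch (IDs, names, and descriptions are placeholders) that protects an instance from accidental termination and then creates an AMI from it in one step:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Protect the instance from accidental termination.
    ec2.modify_instance_attribute(
        InstanceId="i-12345678",
        DisableApiTermination={"Value": True},
    )

    # One-step AMI creation with the new CreateImage function.
    image = ec2.create_image(
        InstanceId="i-12345678",
        Name="my-server-v1",
        Description="Server with tools and reference data preinstalled",
    )
    print(image["ImageId"])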

Detailed information about all of the new features can be found in the EC2 documentation. You should also take a look at the new Boot from EBS Feature Guide. This handy document includes tutorials on Running an instance backed by Amazon EBS, stopping and starting an instance, and bundling an instance backed by Amazon EBS. It also covers some advanced options and addresses some frequently asked questions about this powerful new feature.

 

I recently spent some time using this new feature and I had an enjoyable (and very productive) time doing so.

I built a scalable, distributed ray tracing system around the venerable POV-Ray program. I was able to test and fine-tune the startup behavior of my EC2 instance without the need to create a new AMI for each go-round. Once I had it working as desired, I created the AMI and then enjoyed quicker boot times as I brought additional EC2 instances online and into my “farm” of ray-tracing instances.

I’ll be publishing an article with full details on what I built in the near future, so stay tuned!

— Jeff;

IBM Tivoli Now Available on Amazon EC2

by Jeff Barr | in Amazon EC2, Enterprise

Adoption of the AWS Cloud by mainstream ISVs is underway as you read this. There are numerous posts about IBM’s work to bring their product line into the AWS environment, and today’s is no exception. IBM Tivoli Monitoring is now available as an Amazon Machine Image (AMI) that runs as a virtual computer in the AWS environment. It’s one more example of enterprise-class applications from household-name ISVs that run in the Amazon Cloud.

And it’s simple to use – IBM provides self-install scripts for data collection agents and self-help guides, and maintains the infrastructure for delivery of Tivoli software. Because there is no hardware or software to purchase, and because the hourly price for Tivoli on EC2 includes an IBM license, it’s super easy to get data collection up and running. At the end of the day, it’s the same enterprise-class software that organizations used to buy traditional licenses for, but without the big PO approval required. In fact, it’s as simple as logging in to the AWS Console and then searching for AMIs with Tivoli in the name.

There’s a comprehensive FAQ about Tivoli on AWS on IBM developerWorks.

If you are interested in a close look at IBM Tivoli, there will be a webcast on Dec 15, 2009 at 8 AM PST, or 11 AM EST that features IBM product managers. You can register here.

— Mike

The New AWS Simple Monthly Calculator

by Jeff Barr | in Amazon CloudFront, Amazon CloudWatch, Amazon EC2, Amazon Elastic Load Balancer, Amazon Elastic MapReduce, Amazon RDS, Amazon S3, Amazon SDB, Amazon SQS, Announcements

Our customers needed better ways to model their applications and estimate their costs. The flexible nature of on-demand scalable computing allows you to pick and choose the services you like and pay for only those. Hence, to give our customers an opportunity to estimate their costs, we have redesigned our AWS Simple Monthly Calculator.

The link to the new calculator is http://aws.amazon.com/calculator 

AWS Simple Monthly Calculator


This new calculator incorporates a wide array of pricing calculations across all our services in all our regions. It also shows a breakdown of features for each service in each region. Most of our customers use multiple AWS services in multiple regions, so the calculator includes a mechanism to select a service in a particular region and add it to the bill.

Users can select a region, input the usage values for each service in that region, and then click on the “Add to Bill” button. This will add the service to the bill. Once the service is added, the user can modify or update the input fields and the calculator will automatically recalculate. Users can remove a service from the bill by clicking on the red delete button in the bill. Each input field represents a value per month; for example, 40 GB of Amazon S3 Data Transfer Out in a month, or 5 Extra Large Linux instances running for 10 hours each in a month. For EC2, RDS, and Elastic MapReduce, users can click on the green add button to add similar types of instances (e.g., web servers, app servers, MySQL DB servers) to their topology.
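To make the arithmetic concrete (my numbers, using the recently announced US Linux On-Demand rate of $0.68 per Extra Large instance-hour): 5 instances × 10 hours × $0.68/hour = $34.00 of EC2 usage for the month, before data transfer and other charges.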

The calculator also shows common customer samples and their usage. Users can click on the “Disaster Recovery and Backup” sample or the “Web Application” sample and see the usage of each service. We will continue to add new real-world customer samples in the future. If you would like us to add your sample usage, please let us know.

Last month, we announced new EC2 pricing, new instance types, and a new AWS service (Amazon RDS). This calculator incorporates all of these new features and services and will continue to evolve as we add new features, new services, and new regions.

The calculator is currently in beta and provides an estimate of the monthly bill. The goal is to help our customers and prospects estimate their monthly bill more efficiently. In the future, we would also like to give you recommendations on how you can save more by switching from On-Demand Instances to Reserved Instances, or by using the AWS Import/Export service instead of standard S3 data transfer. Stay tuned.

We would love to hear your feedback. Please let us know whether this new version of the calculator is helpful in estimating your AWS costs. You can send your feature requests, comments, suggestions, appreciations, confusions, frustrations to evangelists at amazon dot com. 

— Jinesh

New EC2 High-Memory Instances

by Jeff Barr | in Amazon EC2

In many cases, scaling out (by launching additional instances) is the best way to bring additional CPU processing power and memory to bear on a problem, while also distributing network traffic across multiple NICs (Network Interface Controllers). Certain workloads, however, are better supported by scaling up with a more capacious instance. Examples of these workloads include commercial and open source relational databases, mid-tier caches such as memcache, and media rendering.

To enable further scaling up for these workloads, we are introducing a new family of memory-heavy EC2 instances with the Double and Quadruple Extra Large High-Memory instance types. Here are the specs (note that an ECU is an EC2 compute unit, equivalent in CPU power to a 1.0-1.2 GHz 2007-era AMD Opteron or Intel Xeon processor):

  • Double Extra Large – 34.2 GB of RAM, and 13 ECU (4 virtual cores with 3.25 ECU each), 64-bit platform.
  • Quadruple Extra Large – 68.4 GB of RAM, and 26 ECU (8 virtual cores with 3.25 ECU each), 64-bit platform.

These new instance types are available now in multiple Availability Zones of both EC2 regions (US and Europe). Double Extra Large instances cost $1.20 per instance hour and the Quadruple Extra Large instances cost $2.40 per instance hour (these prices are for Linux instances in the US region).

These new instances use the most recent generation of processor and platform architectures. In order to get the best possible performance you should experiment with compiler settings and may also want to check out specialized compilers such as Intel’s Professional Edition and AMD’s Open64 Compiler Suite. As with all EC2 instances where the processor architecture underlying the virtualized compute resources may vary, you may want to think about ways to detect and adapt to the processor type at launch time if this will make a difference for your particular workload.
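As a hedged illustration of that launch-time detection (my own sketch, Linux-only), a boot script could read /proc/cpuinfo and branch on the processor vendor:

    # Inspect /proc/cpuinfo at launch time and pick an appropriately
    # tuned binary or library for the underlying processor.
    def cpu_model():
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
        return "unknown"

    model = cpu_model()
    if "Intel" in model:
        print(f"Intel processor ({model}); using Intel-tuned build")
    elif "AMD" in model:
        print(f"AMD processor ({model}); using AMD-tuned build")
    else:
        print(f"Unrecognized processor ({model}); using generic build")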

You can launch new Double Extra Large and Quadruple Extra Large instances today using the AWS Management Console or ElasticFox.

— Jeff;

Amazon EC2 – Now an Even Better Value

by Jeff Barr | in Amazon EC2, Price Reduction

Effective November 1, 2009, the following per-hour prices will be in effect for Amazon EC2:

                 US                            EU
                 Linux    Windows  SQL         Linux    Windows  SQL
    m1.small     $0.085   $0.12    –           $0.095   $0.13    –
    m1.large     $0.34    $0.48    $1.08       $0.38    $0.52    $1.12
    m1.xlarge    $0.68    $0.96    $1.56       $0.76    $1.04    $1.64
    c1.medium    $0.17    $0.29    –           $0.19    $0.31    –
    c1.xlarge    $0.68    $1.16    $2.36       $0.76    $1.24    $2.44

(– indicates that SQL Server is not offered for that instance type.)

This represents a reduction of up to 15% from the current prices for Linux instances and is a direct result of our policy of working non-stop to drive our operating costs down for the benefit of our customers. This does not affect the price of our two new instance types.

This isn’t the first time we’ve lowered our prices in order to make AWS an even better value. In the past we’ve done this by adding tiered pricing to Amazon S3, reducing the storage and processing charges for SimpleDB, reducing the per-request pricing for SQS, and reducing bandwidth pricing for all services.

We’ve also reduced the prices for a number of IBM platform technologies. Take a look at Amazon Running IBM for a complete list of what we have to offer, along with the new prices.

Update: The first version of this post had the wrong prices for the SQL Server m1.large instances.

— Jeff;

New Public Data Set: YRI Trio

by Jeff Barr | in Amazon EC2

The YRI Trio Public Data Set provides complete genome sequence data for three Yoruba individuals from Ibadan, Nigeria; these are the first human genomes sequenced using Illumina’s next-generation Sequence-by-Synthesis technology.

This data represents some of the first individual human genomes to be sequenced and peer-reviewed; the full story of this remarkable and ground-breaking effort is here.

The data is described as “containing paired 35-base reads of over 30x average depth.” Basically this means that the data contains a large number of relatively short genome sequences, and that each base is covered, on average, by more than 30 separate reads. I asked my colleague Deepak Singh for a better explanation and this is what he told me:

In order to get better assembly and data accuracy you determine the order of bases n times. With older sequencing technologies you collected longer reads and coverage was typically in the n=4-6 range. The sequencing process also took a very long time (several months) to collect sufficient data. Modern, or next generation, sequencing technologies yield shorter reads but you get results much faster (days to weeks) and at much lower cost, so you can repeat the experiment many times to get better coverage. Higher coverage depth gives you the ability to detect low frequency common variations (which is how we are differentiated from one another, and can be characteristic of certain diseases) and improved genome assemblies.
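As a rough worked example (my numbers, not from the paper): the human genome is about 3 × 10^9 bases, so 30x coverage means roughly 9 × 10^10 sequenced bases, which at 35 bases per read works out to about 2.6 billion reads.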

Suggested uses for this data include:

  • The development of alignment algorithms.
  • The development of de novo assembly algorithms.
  • The development of algorithms that define genetic regions of interest, sequence motifs, structural variants, copy number variations, and site-specific polymorphisms.
  • The evaluation of annotation engines that start with raw sequence data.

By the way, this data set is big (700 GB), but you can create an EBS volume, attach it to an EC2 instance, and start processing it in minutes!
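Here’s a hedged boto3 sketch of exactly that (the snapshot ID, instance ID, zone, and device name are placeholders; each Public Data Set page lists its real snapshot ID):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create a volume from the data set's snapshot; the volume must be
    # created in the same Availability Zone as the target instance.
    volume = ec2.create_volume(
        SnapshotId="snap-12345678",     # placeholder Public Data Set snapshot
        AvailabilityZone="us-east-1a",
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

    # Attach it to a running instance, then mount it from the guest OS.
    ec2.attach_volume(
        VolumeId=volume["VolumeId"],
        InstanceId="i-12345678",
        Device="/dev/sdf",
    )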

— Jeff;

AWS Workshops in Beijing, Bangalore and Chennai

by Jeff Barr | in Amazon EC2, Amazon Elastic MapReduce, Amazon S3, Amazon SQS, Announcements

I will be in China and India starting next week. Apart from other meetings and presentations to user groups, this time I will be leading 3-hour workshops. These workshops are targeted at architects and technical decision makers, and attendees will get a chance to play with core AWS infrastructure services.

If you are a system integrator, independent software vendor, enterprise architect, or entrepreneur, this will be a great opportunity to meet and learn more about AWS.

Seats are limited and prior registration is required:

AWS Workshop in Beijing
Oct 24th: http://sd2china.csdn.net
(in conjunction with CSDN conference)

AWS Workshop in Bangalore
Nov 4th: http://www.btmarch.com/btsummit/edm/Amazon.html
(in conjunction with Business Technology Summit)

AWS Workshop in Chennai
Nov 10th: http://www.awschennai.in/
(in conjunction with AWS Chennai Enthusiasts group)

Details of the Workshop

AMAZON WEB SERVICES DEEP DIVE – CLOUD WORKSHOP

INTRO TO AWS INFRASTRUCTURE SERVICES – 30 MIN
Learn how to create an AWS account, understand the SOAP, REST, and Query APIs, and learn how to use tools like the AWS Management Console

DEEP DIVE INTO AWS INFRASTRUCTURE SERVICES – 60 MIN
Amazon EC2 – Learn how to create, bundle, and launch an AMI, set up Amazon EBS volumes, and use Elastic IP addresses
Amazon S3 – Buckets, objects, and ACLs, plus Amazon CloudFront distributions
Amazon SQS – Queues
Amazon SimpleDB – Domains, items, attributes, and querying
Amazon Elastic MapReduce – Job flows and MapReduce

Exercise/Assignments:
Architect a Web application in the Cloud to be discussed in class

ARCHITECTING FOR THE CLOUD: BEST PRACTICES – 30 MIN
Learn how to build highly scalable applications in the cloud. In this session, you will learn about best practices, tips, tricks, and techniques for leveraging the AWS cloud, a highly scalable infrastructure platform.
 
MIGRATING APPLICATIONS TO THE CLOUD – 30 MIN
Learn a step-by-step approach to migrating your existing applications to the Cloud environment. This blueprint will help enterprise architects perform a cloud assessment, select the right candidate for a proof-of-concept project in the Cloud, and leverage the real benefits of the Cloud, such as auto-scaling and low-cost business continuity. Jinesh will discuss migration plans and reference architectures for various examples, scenarios, and use cases.

Laptop required
Knowledge of Java language preferred

See you at the workshop!

– Jinesh

SecondTeacher – Scalable Math Homework Help in the Cloud

by Jeff Barr | in Amazon EC2, Amazon SDB, Cool Sites

Despite the fact that I have written over 800 posts for this blog, I never know what to expect in terms of traffic or attention from any given post. Last Friday afternoon I spent an hour or two putting together a relatively quick post to remind everyone that it is possible to use Amazon SimpleDB for free.

Driven by my initial tweet and a discussion on Hacker News, word spread quickly and traffic to the post was quite robust, especially for a Friday evening!

I’m a big believer in the ClueTrain and its perspective-changing observation that “Markets are conversations.” Blog posts are a great way to get conversations started. Several people took the time to leave comments on the post. A comment from Robert Doyle of SecondTeacher caught my eye, and I asked him if he could tell me a bit more about his use of SimpleDB. He sent me a wonderful summary of his experience to date with the site and with AWS. I’ll do my best to summarize what he told me.

 

Robert is a one-time OS/2 developer who now writes Windows code. When he was designing SecondTeacher, he decided to offload the heavy lifting to someone else so that he and his brother could focus on the site instead of on the infrastructure.

Robert and his brother started SecondTeacher a year or so ago, with the idea that they could make math easier to understand for kids in school. They wanted to make the service inexpensive (currently $25 per year) and accessible. They realized this goal by creating a 10-minute instructional video for each chapter of a number of popular textbooks. A certified teacher explains the concepts and works the problems using a whiteboard.

Traffic to the site is variable yet very predictable. Peak traffic occurs between 3 PM and 9 PM on weekdays, with very low load at other times. Robert told me that he keeps one EC2 instance running in off hours and scales to twenty during the daily peak. Over the course of a week he has an average of three instances running.

The site uses a number of AWS services! The video content is stored in Amazon S3 and distributed world-wide using Amazon CloudFront. Elastic Load Balancing is used to distribute incoming traffic across a variable number of EC2 instances.

User data, content management information, and session state are all stored in Amazon SimpleDB. The session state (a string) is retrieved on each page request and averages 200 bytes per user. Robert said that all of his usage to date has remained within the free level! He also told me:

We really have used SimpleDB as a traditional database and we have found it to be easily as reliable as roll-your-own databases, with little or no maintenance required. Initially we found the variable number of attributes in a domain a bit off-putting, but we have grown to love it and use it extensively.
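As a hedged sketch of what those per-request session reads and writes might look like (using the legacy boto 2.x library, since the newer boto3 does not expose SimpleDB; the domain, item name, and state string are placeholders):

    import boto

    sdb = boto.connect_sdb()
    domain = sdb.create_domain("sessions")  # idempotent if it already exists

    # Store a small (~200 byte) session state string keyed by session ID...
    domain.put_attributes("session-abc123", {"state": "cart=3;page=chapter-7"})

    # ...and fetch it back on the next page request.
    item = domain.get_attributes("session-abc123")
    print(item["state"])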

You can read Robert’s recent post, Building a Scalable Website, to learn even more about the site’s architecture.

As a parent who has spent way too much time trying to understand enough “new math” in order to help my children with their homework over the years, this sounds like a really valuable tool.

— Jeff;

Lower Prices for EC2 Windows Instances using Authentication Services

by Jeff Barr | in Amazon EC2, Price Reduction

We’ve removed the distinction between Amazon EC2 running Windows and Amazon EC2 running Windows with Authentication Services, allowing all of our Windows instances to make use of Authentication Services such as LDAP, RADIUS, and Kerberos. With this change, any Windows instance can host a Domain Controller or join an existing domain. File sharing services such as SMB between instances will now automatically default to SMB-over-TCP in all cases, and will also be able to negotiate more secure authentication.

Existing Windows with Authentication Services instances will now be charged the same price as Windows instances, a savings of 50% on the hourly rate. All newly launched instances will be charged the new, lower price (starting at 12.5 cents per hour for a 32-bit instance in the US). Applications requiring logins can now be run on the Amazon EC2 running Windows AMIs.

As a result of these changes, our Windows AMI lineup now looks like this:

  • US:
    • Amazon EC2 running Windows (32 bit) – English.
    • Amazon EC2 running Windows (64 bit) – English.
    • Amazon EC2 running Windows With SQL Server (64 bit) – English.
  • Europe:
    • Amazon EC2 running Windows (32 bit) – English, German, French, Spanish, Italian.
    • Amazon EC2 running Windows (64 bit) – English, German, French, Spanish, Italian.
    • Amazon EC2 running Windows With SQL Server (64 bit) – English, German, French, Spanish, Italian.

If you are using Amazon DevPay in conjunction with Amazon EC2 running Windows with Authentication Services you will need to create new AMIs and adjust your pricing plan before November 1, 2009.

We continue to strive for simplicity and cost effectiveness; this is a good example of both!

— Jeff;

PS – I know that a lot of you have been asking us to support Windows Server 2008.  I don’t have a release date for you yet, but I can assure you that we’ve prioritized the work needed to properly support it.

Bioinformatics, Genomes, EC2, and Hadoop

by Jeff Barr | in Amazon EC2, Cool Sites

I think it is really interesting to see how breakthroughs and process improvements in one scientific or technical discipline can drive that discipline forward while also enabling progress in other seemingly unrelated disciplines.

The Bioinformatics field is rife with examples of this pattern. Declining hardware costs, cloud computing, the ability to do parallel processing, and algorithmic advances have driven down the cost and time of gene sequencing by multiple orders of magnitude in the space of a decade or two. Processing that was once measured by years and megabucks is now denominated by hours and dollars.

My colleague Deepak Singh pointed out a number of recent AWS-related developments in this space:

JCVI Cloud Bio-Linux

Built on top of a 64-bit Ubuntu distribution, the JCVI Cloud Bio-Linux gives scientists the ability to launch EC2 instances chock-full of the latest bioinformatics packages including BLAST (Basic Local Alignment Search Tool), glimmer (Microbial Gene-Finding System), hmmer (Biosequence Analysis Using Profile Hidden Markov Models), phylip (Phylogeny Inference Package), rasmol (Molecular Visualization) genespring (statistical analysis, data mining, and visualization tools), clustalw (general purpose multiple sequence alignment), the Celera Assembler (de novo whole-genome shotgun DNA sequence assembler), and the NIH EMBOSS utilities. The Celera Assembler can be used to assemble entire bacterial genome sequences on Amazon EC2 today!

There’s a getting-started guide for the JCVI AMI. Graphical and command-line bioinformatics tools can be launched from a shell window connected to a running instance of the AMI.

CloudBurst

CloudBurst is described as a “new parallel read-mapping algorithm optimized for mapping next-generation sequence data to the human genome and other reference genomes, for use in a variety of biological analyses including SNP discovery, genotyping, and personal genomics.”

In layman’s terms, CloudBurst uses Hadoop to implement a linearly scalable search tool. Once loaded with a reference genome, it maps the “short reads” (snippets of sequenced DNA approximately 30 base pairs long) to a location (or locations) on the reference genome. Think of it as a very advanced form of string matching, with support for partial matches, insertions, deletions, and subtle differences. This is a highly parallelizable operation; CloudBurst reduces operations involving millions of short reads from hours to minutes when run on a large-scale cluster of EC2 instances.
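To give a flavor of why this parallelizes so well, here’s a toy sketch of the seed-lookup idea (my own illustration, not CloudBurst’s actual code): index fixed-length seeds from the reference, then look up each read’s leading seed to find candidate mapping locations. The real algorithm adds extension and mismatch handling, and runs as a Hadoop map/reduce over far larger data:

    from collections import defaultdict

    SEED = 10  # seed length; real tools derive this from read length and mismatch budget

    def index_reference(reference, seed=SEED):
        # Map every length-`seed` substring of the reference to its positions.
        index = defaultdict(list)
        for i in range(len(reference) - seed + 1):
            index[reference[i:i + seed]].append(i)
        return index

    def map_read(read, index, seed=SEED):
        # Candidate positions are wherever the read's leading seed occurs;
        # each read is independent, which is what makes this embarrassingly
        # parallel under MapReduce.
        return index.get(read[:seed], [])

    reference = "ACGTACGTTTGACCAGTACGGTACCAGT" * 100
    index = index_reference(reference)
    print(map_read("GACCAGTACGGTACCAGTAC", index))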

You can read more about CloudBurst in the research paper. This paper includes benchmarks of CloudBurst on EC2 along with performance and scaling information.

 

Crossbow

Crossbow was built to do “Whole Genome Resequencing in the Clouds.” It combines Bowtie for ultra-fast short read alignment with SOAPsnp for sequence assembly and high-quality SNP calling. The Crossbow home page claims that it can sequence an entire genome in an afternoon on EC2, for less than $250. Crossbow is so new that the papers and the code distribution are still a little ways off, but there’s a lot of good information in this poster.

Michael Schatz (the principal author of CloudBurst and Bowtie) wrote a really interesting note on Hadoop for Computational Biology. He states that “CloudBurst is just the beginning of the story, not the end,” and endorses the Map/Reduce model for processing 100+ GB datasets. I will echo Mike’s conclusion to wrap up this somewhat long post:

In short, there is no shortage of opportunities for utilizing MapReduce/Hadoop for computational biology, so if your users are skeptical now, I just ask that they are patient for a little bit longer and reserve judgment on MapReduce/Hadoop until we can publish a few more results.

I really learned a lot while putting this post together and I hope that you will learn something by reading it. If you are using EC2 in a bioinformatics context, I’d love to hear from you. Leave a comment or send me some mail.

— Jeff;