Category: Amazon EC2


What Do You Run?

As more and more businesses run applications in the cloud, we’re starting to hear about mainstream software from the likes of IBM and Oracle running on Amazon EC2.

There are strong privacy and security controls in place around each AWS customer account, so we have no way of knowing which organizations are doing this, or how many. If your organization fits this profile (especially if you run either IBM or Oracle), we’d love to hear from you. Please drop us a note at awseditor at amazon dot com, or simply leave a private comment here.

Mike

Announcing Amazon Elastic MapReduce

Today we are introducing Amazon Elastic MapReduce, our new Hadoop-based processing service. I’ll spend a few minutes talking about the generic MapReduce concept and then I’ll dive into the details of this exciting new service.

Over the past 3 or 4 years, scientists, researchers, and commercial developers have recognized and embraced the MapReduce programming model. Originally described in a landmark paper, the MapReduce model is ideal for processing large data sets on a cluster of processors. It is easy to scale up a MapReduce application to jobs of arbitrary size by simply adding more compute power. Here’s a very simple overview of the data flow in a typical MapReduce job:

Given that you have enough computing hardware, MapReduce takes care of splitting the input data into chunks of roughly equal size, spinning up a number of processing instances for the map phase (which must, by definition, consist of independent, parallelizable work units), apportioning the data to each of the mappers, tracking the status of each mapper, routing the map results to the reduce phase, and finally shutting down the mappers and the reducers when the work is done. It is easy to scale up MapReduce to handle bigger jobs or to produce results in a shorter time by simply running the job on a larger cluster.
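
To make the data flow concrete, here is a minimal word-count sketch in the form Hadoop Streaming expects: two small Python scripts that read from standard input and write tab-separated key/value pairs to standard output. The file names are illustrative, not part of any AWS API.

    #!/usr/bin/env python
    # wordcount_mapper.py : a minimal Hadoop Streaming mapper (illustrative).
    # Emits one "word<TAB>1" pair per word of input.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word.lower())

    #!/usr/bin/env python
    # wordcount_reducer.py : a minimal Hadoop Streaming reducer (illustrative).
    # Hadoop sorts the mapper output by key, so all pairs for a given word
    # arrive together; we accumulate a count until the word changes.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        word, count = line.split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

The same pair of scripts runs unchanged on a laptop (cat input | ./wordcount_mapper.py | sort | ./wordcount_reducer.py) or across a thousand-node cluster; that independence between work units is exactly what lets MapReduce scale by simply adding machines.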

Hadoop is an open source implementation of the MapReduce programming model. If you’ve got the hardware, you can follow the directions in the Hadoop Cluster Setup documentation and, with some luck, be up and running before too long.

Developers the world over seem to find the MapReduce model easy to understand and easy to work into their thought processes. After a while they tend to report that they begin to think in terms of the new style, and then see more and more applications for it. Once a team shows that the model delivers genuine business value (e.g. better results, produced faster), demand for hardware resources increases rapidly. Like any true viral success, one team shows great results and before too long everyone in the organization wants to do something similar. For example, Yahoo! uses Hadoop on a very large scale. A little over a year ago they reported that they were able to use the power of over 10,000 processor cores to generate a web map to power Yahoo! Search.

This is Rufus, the “first dog” of our AWS Developer Relations team. As you can see, he’s scaled up quite well since his debut on this very blog three years ago. Your problems may start out like the puppy-sized version of Rufus but will quickly grow into the full-scale 95-pound version.

Over the past year or two a number of our customers have told us that they are running large Hadoop jobs on Amazon EC2. There’s some good info on how to do this here and also here. AWS Evangelist Jinesh Varia covered the concept in a blog post last year, and also went into considerable detail in his Cloud Architectures white paper.

Given our belief in the power of the MapReduce programming style and the knowledge that many developers are already running Hadoop jobs of impressive size in our cloud, we wanted to find a way to make this important technology accessible to even more people.

Today we are rolling out Amazon Elastic MapReduce. Using Elastic MapReduce, you can create, run, monitor, and control Hadoop jobs with point-and-click ease. You don’t have to go out and buy scads of hardware. You don’t have to rack it, network it, or administer it. You don’t have to worry about running out of resources or sharing them with other members of your organization. You don’t have to monitor it, tune it, or spend time upgrading the system or application software on it. You can run world-scale jobs anytime you would like, while remaining focused on your results. Note that I said jobs (plural), not job. Subject to the number of EC2 instances you are allowed to run, you can start up any number of MapReduce jobs in parallel. You can always request an additional allocation of EC2 instances here.

Processing in Elastic MapReduce is centered around the concept of a Job Flow. Each Job Flow can contain one or more Steps. Each step inhales a bunch of data from Amazon S3, distributes it to a specified number of EC2 instances running Hadoop (spinning up the instances if necessary), does all of the work, and then writes the results back to S3. Each step must reference application-specific “mapper” and/or “reducer” code (Java JARs or scripting code for use via the Streaming model). We’ve also included the Aggregate Package with built-in support for a number of common operations such as Sum, Min, Max, Histogram, and Count. You can get a lot done before you even start to write code!

We’re providing three distinct access routes to Elastic MapReduce. You have complete control via the Elastic MapReduce API, you can use the Elastic MapReduce command-line tools, or you can go all point-and-click with the Elastic MapReduce tab within the AWS Management Console! Let’s take a look at each one.

The Elastic MapReduce API represents the fundamental, low-level entry point into the system. Action begins with the RunJobFlow function. This call is used to create a Job Flow with one or more steps inside. It accepts an EC2 instance type, an EC2 instance count, a description of each step (input bucket, output bucket, mapper, reducer, and so forth) and returns a Job Flow Id. This one call is equivalent to buying, configuring, and booting up a whole rack of hardware. The call itself returns in a second or two and the job is up and running in a matter of minutes. Once you have a Job Flow Id, you can add additional processing steps (while the job is running!) using AddJobFlowSteps. You can see what’s running with DescribeJobFlows, and you can shut down one or more jobs using TerminateJobFlows.
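
For a rough feel of how those four calls fit together, here is a sketch using the open source boto library for Python (assuming a version with Elastic MapReduce support); the bucket names, script paths, step names, and instance count are placeholders:

    # Sketch of the RunJobFlow / AddJobFlowSteps / DescribeJobFlows /
    # TerminateJobFlows sequence via the boto library.
    import boto.emr
    from boto.emr.step import StreamingStep

    conn = boto.emr.EmrConnection()  # credentials come from the environment

    step = StreamingStep(name='Word count',
                         mapper='s3n://my-bucket/wordcount_mapper.py',
                         reducer='s3n://my-bucket/wordcount_reducer.py',
                         input='s3n://my-bucket/input',
                         output='s3n://my-bucket/output')

    # RunJobFlow: create the Job Flow and start the first step.
    jobflow_id = conn.run_jobflow(name='My job flow',
                                  log_uri='s3n://my-bucket/logs',
                                  steps=[step],
                                  num_instances=4)

    # AddJobFlowSteps: append another step while the Job Flow is running.
    second = StreamingStep(name='Second pass',
                           mapper='s3n://my-bucket/second_mapper.py',
                           reducer='s3n://my-bucket/second_reducer.py',
                           input='s3n://my-bucket/output',
                           output='s3n://my-bucket/output2')
    conn.add_jobflow_steps(jobflow_id, [second])

    # DescribeJobFlows: check on progress.
    for flow in conn.describe_jobflows():
        print(flow.jobflowid, flow.state)

    # TerminateJobFlows: shut everything down.
    conn.terminate_jobflow(jobflow_id)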

The Elastic MapReduce client is a command-line tool written in Ruby. The client can invoke each of the functions I’ve already described. You can create, augment, describe, and terminate Job Flows from the command line.

Finally, you can use the new Elastic MapReduce tab of the AWS Management Console to create, augment, describe, and terminate job flows from the comfort of your web browser! Here are a few screen shots to whet your appetite:

[Screenshots: the Elastic MapReduce tab in the AWS Management Console]

I’m pretty psyched about the fact that we are giving our users access to such a powerful programming model in a form that’s really easy to use. Whether you use the console, the API, or the command-line tools, you’ll be able to focus on the job at hand instead of spending your time wandering through dark alleys in the middle of the night searching for more hardware.

What do you think? Is this cool, or what?

— Jeff;

Up, Up, and Away – Cloud Computing Reaches for the Sky

Early this morning we launched a brand new cloud computing service. This revolutionary new technology will change the way you think about the cloud.

For a while the cloud was simply a metaphor meaning “a bunch of computers somewhere else.” Until now, somewhere else meant good old terra firma, the Earth itself. After extensive customer research we found that this rigid, antiquated way of thinking just won’t cut it in today’s post-capitalist world. Customers need locational flexibility: the ability to literally instantiate a cloud where they need it, when they need it.

To solve this problem, we have designed and are now introducing the Floating Amazon Cloud Environment, or FACE for short. Using the latest in airship technology, we’ve created a cloud that can come to you.

The FACE uses durable, unmanned helium-filled blimps with a capacity of 65,536 small EC2 instances, or a proportionate number of larger instances. The top of each blimp is coated in polycrystalline solar cells which supply approximately 40% of the power needed by the servers and the on-board navigation, communication, and defense systems. The remainder of the power is produced by clean, efficient solid oxide fuel cells. There’s enough fuel onboard to last about a month under normal operating conditions. Waste heat from the fuel cells and from the servers is used to generate additional lift.

There are two options for ground communication, WiMAX and laser. The WiMAX option provides low latency and respectable bandwidth. If you have the ground facility and the line of sight access needed to support it, lasers are the way to go. The on-board laser doubles as a defense facility, keeping each FACE safe from harm. Using automated target detectors with human confirmation via the Mechanical Turk, competitors won’t have a chance.

Update: Based on popular demand, we will also implement RFC 1149.

A FACE can operate in shared or dedicated mode. In dedicated mode, the FACE does its best to remain at a fixed position. In shared mode, each FACE constantly optimizes its position to provide the best possible service to everyone. As always, this amazing functionality is available via the EC2 API (you’ll need the new 2009-04-01 WSDL), the command line tools, and the AWS Console.

Derivative funds and large government-subsidized entities will be especially interested in the FACE’s transmodal operation. They can allocate a dedicated FACE, load it up with data, and then send it out to sea to perform advanced processing in safety. The government will have absolutely no chance of acting against them, because it will be too busy trying to decide which Federal Air Regulation (FAR) was violated, not to mention scheduling news conferences.

We believe that the FACE will be the perfect solution for LAN parties, tech conferences, and large-scale sporting events.

Availability is limited and this may be a one-time, perhaps even a one-day offer. Get your FACE now.

— Jeff;

New AWS Toolkit for Eclipse

We want to make the process of building, testing, and deploying applications on Amazon EC2 as simple and efficient as possible. Modern web applications typically run in clustered environments made up of one or more servers. Unfortunately, setting up a cluster can involve locating, connecting, configuring, and maintaining a significant amount of hardware. Once this has been done, keeping the operating system, middleware, and application code current and consistent across each server can add inefficiency and tedium to the development process. In recent years, Amazon Web Services has helped to ease much of this burden, trivializing the process of acquiring, customizing, and running server instances on demand.

Also, in the last couple of years, the Eclipse IDE (Integrated Development Environment) has become very popular among developers. The modular nature of the Eclipse architecture opens the door to customization, extension, and continuous refinement via plug-ins (full directory here).

Today, we are introducing the AWS Toolkit for Eclipse. This free, open source plugin for the Eclipse IDE makes it easier and more efficient for you to develop, deploy, and debug Java applications on top of AWS. In fact, you can design an entire AWS-hosted, Tomcat-based cluster from within Eclipse, specifying the number of EC2 instances and the instance type to run. You can select (and even create) security groups and key pairs, and you can associate an Elastic IP address with each instance.

The plugin will manage your cluster, starting up instances as needed and then keeping them alive as you develop, deploy, and debug. If you start your application in Debug mode, you can set remote breakpoints, inspect variables or stack frames, and even single-step through the remote code. You can see all of this great functionality in action here.

This is a first step for us, and we anticipate supporting additional languages and application servers (e.g. Glassfish, JBoss, WebSphere, and WebLogic) over time. As is the case with all of our services, customer input and feedback will help to shape the direction of the plugin.

As I noted before, the new AWS Toolkit for Eclipse is free and you can download it now. You can contribute your own enhancements to the toolkit by joining the SourceForge project.

— Jeff;

Announcing Amazon EC2 Reserved Instances


Earlier in my career, I thought that innovation was solely about technology. If you wanted to address a new market or to increase sales, writing more code was always a good option. Having gained some wisdom and experience over the years, I’ve finally figured out the obvious — that innovation can also take the form of a business model!

Since I first blogged about Amazon EC2 in the summer of 2006, developers and IT professionals have found all sorts of ways to put it to use. Many of those have been covered in this blog; we’ve written case studies about quite a few, and I’ve bookmarked many more on the AWS Buzz feed. As our customers’ use cases have grown, we’ve done our best to listen to their feedback, adding such features as additional instance types, multiple availability zones, multiple geographic regions, persistent disk storage, support for Microsoft Windows, and control over IP addresses.

The well-known pay-as-you-go EC2 pricing model is very similar to what an economist would call an on-demand or spot market. There’s no need to make any up-front commitment; you simply pay for your processing an hour at a time. This model has served us well so far and it will continue to be a fundamental aspect of our strategy.

We’ve learned that some of our customers have needs which aren’t addressed by the spot pricing model. For example, some of them were looking for even lower prices, and were willing to make a commitment ahead of time in order to achieve this. Also, quite a few customers actually told us something even more interesting: they were interested in using EC2 but needed to make sure that we would have a substantial number of instances available to them at any time in order for them to use EC2 in a DR (Disaster Recovery) scenario. In a scenario like this, you can’t simply hope that your facility has sufficient capacity to accommodate your spot needs; you need to secure a firm resource commitment ahead of time.

Taking these requirements into account, we’ve created a new EC2 pricing model, which we call Reserved Instances. After you purchase a Reserved Instance for a one-time fee, you have the option to launch an EC2 instance of a certain instance type, in a particular availability zone, for a term of either one or three years. Your launch is guaranteed to succeed; there’s no chance of encountering any transient limitations in EC2 capacity. You have no obligation to run the instances full time, so you’ll pay even less if you choose to turn them off when you are not using them.

Steady-state usage costs, when computed on an hourly basis over the term of the reservation, are significantly lower than those for the on-demand model. For example, an on-demand EC2 Small instance costs 10 cents per hour. Here’s the cost breakdown for a reserved instance (also check out the complete EC2 pricing info):

Term     One-time Fee    Hourly Usage    Effective 24/7 Cost
1 Year   $325            $0.030          $0.067
3 Year   $500            $0.030          $0.049
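
If you’re wondering where the effective cost column comes from, it’s simply the one-time fee amortized over every hour of the term, plus the hourly usage rate. A quick sketch of the arithmetic:

    # How the "Effective 24/7 Cost" column is derived: amortize the one-time
    # fee over every hour of the term, then add the hourly usage rate.
    HOURS_PER_YEAR = 24 * 365  # 8,760

    def effective_hourly(one_time_fee, hourly_rate, years):
        return one_time_fee / (HOURS_PER_YEAR * years) + hourly_rate

    print(round(effective_hourly(325.0, 0.030, 1), 3))  # 0.067
    print(round(effective_hourly(500.0, 0.030, 3), 3))  # 0.049

By the same arithmetic, a Small instance running around the clock saves 7 cents per hour versus the 10 cent on-demand rate, so the $325 one-year fee pays for itself after roughly 4,600 hours, a bit over six months of continuous use.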

Every one of the EC2 instance types is available at a similar savings. We’ve preserved the flexibility of the on-demand model and have given you a new and more cost-effective way to use EC2. Think of the one-time fee as somewhat akin to acquiring hardware, and the hourly usage as similar to operating costs.

All of the launching, metering, and billing is fully integrated. Once you’ve purchased one or more reserved instances, the EC2 RunInstances call will draw upon your reserve before allocating on-demand capacity. This new feature is available for Linux and OpenSolaris instances in the US now, with the same support to follow in Europe in the near future.
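
As a sketch of what that integration means in practice (using the boto library for Python; the AMI ID is a placeholder), the launch call itself is unchanged:

    # Launching is unchanged: the same RunInstances call (here via boto)
    # draws on your reservation first, then on on-demand capacity. A
    # reservation applies when the instance type and availability zone match.
    import boto

    conn = boto.connect_ec2()  # credentials come from the environment
    conn.run_instances('ami-12345678', instance_type='m1.small',
                       placement='us-east-1a')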

We’ve added a number of new command-line tools to support Reserved Instances. Here’s what they do:

  • The ec2-describe-reserved-instance-offerings command lists the set of instance offerings that are available for purchase.
  • The ec2-purchase-reserved-instances-offering command makes the actual purchase of one or more reserved instances.
  • The ec2-describe-reserved-instances command displays a list of the instances that have been purchased.

Of course, all of this new functionality is fully programmable. We’ve added a number of new EC2 APIs:

  • DescribeReservedInstancesOfferings returns a list of Reserved Instance offerings that are available for purchase. This call enumerates the inventory within a particular availability zone.
  • PurchaseReservedInstancesOffering makes the actual purchase of a Reserved Instance within an availability zone. Up to 20 instances can be purchased with a single call, subject to availability and account limitations. It’s a bit like buying a vowel on Wheel of Fortune, except that you get a server (much more useful) instead.
  • DescribeReservedInstances returns a list of the instances that have been purchased for the account.
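
Here’s roughly how those three calls line up in code, again as a boto sketch (assuming a boto version with Reserved Instance support; note that the purchase call incurs the real one-time fee):

    # Sketch of the Reserved Instance API flow via the boto library.
    import boto

    conn = boto.connect_ec2()

    # DescribeReservedInstancesOfferings: what is available for purchase?
    offerings = conn.get_all_reserved_instances_offerings()
    for o in offerings:
        print(o.id, o.instance_type, o.availability_zone,
              o.duration, o.fixed_price, o.usage_price)

    # PurchaseReservedInstancesOffering: buy one instance's worth of
    # capacity. Careful: this call actually spends the one-time fee!
    conn.purchase_reserved_instance_offering(offerings[0].id,
                                             instance_count=1)

    # DescribeReservedInstances: list what the account has purchased.
    for r in conn.get_all_reserved_instances():
        print(r.id, r.instance_type, r.state)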

We’re planning to give the AWS Console full control over the Reserved Instances. I expect to see other tool vendors add support as well.

If you have any questions about the new Reserved Instances, check out the entries in the newly revised EC2 FAQ.

I’m looking forward to receiving your feedback on this new and innovative business model for EC2. Please feel free to leave me a comment.

— Jeff;

JumpBox Rapid Trials on EC2

Late last year I blogged about JumpBox. I talked about how their lineup of public EC2 AMIs really streamlined the process of getting started with a number of powerful open source applications.

Earlier this week, Sean and Kimbro of JumpBox told me about their newest development, the JumpBox Rapid Trial. Powered by EC2, the Rapid Trial lets you conduct free, hour-long trials of most of the applications in their Open Library with a single click.

For example, you can launch a trial version of the MediaWiki wiki system by going here and clicking on the “Trial This JumpBox” button. You’ll be prompted for your name and email address, and then you’ll wait a minute or so for the EC2 instance to launch. When it is ready, you’ll get a link to an administrative console. You’ll set the computer name, enter your email address, set your time zone, set the administrator password, and agree to the license agreement. One more click and the wiki is up and running, ready for evaluation.

You can start to configure and use the wiki during the one-hour evaluation period. If it does what you want and you start to enter some real data, you can use the web-based JumpBox administrative tools to back up the configuration and the user data to Amazon S3. On production instances, you can even set up automatic daily, weekly, or hourly backups, with full control over how many old backups you’d like to keep around.

All in all this is very slick and is a great way to illustrate the reduction in friction that is possible with cloud computing. Commercial software vendors need to take a look at this innovative Rapid Trial model and figure out how to do something similar for their own products.

— Jeff;

Additional EC2 Support for Windows – Second Zone in the US and Two Zones in Europe

We’ve been working to make it possible for you to run Windows or SQL Server in additional locations and to build highly available applications.

You now have the ability to launch EC2 instances running Windows or SQL Server in the EU-West region, in two separate Availability Zones. You can also launch EC2 instances running Windows or SQL Server in a second Availability Zone in the US-East region. With the addition of the new European region and the additional US zone, you now have the tools needed to build Windows-based applications that are resilient to the failure of an availability zone.

The AWS Management Console has been updated with full support for the EU-West region. After selecting the new region from the handy dropdown, you can launch EC2 instances; create, attach, and destroy EBS volumes; manage Elastic IP addresses; and more.

We’ve created new Windows AMIs with the French, German, Italian, and Spanish language packages installed. The Console even provides a new Language menu in the quick start list. Once launched, you simply set the locale in the Windows Control Panel. You can find step-by-step directions for launching AMIs in various languages here.

The popular ElasticFox tool now lets you tag running instances, EBS volumes, and EBS snapshots. The Image and Instance views have been assigned to distinct tabs and you can now specify a binary (non-text) file as instance data at launch time.

While I’m talking about all things European, I should mention two other items that may be of interest to you. First, Amazon CTO Werner Vogels will deliver a keynote at the Cebit conference in Germany later this week. Second, we have an opening in Luxembourg for an AWS Sales Representative.

— Jeff;

New AWS Public Data Sets – Economics, DBpedia, Freebase, and Wikipedia

We have just released four additional AWS public data sets, and have updated another one.

In the Economics category, we have added a set of transportation databases from the US Bureau of Transportation Statistics. Data and statistics are provided for aviation, maritime, highway, transit, rail, pipeline, bike & pedestrian, and other modes of transportation, all in CSV format. I was able to locate employment data for our hometown airline and found out that they employed 9,322 full-time and 1,122 part-time employees as of the end of 2007.

In the Encyclopedic category, we have added access to the DBpedia Knowledge Base, the Freebase Data Dump, and the Wikipedia Extraction, or WEX.

The DBpedia Knowledge Base currently describes more than 2.6 million things including 213,000 people, 328,000 places, 57,000 music albums, 36,000 films, and 20,000 companies. There are 274 million RDF triples in the 67 GB data set.

The 66 GB Freebase Data Dump is an open database of the world’s information, covering millions of topics in hundreds of categories.

The Wikipedia Extraction (WEX) is a processed, machine-readable dump of the English-language section of Wikipedia. At nearly 67 GB, this is a handy and formidable data set. The data is provided in TSV format, as exported by PostgreSQL.

Finally, we have updated the NCBI’s GenBank data. Weighing in at a hefty quarter of a terabyte, this public data set contains information on over 85 billion bases and 82 million sequence records.

Instantiating these data sets is basically trivial. You create a new EBS volume of the appropriate size, basing it on the snapshot id of the data. Next, you attach the volume to a running EC2 instance in the same availability zone. Finally, you create a mount point and mount the EBS volume on the instance. The last step can take a minute or two for a large volume; the other steps are essentially instantaneous. Instead of spending days or weeks downloading these data sets you can be up and running from a standing start in minutes. Once again, cloud computing reduces the friction between “I have a good idea” and “here’s the realization of my idea.” You don’t need loads of bandwidth, processing power, or local disk space in order to do interesting and significant work with these world-scale data sets.
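
For instance, here is what those three steps look like as a boto-based sketch (the snapshot ID, instance ID, volume size, and device name are all placeholders; the mount itself happens on the instance):

    # Sketch: instantiate a public data set as an EBS volume. All IDs,
    # sizes, and device names below are placeholders.
    import boto

    conn = boto.connect_ec2()

    # 1. Create a volume from the data set's snapshot, in the same
    #    availability zone as the target EC2 instance.
    volume = conn.create_volume(size=250, zone='us-east-1a',
                                snapshot='snap-12345678')

    # 2. Attach it to the running instance as a block device.
    conn.attach_volume(volume.id, 'i-12345678', '/dev/sdf')

    # 3. Then, from a shell on the instance itself:
    #      sudo mkdir /mnt/dataset
    #      sudo mount /dev/sdf /mnt/dataset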

— Jeff;

IBM Software Available on EC2 With Pay-As-You-Go Licensing Model

We’ve teamed up with IBM to provide software developers with pay-as-you-go access to development and production versions of IBM Information Management database servers, IBM Lotus content management, and IBM WebSphere portal and middleware products, all running on Novell’s SUSE Linux on Amazon EC2.

There’s a lot to say, so I’ll summarize the key points up front before diving in. First, development AMIs are now available at the new IBM Cloud Space on developerWorks. Second, you can bring your existing licenses into the cloud. Third, hourly pricing for the production versions of each product will be published sometime soon.


Existing IBM customers can use the licenses they’ve already bought while still taking advantage of the elastic nature of AWS to handle spikes and peaks. These licenses retain their value and can be used to handle steady-state processing needs, with more licenses available (on an hourly basis) in the cloud for peak times. This clean and innovative new model should clear up some of the uncertainty which can cause potential users to think twice before jumping into cloud computing. A new IBM PVU (Processor Value Unit) table will map between PVUs and the full set of available EC2 instance types. See our new IBM partner page for details.


The following products will be available in AMI (Amazon Machine Image) form:

  • IBM DB2 – A database server designed to handle demanding workloads, featuring scaling to handle high volume transaction processing, automatic compression, optimized XML storage, and lots more. Get started here.
  • IBM Informix Dynamic Server – IBM’s flagship database for industrial-strength, embedded computing. Featuring blazing online transaction processing (OLTP) performance, legendary reliability, and nearly hands-free administration. Get started here.
  • IBM Lotus Web Content Management Standard Edition – End-to-end web content management for internet, intranet, extranet, and portal sites. Get started here.
  • IBM WebSphere sMash – A development and runtime environment for agile development of Web 2.0-style applications using SOA principles. Get started here.
  • IBM WebSphere Portal Server – A runtime server and tools (among other features) that can be used to create a single customized interface for a collection of enterprise applications, combining components, applications, processes, and content from a variety of sources. Get started here.

If you are an ISV (Independent Software Vendor) developing a service that will be commercially available, you are eligible to access these AMIs at no charge (other than the usual EC2 charges, plus nominal setup and monthly fees) for development purposes, via IBM developerWorks. Everything that you’ll need to get started can be found in the new Cloud Computing Resource Center. You may also want to read the white paper, IBM’s Perspective on Cloud Computing.

As someone who once programmed IBM mainframes using 80-column punched cards, I find this a pretty exciting announcement. Developers now have easy access to IBM’s line of robust, industrial-strength software products and can build highly scalable applications which take full advantage of the new and flexible licensing model. Questions about commercial software licenses (and their applicability to the cloud) come up at almost every one of my speaking engagements! I’m happy to be able to point to IBM as an example of a software vendor with a licensing model which is cloud-aware and cloud-friendly.

I also think that this announcement really highlights EC2’s inherent flexibility. Customers can bring their existing code and software licenses into the cloud and deploy them without having to pay any up-front licensing costs.

— Jeff;

iPhone Console for EC2

This is a very brief post to call your attention to yet more innovation in the Amazon Web Services ecosystem: in this case an iPhone console application that monitors and controls your Amazon EC2 environment. David Kavanagh and company cooked this up over at directThought.

My mind immediately went to “Sitting in Maui, umbrella drink by the pool, time to add a few more instances to my Amazon EC2 server fleet by tapping on the iPhone. Ahhh…” Then reality struck — it’s snowing outside the hotel I’m in.

The underlying client toolkit (cTypica) is licensed under the Apache 2.0 License.

You can preview the application here. Now is your chance to provide input on what will be a very useful tool for AWS users who have an iPhone.

Mike