Category: Amazon EC2


More on ADFS with Amazon EC2

by Jeff Barr | on | in Amazon EC2, Windows |

Thanks to those who wrote to me with ideas about using ADFS to federate with Windows instances running on Amazon EC2. My original post was picked up by a couple of other blogs, which I'd like to acknowledge here:

As part of a joint project between Amazon Web Services and Microsoft, I'm proud to announce the release of a whitepaper written by David Chappell that explores these federation scenarios in more detail. David begins his paper with an additional scenario: your Amazon EC2 resources are placed in an Amazon Virtual Private Cloud (Amazon VPC) and joined to your own corporate domain; here, there's no use of ADFS. Then he illustrates the two scenarios I mentioned before, and shows how they would work with both ADFS 1.1 and ADFS 2.0.

Soon we'll release a companion step-by-step guide that walks you through the steps required to build these federation scenarios in a lab. From this you'll gain the skills and experience necessary to implement them in your production environment. I'll announce here when the guide is available for download.

> Steve <

Federation with ADFS in Windows Server 2008

by Jeff Barr | on | in Amazon EC2, Windows |

As I’ve talked with customers who have deployed or plan to deploy Windows Server 2008 instances on Amazon EC2, one feature they commonly inquire about is Active Directory Federation Services (ADFS). There seems to be a lot of interest in ADFS v2 with its support for WS-Federation and Windows Identity Foundation. These capabilities are fully supported in our Windows Server 2008 AMIs and will work with applications developed for both the “public” side of AWS and those you might run on instances inside Amazon VPC.

I’d like to get a better sense of how you might use ADFS. When you state that you need “federation,” what do you want to do? I imagine most scenarios involve applications on Amazon EC2 instances obtaining tokens from an ADFS server located inside your corporate network. This makes sense when your users are in your own domains and the applications running on Amazon EC2 are yours.

Another scenario involves a forest living entirely inside Amazon EC2. Imagine you’ve created the next killer SaaS app. As customers sign up, you’d like to let them use their own corpnet credentials rather than bother with creating dedicated logons (your customers will love you for this). You’d create an application domain in which you’d deploy your application, configured to trust tokens only from the application’s ADFS. Your customers would configure their ADFS servers to issue tokens not for your application but for your application domain ADFS, which in turn issues tokens to your application. Signing up new customers is now much easier.

What else do you have in mind for federation? How will you use it? Feel free to join the discussion: I’ve started a thread on the forums, so please add your thoughts there. I’m looking forward to some great ideas.

> Steve <

Third-Party AWS Tracking Sites

by Jeff Barr | on | in Amazon CloudWatch, Amazon EC2, Amazon S3 |

A couple of really cool third-party AWS tracking sites have sprung up lately. Some of these sites make use of AWS data directly and others measure it using their own proprietary methodologies. I don’t have any special insight into the design or operation of these sites, but at first glance they appear to be reasonably accurate.

Cloud Exchange

Tim Lossen’s Cloud Exchange site tracks the price of EC2 Spot Instances over time and displays the accumulated data in graphical form, broken down by EC2 Region, Instance Type, and Operating System. Here’s what it looks like:

Spot History

The Spot History site also tracks the price of EC2 Spot Instances over time. This one doesn’t break the prices down by Region. Here’s what it looks like:

Cloudelay

Marco Slot’s Cloudelay site measures latency from your current location (e.g. your browser) to Amazon S3 and Amazon CloudFront using some clever scripting techniques.

Timetric

Timetric tracks the price of EC2 Spot Instances and displays it in a number of ways, including the spot price as a percentage of the on-demand price and a bar chart. They also provide access to the underlying data for DIY charting.

— Jeff;

Fotopedia and AWS

by Jeff Barr | on | in Amazon CloudFront, Amazon EC2, Amazon Elastic Load Balancer, Amazon Elastic MapReduce, Europe |

Hi there, this is Simone Brunozzi, Technology Evangelist for AWS in Europe. I’ll steal the keyboard from Jeff Barr for a few minutes to share something really interesting with you: it is always fascinating to see how our customers are using Amazon Web Services to power their businesses.

Olivier Gutknecht, Director of Server Software at French-based Fotonauts Inc., spent some time with me to describe how they use AWS to power Fotopedia, a collaborative photo encyclopedia.

We have been very lucky with our development timeframe: we developed this project while Amazon was building its rich set of services. Early in the development we tested Amazon S3 as the main data store for our images and thumbnails. Switching our first implementation to S3 was a matter of days. Last year, when our widgets were featured on the LeWeb 08 site, we enabled Amazon CloudFront for distribution of our images, literally days after the official CloudFront introduction. Before this, we moved our processing to EC2 instances and persistent EBS volumes. And in recent months, we integrated Elastic Load Balancing and Elastic MapReduce into our stack.

It is interesting to see how the AWS services replaced our initial implementation. We’re not in the business of configuring Hadoop for the cloud, for example, so we’re quite happy to use such a service if it fits our needs. The same happened with our HTTP fault-tolerance layer, which we quickly replaced with AWS Elastic Load Balancing.

So Amazon S3, CloudFront, and EC2 (with Elastic Block Store (EBS) volumes for the data stores) are the three key services that they are using to power Fotopedia, but they also take advantage of other AWS services.

We regularly analyze a full Wikipedia dump to extract abstracts and compute a graph of related articles to build our photo encyclopedia. We use Elastic MapReduce with custom Hadoop jobs and Pig scripts to analyze the Wikipedia content – it’s nice to be able to go from eight hours to less than two hours of processing time.
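(For readers who haven't tried Elastic MapReduce, here is a rough sketch of how a job like this might be submitted using the boto Python library. This is my illustration, not Fotonauts' actual code: the bucket names, mapper script, and instance counts are hypothetical, and the built-in "aggregate" reducer stands in for their custom Hadoop jobs and Pig scripts.)

```python
import boto.emr
from boto.emr.step import StreamingStep

conn = boto.emr.connect_to_region('us-east-1')

# A single Hadoop streaming step; the mapper script and S3 paths are hypothetical.
step = StreamingStep(name='Extract Wikipedia abstracts',
                     mapper='s3n://my-bucket/scripts/extract_abstracts.py',
                     reducer='aggregate',
                     input='s3n://my-bucket/wikipedia-dump/',
                     output='s3n://my-bucket/abstracts/')

jobflow_id = conn.run_jobflow(name='wikipedia-analysis',
                              log_uri='s3n://my-bucket/emr-logs/',
                              steps=[step],
                              num_instances=8,
                              master_instance_type='m1.small',
                              slave_instance_type='m1.small')
print(jobflow_id)
```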

We’re also using on-demand instances and Hadoop to analyze our logs: all service logs are aggregated and archived into an S3 bucket, and we regularly analyze these to extract business metrics and user-visible stats that we then integrate into the site.

And there’s the secret sauce that binds this together: Chef. Chef is a young and extremely powerful system integration framework. The Fotonauts team is working on a detailed “how we use Chef” blog post, because they consider Chef to be an essential component of their stack.

For instance, when we provision a new EC2 instance, we set up the instance with a simple boot script. On first boot, the instance automatically configures our ssh keys, installs some base packages (Ruby, essentially), and registers itself in our DNS. Finally, Chef registers the instance into our Chef server. At this point we have a “generic,” passive machine added to our grid. Then we just associate a new role with this instance – let’s say we need a new backend for our main Rails application. From there, it is just a matter of waiting for the instance to configure itself: installing Rails and monitoring probes, checking out our source code, and finally launching the application. A few minutes later, the machine running our load balancer and web cache notices a new backend and immediately reconfigures itself.
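(As an aside, here is a minimal sketch of what the launch side of such a flow might look like using the boto Python library. Boto is my assumption, not necessarily what Fotonauts use, and the AMI id, key pair, and helper scripts are hypothetical stand-ins for their real boot script.)

```python
import boto.ec2

# Hypothetical first-boot script: set up keys, install base packages, register the node.
user_data = """#!/bin/bash
set -e
/usr/local/bin/setup-ssh-keys          # hypothetical helper
apt-get install -y ruby                # base packages
/usr/local/bin/register-dns-and-chef   # hypothetical helper: DNS + Chef registration
"""

conn = boto.ec2.connect_to_region('us-east-1')
reservation = conn.run_instances('ami-12345678',         # hypothetical AMI id
                                 instance_type='m1.small',
                                 key_name='my-keypair',   # hypothetical key pair
                                 user_data=user_data)
print(reservation.instances[0].id)
```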

It would be interesting to see how they will benefit from the Boot-From-EBS feature that we added recently.

What is great about this Amazon & Chef setup is that it gets you thinking about your application globally. Running a complex application like Fotopedia is not just a matter of running some Rails code and a MySQL database; it means coordinating a long list of software services: some written by us, some installed as packages from the operating system, some built and installed from source code (sometimes because the software is so recent that it is not available in our Linux distribution, sometimes because we need to patch it for our needs). Automation is the rule, not the exception.

But putting aside the technical questions, our decision to base our infrastructure on Amazon Web Services has had several positive consequences for our process and workflow: less friction to experiment and prototype, an easy way to set up a testing and development platform, and more control over our production costs and requirements. We also recently migrated some instances to Reserved Instance billing.

I asked Olivier what’s next in their AWS experiments, and this is what he told me: “Amazon Relational Database Service.”

Thanks Olivier, and good luck with Fotopedia!

Simone Brunozzi (@simon on Twitter)
Technology Evangelist for AWS in Europe

Amazon Virtual Private Cloud Opens Up

by Jeff Barr | on | in Amazon EC2 |

I am happy to announce that the Amazon Virtual Private Cloud (VPC) is now available to all current and future Amazon EC2 customers. VPC users are charged only for VPN connection hours and for data transfer, making this a very cost-efficient way to create a secure and seamless bridge between a company’s existing IT infrastructure and the AWS cloud.

During the limited beta test, VPC users have seen that they can easily add a scalable, on-demand component to their infrastructure repertoire. They’ve used it to support a number of scenarios including development and testing, batch processing, and disaster recovery. We’re excited to be able to open up the VPC to the entire EC2 user base and look forward to hearing about even more usage scenarios.

We will enable all EC2 accounts for VPC today (this will take a couple of hours).

Start out by reading the VPC Technical Documentation, including the VPC Getting Started Guide, the VPC Network Administrator Guide, and the VPC Developer Guide.

— Jeff;

Amazon EC2 Spot Instances – And Now How Much Would You Pay?

by Jeff Barr | on | in Amazon EC2 |

We have a whole new way for you to request access to Amazon EC2 processing power!

Using our new Spot Instances, you can bid for one or more EC2 instances at the price you are willing to pay. Your Spot Instance request consists of a number of parameters including the maximum bid that you are willing to pay per hour, the EC2 Region where you need the instances, the number and type of instances you want to run, and the AMI that you want to launch if your bid is successful.
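For example, here is a rough sketch of what such a request might look like using the boto Python library (one of several ways to call the EC2 API); the bid, AMI id, and instance count are made-up values for illustration only.

```python
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

# Bid up to $0.05 per instance-hour for two m1.small instances of a hypothetical AMI.
requests = conn.request_spot_instances(price='0.05',
                                       image_id='ami-12345678',
                                       count=2,
                                       instance_type='m1.small')
for request in requests:
    print(request.id, request.state)
```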

As requests come in and unused capacity becomes available, we’ll evaluate the open bids for each Region and compute a new Spot Price for each instance type. After that we’ll terminate any Spot Instances with bids below the Spot Price, and launch instances for requests with bids at or above the new Spot Price. The instances are billed at the then-current Spot Price regardless of the actual bid, which can mean substantial savings relative to your maximum bid.

You’ll be able to track changes to the Spot Price over time using the EC2 API or the AWS Management Console. This means that you can now create intelligent, value-based scheduling tools to get the most value from EC2. I’m really looking forward to seeing what kinds of tools and systems emerge in this space.
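As a starting point for such tools, here is a small sketch (again assuming the boto Python library) that pulls the recent Spot Price history for one instance type:

```python
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

# Fetch recent Spot Price observations for m1.small Linux/UNIX instances.
history = conn.get_spot_price_history(instance_type='m1.small',
                                      product_description='Linux/UNIX')
for point in history:
    print(point.timestamp, point.instance_type, point.price)
```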

From an architectural point of view, because EC2 will terminate instances whose bid price becomes lower than the Spot Price, you’ll want to regularly checkpoint work in progress, perhaps using Amazon SimpleDB or an Elastic Block Store (EBS) volume. You could also architect your application so that it pulls work from an Amazon SQS Queue, counting on the SQS visibility timeout to return any unfinished work back to the queue if it is running on a Spot Instance that is terminated. Many types of work are suitable for this incremental, background processing model including web crawling, data analysis, and data transformation (e.g. media transcoding). It wouldn’t make much sense to run a highly available application such as a web server or a database on a Spot Instance, though.
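Here is a minimal sketch of that queue-based pattern using boto; the queue name and the process() function are hypothetical, and the visibility timeout should be longer than a single work unit normally takes.

```python
import boto.sqs

def process(body):
    print('processing', body)   # hypothetical work; replace with real processing

conn = boto.sqs.connect_to_region('us-east-1')
queue = conn.get_queue('work-items')              # hypothetical queue name

while True:
    message = queue.read(visibility_timeout=600)  # hide the item while we work on it
    if message is None:
        break                                     # queue is empty (for now)
    process(message.get_body())
    queue.delete_message(message)                 # delete only after the work is safely done

# If a Spot Instance is terminated mid-task, the message simply reappears
# after the visibility timeout and another instance picks it up.
```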

You can use Spot Instances to make all sorts of time-vs-money-vs-value trade-offs. If you have some time-sensitive work that is of high value, you can place a bid that’s somewhat higher than the historical Spot Price and know that there’s a higher likelihood that it will be fulfilled. If you have some time-insensitive work, you can bid very low and have your work done when EC2 isn’t overly busy, perhaps during nighttime hours for that Region. The trick will be to use the price history to understand what pricing environment to expect during the time frame that you plan to make a request for instances.

Your requests can include a number of optional parameters for even more control (a short sketch after the list shows how these might be set):

  • You can specify that the request is one-time or persistent. A persistent request will be re-evaluated from time to time and is great for long-term background processing.
  • You can specify a date and time range when your request is valid.
  • You can request that all instances in your request be started at once, as a cluster that we call a Launch Group.
  • You can request that all of the instances come from a single Availability Zone. This may, of course, make it somewhat harder for us to fulfill your request.
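Building on the earlier sketch, those options might map onto a request roughly like this with boto; every value shown here (prices, dates, group names, AMI id) is hypothetical.

```python
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

# A persistent request for four instances that must start together in a single
# Availability Zone, valid only during a one-week window.
requests = conn.request_spot_instances(price='0.04',
                                       image_id='ami-12345678',
                                       count=4,
                                       type='persistent',
                                       valid_from='2009-12-15T00:00:00Z',
                                       valid_until='2009-12-22T00:00:00Z',
                                       launch_group='nightly-batch',
                                       availability_zone_group='nightly-batch-az',
                                       instance_type='c1.medium')
```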

Spot instances are supported by the EC2 API, the EC2 Command Line Tools, and the AWS Management Console. Here’s a picture of the AWS Management Console in action:

Here’s a good example of how the Spot Instances can be put to use.

The Protein Engineering group at Pfizer has been using AWS to model antibody-antigen interactions using a protein docking system. Their protocol utilizes a full stack of services including EC2, S3, SQS, SimpleDB, and EC2 Spot Instances (more info can be found in a recent article by BioTeam’s Adam Kraut, a primary contributor to the implementation). BioTeam described this system as follows:

The most computationally intensive aspect of the protocol is an all-atom refinement of the docked complex resulting in more accurate models. This exploration of the solution space can require thousands of EC2 instances for several hours.

Here’s what they do:

We have modified our pipeline to submit “must do” refinement jobs on standard EC2 instances and “nice to do” workloads to the Spot Instances. With large numbers of standard instances we want to optimize the time to complete the job. With the addition of Spot Instances to our infrastructure we can optimize for the price to complete jobs and cluster the results that we get back from Spot. Not unlike volunteer computing efforts such as Rosetta@Home, we load the queue with tasks and then make decisions after we get back enough work units from the Spot Instances. If our Spot bids are too low, we just explore less of the solution space. The more Spot Instances we acquire, the more of the energy landscape we can explore.

Here is their architecture:

You can learn even more about Spot Instances by reading Werner Vogels’ Expanding the Cloud – Amazon EC2 Spot Instances, Thorsten von Eicken’s Bid for Your Instances, and the Introduction to Spot Instances.

So, what do you think? Is this cool, or what?

— Jeff;

Amazon EC2 Running Microsoft Windows Server 2008

by Jeff Barr | on | in Amazon EC2 |

You can now run Microsoft Windows Server 2008, SQL Server 2008 Express, and SQL Server 2008 Standard on Amazon EC2. There has been a lot of demand for this particular feature and I’m happy to be able to make this announcement!

You can launch these instances in all three AWS Regions (US East, US West, and EU) right now, and you can also take advantage of additional EC2 features such as Elastic IP Addresses, the Amazon Elastic Block Store, Amazon CloudWatch, Elastic Load Balancing, and Auto Scaling.

You can use the entire Microsoft Web Platform, including ASP.NET, ASP.NET AJAX, Silverlight, and Internet Information Services (IIS), and you can also use the AWS SDK for .NET to access other parts of AWS such as Amazon S3, the Amazon Simple Queue Service, or Amazon SimpleDB.

Pricing starts at $0.12 per hour for Windows Server 2008 and $1.08 per hour for SQL Server Standard Edition. There’s more information on the Amazon EC2 Running Windows page.

I think that this new release highlights a key aspect of AWS — flexibility. As your needs dictate, you can launch EC2 instances running an incredibly diverse array of operating systems including Windows Server 2003 and 2008, seven distinct Linux distributions (Fedora, CentOS, Debian, Gentoo, Red Hat, SUSE, and Ubuntu), and even OpenSolaris. You can launch a few (or a bunch of) instances in three separate geographic Regions, and you can keep them running for as long as you need them.

This additional flexibility means that you can use EC2 to create heterogeneous application architectures, using the operating system that is best suited to each part of the system. You can do your web crawling on some Linux instances, transcode the data on a Windows instance or two, and then serve up the final results using a web server running on another Linux instance.

Windows Server 2008 makes use of our new Boot from EBS feature, so your root partition can occupy up to 1 TB. You can stop, and then later start, the instances with ease, all from the AWS Management Console.
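If you prefer to script it rather than click through the console, stopping and starting an EBS-backed instance might look like this with the boto Python library (the instance id is hypothetical):

```python
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

conn.stop_instances(instance_ids=['i-12345678'])   # hypothetical instance id
# ... later, when you need the instance again ...
conn.start_instances(instance_ids=['i-12345678'])
```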

How do you plan to use Windows Server 2008? Leave me a comment!

— Jeff;   

The Economics of AWS

by Jeff Barr | on | in Amazon EC2 |

For the past several years, many people have claimed that cloud computing can reduce a company’s costs, improve cash flow, reduce risks, and maximize revenue opportunities. Until now, prospective customers have had to do a lot of leg work to compare the costs of a flexible solution based on cloud computing to a more traditional static model. Doing a genuine “apples to apples” comparison turns out to be complex; it is easy to neglect internal costs that are hidden away as “overhead”.

We want to make sure that anyone evaluating the economics of AWS has the tools and information needed to do an accurate and thorough job. To that end, today we released a pair of white papers and an Amazon EC2 Cost Comparison Calculator spreadsheet as part of our brand new AWS Economics Center. This center will contain the resources that developers and financial decision makers need in order to make an informed choice. We have had many in-depth conversations with CIOs, IT Directors, and other IT staff, and most of them have told us that their infrastructure costs are structured in a unique way and are difficult to understand. Performing a truly accurate comparison will still require deep, thoughtful analysis of an enterprise’s costs, but we hope that the resources and tools below will provide a good springboard for that investigation.

Here’s what we are releasing:


The first white paper, The Economics of the AWS Cloud vs. Owned IT Infrastructure, identifies the direct and indirect costs of running a data center. Direct costs include the level of asset utilization, hardware costs, power efficiency, redundancy overhead, security, supply chain management, and personnel. Indirect factors include the opportunity cost of building and running high-availability infrastructure instead of focusing on core businesses, achieving high reliability, and access to the capital needed to build, extend, and replace IT infrastructure.

 

The Amazon EC2 Cost Comparison Calculator is a rich Excel spreadsheet that serves as a starting point for your own analysis. Designed to allow for a detailed, fact-based comparison of the relative costs of hosting on Amazon EC2, hosting on dedicated in-house hardware, or hosting at a co-location facility, the spreadsheet will help you to identify the major costs associated with each option. We’ve supplied the spreadsheet because we suspect many of our customers will want to customize the tool for their own use and the unique aspects of their own business.

 

The second white paper is a User Guide for the Amazon EC2 Cost Comparison Calculator. This document is intended for use by financial decision makers as a companion to the calculator. The document and the associated calculator identify and focus on the direct costs of IT infrastructure and skip the indirect costs, which are far more difficult to quantify.

All of these resources and tools are available in our AWS Economics Center. As always, feedback is appreciated.

— Jeff;

Gear6 Web Cache Server for the Cloud

by Jeff Barr | on | in Amazon EC2, Developer Tools |

Many Web 2.0 applications include a substantial amount of dynamic content. The pages of these applications generally cannot be generated once and then saved for reuse; they must be built from scratch in response to each request.

In order to make a Web 2.0 application run with an acceptable degree of efficiency, it is often necessary to do some application-level caching. The open source Memcached server is often used for this purpose. It is relatively easy to install Memcached on one or more servers, creating a single cache that can expand to consume the available RAM on all of the servers if necessary. The cache can be checked before performing an expensive calculation or database lookup, often obviating the need for the calculation or the lookup, and the results can be stored in the cache for the next time around. Properly implemented, a cache can provide a tremendous speed benefit while also reducing traffic to database and compute servers.
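In Python, for example, the check-the-cache-first pattern might look roughly like this; the host names, key scheme, and build_report() function are hypothetical, and any Memcached-compatible client library would work the same way.

```python
import memcache  # the python-memcached client

mc = memcache.Client(['cache1.example.com:11211',
                      'cache2.example.com:11211'])    # hypothetical cache servers

def build_report(report_id):
    # hypothetical expensive computation or database lookup
    return 'report body for %s' % report_id

def get_report(report_id):
    key = 'report:%s' % report_id
    result = mc.get(key)                  # check the cache before doing expensive work
    if result is None:
        result = build_report(report_id)
        mc.set(key, result, time=300)     # cache it for next time; expire after 5 minutes
    return result
```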

Gear6 has created a version of Memcached suitable for mission-critical work. This product is now available as an Amazon EC2 AMI.

Think of it as “Memcached as a service.”

The Gear6 implementation of Memcached leverages the instance’s local (ephemeral) storage, providing a 100x cache capacity increase per instance (when compared to a purely RAM-based cache), while remaining 100% compatible with the existing memcached API. There’s a web UI (shown at right) for monitoring, with access to 24 hours of historical usage and performance data.

The 32-bit (Small) instances are free (regular EC2 usage charges apply) and the 64-bit instances are available on an hourly basis, with prices ranging from $0.50 (Large) to $0.86 (Quad Extra Large) per hour plus EC2 usage charges, and they include 24×7 support from Gear6. Get started now.

Memcached client libraries are available for every common programming language and platform, including C / C++, PHP, Java, Python, Ruby, Perl, .NET, MySQL, PostgreSQL, Erlang, Lua, LISP, and ColdFusion.

I’m really happy to see this offering from Gear6. As I note in this blog from time to time, powerful, high-level services like this allow application developers to spend more time focusing on the novel and value-added aspects of their application and less time on the underlying infrastructure.

— Jeff;

Expanding the AWS Footprint

by Jeff Barr | on | in Amazon CloudFront, Amazon CloudWatch, Amazon EC2, Amazon Elastic Load Balancer, Amazon Elastic MapReduce, Amazon SDB, Amazon SQS, Auto Scaling |

A new AWS Region is online and available for use!

Our new Northern California (US-West) Region supports Amazon EC2, Amazon S3, SimpleDB, SQS, Elastic MapReduce, Amazon CloudWatch, Auto Scaling, and Elastic Load Balancing. The AWS documentation contains the new endpoints for each service.

The existing Amazon S3 US Standard Region provides good, cost-effective performance for requests originating from anywhere in the country. The new S3 region optimizes performance for requests originating from California and the Southwestern United States. In either case, Amazon CloudFront can be used to provide low-latency global access to content stored in S3. The Northern California and EU Regions provide read-after-write consistency for PUTs of new objects in your Amazon S3 bucket and eventual consistency for overwrite PUTs and for DELETEs. The existing US Standard Region continues to provide eventual consistency.

Service pricing in the Northern California Region will be at a small premium to the pricing in our existing US-based Region, reflecting a difference in our operating costs.

 

As you can see from the screen shot at right, the newest release of Bucket Explorer provides support for the new region. Other tools will add support in the near future. If you are a tool vendor, please post a comment as soon as you’ve added support for the new region.

 

Update: The CloudBerry S3 Explorer now supports the Northern California Region. It also includes a preview version of the code needed to support AWS Import/Export.

 

You can get started with this Region now. Existing code and tools should require only a change of service endpoint to work. The AWS Management Console and ElasticFox already support the new Region.
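For instance, with the boto Python library (just one example; any tool that lets you choose the endpoint will do), switching to the new Region is a one-line change:

```python
import boto.ec2

# 'us-east-1' is the existing US East Region; 'us-west-1' is the new Northern California Region.
us_east = boto.ec2.connect_to_region('us-east-1')
us_west = boto.ec2.connect_to_region('us-west-1')

# Same code, different endpoint: list the Availability Zones in the new Region.
print([zone.name for zone in us_west.get_all_zones()])
```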

As you may know, we have already announced that we plan to bring AWS to Asia in 2010, starting out with multiple Availability Zones in Singapore in the first half of the year, followed by other Asian locations later in the year.

Update: The AWS Simple Monthly Calculator has also been updated and now shows the new Region in the drop-down box. You can use the calculator to estimate your costs.

— Jeff;