Category: Amazon EC2

New Public Data Set: Wikipedia XML Data

by Jeff Barr | on | in Amazon EC2 |

Weighing in at a whopping 500 GB (388 GB of data and 112 GB of free space to allow for some in-place decompression), the Wikipedia XML data is our newest Public Data Set.

This data set contains all of the Wikimedia wikis in the form of wikitext source and metadata embedded in XML. We’ll be updating this data set every month and we’ll keep the sets for the previous three months around.
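To show what working with the dump can look like, here is a minimal Python sketch that streams page titles and wikitext out of a MediaWiki XML export with the standard library's `xml.etree.ElementTree.iterparse`, so a very large file never has to fit in memory. It assumes the usual export layout of `<page>` elements that contain a `<title>` and a `<revision>`/`<text>`. The tiny `SAMPLE` document and its namespace version are made up for illustration, and if a page carries several revisions only the last `<text>` is kept.

```python
import io
import xml.etree.ElementTree as ET

# A tiny hand-made document in the shape of a MediaWiki export (illustrative only).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Foo</title>
    <revision><text>'''Foo''' is a word.</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><text /></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, one page at a time."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""  # an empty <text /> has no .text
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's children


for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, len(wikitext))
```

For a real dump you would pass a file path or an open binary file object instead of the in-memory sample (for a compressed file, something like `bz2.open(path, "rb")`).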

As you can see from this screen shot of my PuTTY window, there are some pretty beefy files in this data set:

As an example of what can be done with this data, take a look at Cloudera’s blog post on Grouping Related Trends with Hadoop and Hive. This article shows how to create a trend tracking site using a Cloudera Hadoop cluster running on EC2, with Apache Hive queries processing the data.

— Jeff;

New Public Data Set: Daily Global Weather

by Jeff Barr | on | in Amazon EC2 |

The folks at Infochimps have just released the Daily Global Weather Public Data Set.

This 20 GB data set incorporates daily weather measurements (temperature, dew point, wind speed, humidity, barometric pressure, and so forth) from over 9,000 weather stations around the world. The data was originally collected as part of the Global Surface Summary of the Day (GSOD) by the National Climatic Data Center and is available from 1929 to the present, with the data from 1973 to the present being the most complete.

The map at right contains one yellow dot for each data collection station.

— Jeff;

New Public Data Set: Sloan Digital Sky Survey DR6 Subset

by Jeff Barr | on | in Amazon EC2 |

The Sloan Digital Sky Survey, or SDSS, is now available as a Public Data Set.

The SDSS is the most ambitious astronomical survey ever undertaken, and this data set weighs in at 180 GB. The researchers have used a 2.5 meter telescope with a 120-megapixel camera, located at Apache Point Observatory in New Mexico, to capture images of over one quarter of the sky, or about 230 million celestial objects. They have also created 3-dimensional maps containing more than 930,000 galaxies and 120,000 quasars.

This new public data set (which is a subset of the entire SDSS) will be of interest to students, educators, hobby astronomers, and researchers. From a standing start, it is possible to launch an EC2 instance, create an Elastic Block Store volume with this data, attach the volume to the instance and start examining and processing the data in less than ten minutes.
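The create-and-attach part of that workflow can be scripted. The sketch below uses today's boto3 SDK (which postdates this post; the post itself refers to the console, command-line tools, and EC2 API), and `volume_from_snapshot` is a hypothetical helper name. The snapshot ID, instance ID, Availability Zone, and device name are placeholders you would supply; the volume has to be created in the same Availability Zone as the instance.

```python
def volume_from_snapshot(ec2, snapshot_id, instance_id, zone, device="/dev/sdf"):
    """Create an EBS volume from a snapshot and attach it to a running instance.

    `ec2` is a boto3 EC2 client. Returns the new volume's ID.
    """
    # Create the volume, pre-populated with the snapshot's contents.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    volume_id = volume["VolumeId"]
    # Block until the volume reaches the 'available' state.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    # Attach it; the operating system then sees it as a new block device.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

A call might look like `volume_from_snapshot(boto3.client("ec2", region_name="us-east-1"), "snap-xxxxxxxx", "i-xxxxxxxx", "us-east-1a")`, with the IDs replaced by real ones. After attaching, you still mount the device from inside the instance (or, for this data set, open it from a Windows instance as described below).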

The data set takes the form of a Microsoft SQL Server MDF file. Once you have created your EBS volume and attached it to your Windows EC2 instance, you can access the data using SQL Server Enterprise Manager or SQL Server Management Studio. The SDSS makes use of stored procedures, user defined functions, and a spatial indexing library, so porting it to another database would be a fairly complex undertaking.

I know from experience (my son Andy is studying Astronomy at the University of Washington and is always showing me the “please delete your unnecessary files” emails from the department’s administrator) that storage space is always at a premium in academic settings, due in part to the existence of large scale data sets like this. The combination of EC2, EBS, this public data set, and our AWS in Education program should enable students and educators to analyze, process, display, and study the universe in revolutionary ways.

— Jeff;

Shared Snapshots for EC2’s Elastic Block Store Volumes

by Jeff Barr | on | in Amazon EC2 |

Today we are adding a new feature which significantly improves the flexibility of EC2’s Elastic Block Store (EBS) snapshot facility. You now have the ability to share your snapshots with other EC2 customers using a new set of fine-grained access controls. You can keep the snapshot to yourself (the default), share it with a list of EC2 customers, or share it publicly. Here’s a visual overview of the data flow (in this diagram, the word Partner refers to anyone that you choose to share your data with):

The Amazon Elastic Block Store lets you create block storage volumes in sizes ranging from 1 GB to 1 TB. You can create empty volumes or you can pre-populate them using one of our Public Data Sets. Once created, you attach each volume to an EC2 instance and then reference it like any other file system. The new volumes are ready in seconds. Last week I created a 180 GB volume from a Public Data Set, attached it to my instance, and started examining it, all in about 15 seconds.

You can use the AWS Management Console, the command line tools, or the EC2 API to create a snapshot backup of an EBS volume at any time. The snapshots are stored in Amazon S3. Once created, a snapshot can be used to create a new EBS volume in the same AWS region. Sharing these snapshots, as we are now letting you do, makes it possible for other users to create an identical copy of the volume.

The new ModifySnapshotAttribute function gives you the ability to set and change the createPermission attribute on any of your snapshots. We’ve also added the ResetSnapshotAttribute function to clear snapshot attributes and the DescribeSnapshotAttribute function to get the value of a particular attribute.
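Here is a sketch of the three operations using the modern boto3 SDK, where the attribute is named `createVolumePermission` (the post calls it createPermission). The helper names are hypothetical, and granting to the special group `all` is how a snapshot is made public.

```python
def share_snapshot(ec2, snapshot_id, account_ids=(), public=False):
    """Let specific AWS accounts (and optionally everyone) create volumes from a snapshot."""
    add = [{"UserId": account} for account in account_ids]
    if public:
        add.append({"Group": "all"})  # the 'all' group means "anyone"
    if not add:
        return  # nothing to grant
    ec2.modify_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
        CreateVolumePermission={"Add": add},
    )


def who_can_use(ec2, snapshot_id):
    """Return the snapshot's current createVolumePermission entries."""
    resp = ec2.describe_snapshot_attribute(
        SnapshotId=snapshot_id, Attribute="createVolumePermission"
    )
    return resp.get("CreateVolumePermissions", [])


def make_private(ec2, snapshot_id):
    """Clear all grants so only the owner can use the snapshot (the default)."""
    ec2.reset_snapshot_attribute(SnapshotId=snapshot_id, Attribute="createVolumePermission")
```

Revoking access for a single account would use the same `modify_snapshot_attribute` call with a `"Remove"` list instead of `"Add"`.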

The DescribeSnapshots function now lists all of the snapshots that have been shared with you. You can also use this function to retrieve a list of all of our Public Data Sets.
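With boto3, one way to find snapshots that other accounts have shared with you is to ask for snapshots restorable by `self` and drop the ones you own. This is a sketch under that assumption: `shared_with_me` is a hypothetical helper, and it reads only the first page of results (a paginator would be needed for long listings).

```python
def shared_with_me(ec2, my_account_id):
    """Return IDs of snapshots that other accounts have explicitly shared with this one."""
    # 'self' selects snapshots you own or have explicit create-volume permission on.
    resp = ec2.describe_snapshots(RestorableByUserIds=["self"])
    return [
        snap["SnapshotId"]
        for snap in resp["Snapshots"]
        if snap["OwnerId"] != my_account_id  # skip your own snapshots
    ]
```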

You can also modify snapshot permissions using the AWS Management Console:

How can you use this? Off the top of my head, here are a number of ideas:

  1. If you are a teacher or professor, create and share a volume of reference data for use in a classroom setting (and take a look at the AWS in Education program too).
  2. If you are a researcher, share your data and your results with your colleagues, both within your own organization and at other organizations.
  3. If you are a developer, share your development and test environments with your teammates. Snapshot the environments before each release to make it easy to regenerate the environment later for regression tests.
  4. If you are a business, you can use snapshots to share data internally, with external clients, or with partners. This could be reference data, results of a lengthy and expensive computation, a set of test cases (and expected results) or even a set of pre-populated database tables.

I’m sure you have some ideas of your own; please feel free to share them in a comment!

Update: Shlomo Swidler posted some really good ideas in his Cloud Developer Tips blog.

As is often the case with AWS, we’ll use this new feature as the basis for even more functionality later.

— Jeff;

Now In Europe: Amazon SimpleDB, CloudWatch, Auto Scaling, and Elastic Load Balancing

by Jeff Barr | on | in Amazon CloudWatch, Amazon EC2, Amazon Elastic Load Balancer, Amazon SDB, Auto Scaling |

I’m happy to announce that the following AWS services are now available in Europe:

  • Amazon SimpleDB – Highly available and scalable, low/no administration structured data storage.
  • Amazon CloudWatch – Monitoring for the AWS cloud, starting with providing resource consumption (CPU utilization, network traffic, and disk I/O) for EC2 instances.
  • Elastic Load Balancing – Traffic distribution across multiple EC2 instances.
  • Auto Scaling – Automated scaling of EC2 instances based on rules that you define.

All of the services work just the same way in Europe as they do in the US. Existing applications and management tools should be able to access the services in this region after a simple change of the service endpoint. As is the case with S3 and EC2, these services are independent of their US counterparts.
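As a rough illustration of "a simple change of the service endpoint", the sketch below builds regional hostnames from a service prefix and a region name. The `{prefix}.{region}.amazonaws.com` pattern and the prefix table are assumptions for illustration (in the US region some services historically used a region-less hostname, such as sdb.amazonaws.com for SimpleDB), so check the current endpoint documentation before relying on them. With the boto3 SDK, passing a different `region_name` when creating a client has the same effect.

```python
# Assumed hostname prefixes for the four services (illustrative, not authoritative).
SERVICE_PREFIXES = {
    "simpledb": "sdb",
    "cloudwatch": "monitoring",
    "elb": "elasticloadbalancing",
    "autoscaling": "autoscaling",
}


def endpoint(service, region):
    """Return an HTTPS endpoint URL for `service` in `region`, using the assumed pattern."""
    return f"https://{SERVICE_PREFIXES[service]}.{region}.amazonaws.com"


# Moving an application to Europe only changes the region argument.
print(endpoint("simpledb", "eu-west-1"))
```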

Our full slate of infrastructure services is now available in Europe. With the European debut of these services, developers can now build reliable and scalable applications in both of the AWS regions (US and Europe).

— Jeff;




AWS Management Console – Now with Amazon CloudWatch Support

by Jeff Barr | on | in Amazon CloudWatch, Amazon EC2 |

The AWS Management Console now has complete support for Amazon CloudWatch. You can enable CloudWatch for any or all of your EC2 instances using the console and data will be available in a moment or two. You can select one or more running EC2 instances to see the CloudWatch data in graphical form. You can observe CPU utilization, disk reads, disk writes, and network traffic (both in and out). If you select more than one EC2 instance, the console will automatically display aggregated values. You can also get a larger and more detailed view of the data.
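The same data is available programmatically. This sketch uses boto3's CloudWatch `get_metric_statistics` call to fetch average CPU utilization for several instances and then combines them. The helper name, the one-hour window, and the five-minute period are assumptions, and averaging the per-instance means is just one simple way to aggregate; it is not necessarily how the console computes its aggregated view.

```python
from datetime import datetime, timedelta, timezone


def average_cpu(cloudwatch, instance_ids, minutes=60, period=300):
    """Return ({instance_id: mean CPU %}, mean across instances) for the last `minutes`."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    per_instance = {}
    for instance_id in instance_ids:
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=period,  # seconds per datapoint
            Statistics=["Average"],
        )
        points = [point["Average"] for point in resp["Datapoints"]]
        if points:  # instances with no data in the window are skipped
            per_instance[instance_id] = sum(points) / len(points)
    overall = sum(per_instance.values()) / len(per_instance) if per_instance else None
    return per_instance, overall
```

A client for this would be created with `boto3.client("cloudwatch", region_name=...)`.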

Here are some pictures of the console in action:

Among other uses, you can use the new CloudWatch support to monitor and tune your Auto Scaling rules.

The new release of the AWS Management Console also centralizes a number of actions on EC2 instances in a new Instance Actions menu:

It is flexible, colorful, and informative and you can start to use it now!

— Jeff;

Introducing Amazon Virtual Private Cloud (VPC)

by Jeff Barr | on | in Amazon EC2, Announcements |

Amazon Virtual Private Cloud (Amazon VPC) lets you create your own logically isolated set of Amazon EC2 instances and connect it to your existing network using an IPsec VPN connection. This new offering lets you take advantage of the low cost and flexibility of AWS while leveraging the investment you have already made in your IT infrastructure.

This cool new service is now in a limited beta and you can apply for admission here.

Here's all you need to do to get started:

  1. Create a VPC. You define your VPC's private IP address space, which can range from a /28 (16 IPs) up to a /18 (16,384 IPs). You can use any IPv4 address range, including Private Address Spaces identified in RFC 1918 and any other routable IP address block.
  2. Partition your VPC's IP address space into one or more subnets. Multiple subnets in a VPC are arranged in a star topology and enable you to create logically isolated collections of instances. You can create up to 20 subnets per VPC (you can request more using this form). You can also use this form to request a VPC larger than a /18 or additional EC2 instances for use within your VPC.
  3. Create a customer gateway to represent the device (typically a router or a software VPN appliance) anchoring the VPN connection from your network.
  4. Create a VPN gateway to represent the AWS end of the VPN connection.
  5. Attach the VPN gateway to your VPC.
  6. Create a VPN connection between the VPN gateway and the customer gateway.
  7. Launch EC2 instances within your VPC using an enhanced form of the Amazon EC2 RunInstances API call or the ec2-run-instances command to specify the VPC and the desired subnet.
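The seven steps map fairly directly onto EC2 API calls. The sketch below uses today's boto3 SDK rather than the command-line tools described later in this post; `build_vpc` is a hypothetical helper, and every CIDR block, IP address, BGP ASN, AMI ID, and instance type passed to it is a placeholder. It deliberately omits error handling, waiting for each resource to become available, and the routing configuration that a working setup also needs.

```python
def build_vpc(ec2, vpc_cidr, subnet_cidr, router_public_ip, bgp_asn, image_id, instance_type):
    """Run the seven setup steps in order and return the IDs that were created."""
    # 1. Create the VPC with its private address range.
    vpc_id = ec2.create_vpc(CidrBlock=vpc_cidr)["Vpc"]["VpcId"]
    # 2. Carve a subnet out of the VPC's range.
    subnet_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock=subnet_cidr)["Subnet"]["SubnetId"]
    # 3. Customer gateway: represents the VPN device on your side.
    cgw_id = ec2.create_customer_gateway(
        Type="ipsec.1", PublicIp=router_public_ip, BgpAsn=bgp_asn
    )["CustomerGateway"]["CustomerGatewayId"]
    # 4. VPN gateway: the AWS end of the connection.
    vgw_id = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]["VpnGatewayId"]
    # 5. Attach the VPN gateway to the VPC.
    ec2.attach_vpn_gateway(VpcId=vpc_id, VpnGatewayId=vgw_id)
    # 6. Connect the two gateways with an IPsec VPN connection.
    vpn_id = ec2.create_vpn_connection(
        Type="ipsec.1", CustomerGatewayId=cgw_id, VpnGatewayId=vgw_id
    )["VpnConnection"]["VpnConnectionId"]
    # 7. Launch an instance into the subnet.
    instance_id = ec2.run_instances(
        ImageId=image_id, InstanceType=instance_type, MinCount=1, MaxCount=1, SubnetId=subnet_id
    )["Instances"][0]["InstanceId"]
    return {
        "vpc": vpc_id,
        "subnet": subnet_id,
        "customer_gateway": cgw_id,
        "vpn_gateway": vgw_id,
        "vpn_connection": vpn_id,
        "instance": instance_id,
    }
```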

Once you have done this, all Internet-bound traffic generated by your Amazon EC2 instances within your VPC routes across the VPN connection, where it wends its way through your outbound firewall and any other network security devices under your control before exiting from your network.

IP addresses are specified using CIDR notation, where the value after the slash represents the number of bits in the routing prefix for the address. You're currently limited to one VPC per AWS account; however, if you have a use case requiring more, let us know and we'll see what we can do.
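Because the prefix length fixes the number of host bits, a /n block holds 2^(32 - n) addresses: a /28 has 2^4 = 16 and a /18 has 2^14 = 16,384. The sketch below uses Python's standard `ipaddress` module to check a proposed VPC block against the /18 to /28 range described above and to split it into equal-size subnets, enforcing the 20-subnet default. The helper names are hypothetical, and the sketch ignores any addresses a subnet might reserve for internal use.

```python
import ipaddress

MAX_SUBNETS = 20  # default per-VPC subnet limit during the beta


def vpc_size(cidr):
    """Number of addresses in a VPC block, rejecting prefixes outside /18../28."""
    net = ipaddress.ip_network(cidr)
    if not 18 <= net.prefixlen <= 28:
        raise ValueError(f"{cidr}: VPC prefix must be between /18 and /28")
    return net.num_addresses  # equals 2 ** (32 - prefixlen)


def split_into_subnets(cidr, new_prefix):
    """Partition a VPC block into equal subnets with the given (longer) prefix."""
    subnets = list(ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix))
    if len(subnets) > MAX_SUBNETS:
        raise ValueError(f"{len(subnets)} subnets exceeds the limit of {MAX_SUBNETS}")
    return [str(subnet) for subnet in subnets]


print(vpc_size("10.0.0.0/24"), split_into_subnets("10.0.0.0/24", 26))
```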

Because the VPC subnets are used to isolate logically distinct functionality, we've chosen not to immediately support Amazon EC2 security groups. You can launch your own AMIs and most public AMIs, including Microsoft Windows AMIs. You can't launch Amazon DevPay AMIs just yet, though.

The Amazon EC2 instances are on your network. They can access or be accessed by other systems on the network as if they were local. As far as you are concerned, the EC2 instances are additional local network resources; there is no network address translation (NAT). EC2 instances within a VPC do not currently have Internet-facing IP addresses.

Requirements to interoperate with our VPN implementation include:

  • Ability to establish IKE Security Association using Pre-Shared Keys (RFC 2409).
  • Ability to establish IPSec Security Associations in Tunnel mode (RFC 4301).
  • Ability to utilize the AES 128-bit encryption function (RFC 3602).
  • Ability to utilize the SHA-1 hashing function (RFC 2404).
  • Ability to utilize Diffie-Hellman Perfect Forward Secrecy in Group 2 mode (RFC 2409).
  • Ability to establish Border Gateway Protocol (BGP) peerings (RFC 4271).
  • Ability to utilize IPSec Dead Peer Detection (RFC 3706).

Optional capabilities that we recommend include:

  • Ability to adjust the Maximum Segment Size of TCP packets entering the VPN tunnel (RFC 4459).
  • Ability to reset the Don't Fragment flag on packets (RFC 791).
  • Ability to fragment IP packets prior to encryption (RFC 4459).

We've confirmed that a variety of Cisco and Juniper hardware/software VPN configurations are compatible; devices meeting the requirements listed above should be compatible too. We also plan to support Software VPNs in the near future. If you want us to consider explicitly validating a device that we have not yet confirmed, please add your request to the Customer Gateway support thread located here.

Amazon VPC functionality is accessible via the EC2 API and command-line tools. The ec2-create-vpc command creates a VPC and the ec2-describe-vpcs command lists your collection of VPCs. There are commands to create subnets, customer gateways, VPN gateways, and VPN connections. Once all of the requisite objects have been created, the ec2-attach-vpn-gateway command connects your VPC to your network and allows traffic to flow. While most organizations will likely leave the VPN connection (and VPC) up and running indefinitely, you can drop the connection, terminate the instances, and even delete the VPC if you would like.

You only pay for what you use. Pricing is on a pay-as-you-go basis. VPCs, subnets, customer gateways, and VPN gateways are free to create and to use. You simply pay an hourly charge for each VPN connection you create, and for the data transferred through those VPN connections. EC2 instances within your VPC are priced at the normal On-Demand rate. We'll honor the hourly rate for any Reserved Instances that you have, but during the beta we cannot guarantee that Reserved Instances will always be available for deployment within your VPC.
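The pricing model in the previous paragraph is a simple sum, sketched below. The post does not list the rates, so every rate is a parameter you supply; treating data transfer as one flat per-GB rate is a simplification, `estimate_vpc_cost` is a hypothetical name, and the numbers in the example call are made up.

```python
def estimate_vpc_cost(vpn_connections, hours, vpn_hourly_rate,
                      gb_transferred, per_gb_rate,
                      instance_hours, instance_hourly_rate):
    """Estimate VPC charges: VPN connection-hours + VPN data transfer + On-Demand instance-hours.

    VPCs, subnets, customer gateways and VPN gateways cost nothing, so they have no term here.
    """
    vpn = vpn_connections * hours * vpn_hourly_rate
    transfer = gb_transferred * per_gb_rate
    instances = instance_hours * instance_hourly_rate
    return vpn + transfer + instances


# Hypothetical rates, for illustration only.
print(estimate_vpc_cost(1, 720, 0.05, 100, 0.10, 720, 0.10))
```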

Imagine the many ways that you can now combine your existing on-premises static resources with dynamic resources from the Amazon VPC. You can expand your corporate network on a permanent or temporary basis. You can get resources for short-term experiments and then leave the instances running if the experiment succeeds. You can establish instances for use as part of a DR (Disaster Recovery) effort. You can even test new applications, systems, and middleware components without disturbing your existing versions.

As is the case with many of our betas, this one is launching in a single Availability Zone in the US-East region. You can use Amazon CloudWatch to monitor your instances, but you can't use Elastic IP addresses, Auto Scaling or Elastic Load Balancing just yet.

Recall that all traffic from your instances routes through the VPN connection. For now, this includes traffic to other Amazon Web Services such as EC2 instances outside of your Amazon VPC, Amazon S3, Amazon SQS, and Amazon SimpleDB. You can create Elastic Block Store (EBS) volumes and attach them to your instances. EBS volumes created within your cloud can be moved to standard EC2 instances and vice-versa.

I do want to mention a few of the things on our road map as well. First, we’re planning to let you directly reach the Internet from your VPC. In early discussions with potential users, we learned that most of them wanted to completely isolate their EC2 instances, routing all of the traffic back to their data center, so we gave this feature the highest priority. Later on, we’ll let you decide if and how you want to expose your VPC to the Internet. Second, we’re planning to let you specify the IP address of individual Amazon EC2 instances within a subnet. During this beta, Amazon EC2 instances are automatically assigned a random IP from the subnet’s designated IP address range. Third, we’re evaluating ways to allow you to filter traffic per subnet, kind of like how you might implement router ACLs. We’re already working on these items and on other additions to the core functionality we’re releasing today. If you have opinions on these items, or anything else you’d like to see in the service, e-mail us or post to the forum. This service is for you; we really need your feedback!

We think you can put Amazon VPC to immediate use and can't wait to hear about new and imaginative use cases for it. Please feel free to leave a comment on this blog or to send us some email.

— Jeff;

Lower Pricing for Amazon EC2 Reserved Instances

by Jeff Barr | on | in Amazon EC2, Price Reduction |

Our customers are putting the Amazon EC2 Reserved Instances to use in many different ways. Here are some of the usage patterns that they’ve told us about:

  • Steady State Usage – These customers have applications which require a fixed number of servers to be available at all times. Reserved Instances are advantageous for customers who are currently using their own hardware or who are using On-Demand instances full-time.
  • Low to Medium Annual Utilization – These customers have applications which run less than 100% of the time. The breakeven point can be calculated based on anticipated instance usage at the effective hourly rate. Reserved Instances offer a cost savings over On-Demand instances even at relatively low utilization rates.
  • Variable Usage – These customers have applications with unpredictable or fluctuating usage patterns. They can use a combination of Reserved and On-Demand instances to minimize their net costs. This is especially valuable when EC2 instances are frequently launched and then terminated—we minimize costs by always charging the lowest applicable price for each instance.
  • Standby Capacity – These customers use Reserved Instances as a reliable source of standby capacity with availability at a moment’s notice. The Reserved Instances are an integral part of their disaster recovery plan.
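The breakeven point mentioned under Low to Medium Annual Utilization comes from comparing the one-time fee against the hourly saving: a Reserved Instance pays off after fee / (On-Demand rate - reserved hourly charge) hours. In this sketch the helper names are hypothetical and the On-Demand rate is a parameter you supply, since this post does not state it; the fee and hourly charge come from the m1.small three-year row of the price table in this post.

```python
HOURS_PER_YEAR = 24 * 365  # full-time use, ignoring leap days


def breakeven_hours(one_time_fee, reserved_hourly, on_demand_hourly):
    """Hours of use after which the Reserved Instance becomes the cheaper option."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("On-Demand rate must exceed the reserved hourly charge")
    return one_time_fee / saving_per_hour


def breakeven_utilization(one_time_fee, reserved_hourly, on_demand_hourly, term_years):
    """Breakeven expressed as a fraction of every hour in the term."""
    hours = breakeven_hours(one_time_fee, reserved_hourly, on_demand_hourly)
    return hours / (term_years * HOURS_PER_YEAR)


# m1.small, three-year term, with a hypothetical On-Demand rate of $0.10/hour.
print(breakeven_hours(350.00, 0.03, 0.10), breakeven_utilization(350.00, 0.03, 0.10, 3))
```

With that assumed On-Demand rate the breakeven is about 5,000 hours, or roughly 19% utilization over three years; plug in the actual On-Demand price for your instance type and region.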

Given the many ways that our customers have already put them to use, I am happy to tell you that we’ve lowered the prices for newly purchased Amazon EC2 Reserved Instances! On a three year term, you can now get an m1.small instance for an effective hourly rate of just $0.043 per hour (4.3 cents). The new pricing is now in effect.
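The effective hourly rate spreads the one-time price over every hour of the term (full-time, 24×7 use) and adds the hourly charge. A minimal sketch, with a hypothetical helper name and leap days ignored:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of full-time (24x7) use per year


def effective_hourly_rate(one_time_price, hourly_charge, term_years):
    """One-time price amortized over every hour of the term, plus the hourly charge."""
    return one_time_price / (term_years * HOURS_PER_YEAR) + hourly_charge


print(round(effective_hourly_rate(350.00, 0.03, 3), 3))
```

For the m1.small three-year term this is $350 / 26,280 hours + $0.03, or about $0.0433 per hour, the 4.3 cents quoted above.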

Here are the new US prices (the instance prices for the EU have also been reduced):

Three Year Term
Instance Type    One-Time Price    Hourly Charge    Effective Hourly Rate*
m1.xlarge        $2800.00          $0.24            $0.347
m1.large         $1400.00          $0.12            $0.174
m1.small         $350.00           $0.03            $0.043
c1.xlarge        $2800.00          $0.24            $0.347
c1.medium        $700.00           $0.06            $0.087

One Year Term
Instance Type    One-Time Price    Hourly Charge    Effective Hourly Rate*
m1.xlarge        $1820.00          $0.24            $0.448
m1.large         $910.00           $0.12            $0.224
m1.small         $227.50           $0.03            $0.056
c1.xlarge        $1820.00          $0.24            $0.448
c1.medium        $455.00           $0.06            $0.112

You can purchase Reserved Instances from the AWS Management Console:

Or through ElasticFox:

— Jeff;

* – The Effective Hourly Rate is computed based on full-time (24×7) usage.

Cirrhus9 Qualified Machine Image Webinar – August 12th

by Jeff Barr | on | in Amazon EC2 |

Mike from Cirrhus9 wrote to let me know that they’ll be conducting a webinar on August 12th to discuss their new Qualified Machine Image (QMI) for the Life Science and Pharmaceutical industries. The QMIs are Amazon EC2 AMIs with complete installation and operation documentation.

Currently in beta testing, the QMI is designed to help organizations meet the FDA’s 21 CFR Part 11 requirements for validation. 

The webinar is free but registration is a must.

— Jeff;

What Should Adam Do?

by Jeff Barr | on | in Amazon EC2 |

It's always interesting to see what people use Amazon Web Services for. On one hand, this blog post is a look at an interesting example; on the other, it is also a chance to participate in a social experiment. Adam Ginsburg from Sydney, Australia, set up a form to let you vote on what he should do next.

There are a number of other things going on here. First of all, Adam made a YouTube video that shows how easy it is to set up Lotus Forms Turbo on Amazon EC2. That makes sense, because Adam works for IBM, and I work for AWS. So of course we were on a call about IBM software that runs as Amazon Machine Images (AMIs). We are both really excited about what on-demand pricing for IBM's server lineup offers in terms of new opportunities for both System Integrators and enterprises that want to innovate.

The conversation eventually got around to Adam's experiment. He was able to set up a temporary server for just this event, and in a day or two the server can be torn down with no residual financial effects. That's one of the beauties of AWS.

Adam posted the experiment on Twitter, and the interesting thing is that as of this moment “do nothing” is winning in the votes, and Adam swears that he is not cooking the results. (Perhaps I just did, though.) You have one day to vote; don't waste your chance to demonstrate the power of AWS…ummm…voting.

Update: Adam finished the experiment and took down the temporary server. You can view the results at