Category: Amazon S3


Adding the Export to AWS Import/Export

I blogged about the new AWS Import/Export feature this past spring and told you how it allows you to load any amount of data into Amazon S3 by simply shipping the data to us on a compatible storage device. The response to that announcement has been excellent and our customers are now sending us terabytes of data every week.

Today I’d like to tell you about the new Export aspect of this feature. Using a workflow similar to the one you’d use to import data, you prepare a MANIFEST file, email it to us, receive a job identifier in return, and then send us one or more specially prepared storage devices. We’ll take the devices, verify them against your manifest file, copy the data from one or more S3 buckets to your device(s) and ship them back to you.

A “specially prepared” storage device contains a SIGNATURE file. The file uniquely identifies the Import/Export job and also authenticates your request.

You can use the new CREATE EXPORT PLAN email command to simplify the process of exporting a data set that won’t fit on a single storage device. Given the block size and the device capacity (formatted or unformatted), the command returns a link to a zip file containing a set of MANIFEST files.

You will be charged a fixed fee of $80.00 per device and $2.49 per hour for the time spent copying the data to your device. Normal charges for S3 requests also apply. There is no charge for bandwidth.
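
As a rough illustration of how those charges add up, here is a small Python sketch that estimates the device count and fees for an export job. The data set size, device capacity, and copy throughput are assumptions you would replace with your own numbers; only the $80.00 per-device fee and the $2.49 hourly rate come from the pricing above.

    import math

    # Assumed inputs -- substitute your own values.
    dataset_gb = 4000            # total data to export, in GB (assumption)
    device_capacity_gb = 1500    # usable capacity of each device, in GB (assumption)
    copy_rate_gb_per_hour = 100  # sustained copy throughput (assumption)

    devices = math.ceil(dataset_gb / device_capacity_gb)
    copy_hours = math.ceil(dataset_gb / copy_rate_gb_per_hour)

    device_fees = 80.00 * devices   # $80.00 per device
    copy_fees = 2.49 * copy_hours   # $2.49 per data-copying hour

    print(f"{devices} devices, about ${device_fees + copy_fees:.2f} "
          "before the usual S3 request charges")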

There are many uses for this new feature. Here are some ideas to get you started:

  • Disaster Recovery – If your local storage fails or is destroyed, use the Export feature to retrieve your precious data.
  • Data Retrieval – After creating a large data set (either by gathering it up or by computing it) in the cloud, use the Export feature to get a local copy.
  • Data Distribution – Take a large data set, sell copies, and use the Export feature to take care of the distribution.
  • Data Processing – Use the Import feature to load a large data set (yours or a customer’s) into the cloud, do some computationally intensive processing (e.g. de-duplication), and then get the data back using the Export feature.

Sign up here to get started with AWS Import/Export.

What are your ideas? How can you use this new feature? Leave a comment if you’d like!

— Jeff;

PS – We haven’t forgotten our international users! We’re working on a number of solutions to enable international shipments to and from the US, and to enable support in the EU region.

AWS Import/Export: Ship Us That Disk!


Many years ago, professor Andy Tanenbaum wrote the following:

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

Since station wagons and tapes are both on the verge of obsolescence, others have updated this nugget of wisdom to reference DVDs and Boeing 747s.

Hard drives are getting bigger more rapidly than internet connections are getting faster. It is now relatively easy to create a collection of data so large that it cannot be uploaded to offsite storage (e.g. Amazon S3) in a reasonable amount of time. Media files, corporate backups, data collected from scientific experiments, and potential AWS Public Data Sets are now at this point. Our customers in the scientific space routinely create terabyte data sets from individual experiments.

This isn’t an issue that can be solved by getting a faster connection; even at the highest reasonable speed, some of these data sets would take weeks or months to upload. For example, it would take over 80 days to upload just 1TB of data over a T1 connection.
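
For the curious, the arithmetic behind that claim is easy to sketch. The figures below assume the nominal T1 line rate and a decimal terabyte; the gap between the ideal figure of roughly 60 days and the 80-day figure comes from protocol overhead and less-than-perfect utilization.

    # Back-of-the-envelope: uploading 1 TB over a T1 line.
    t1_bits_per_second = 1.544e6   # nominal T1 line rate
    terabyte_bits = 1e12 * 8       # 1 TB (decimal) in bits

    seconds = terabyte_bits / t1_bits_per_second
    days = seconds / 86400
    print(f"{days:.0f} days at 100% utilization")  # about 60 days; overhead pushes this past 80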

Customers with AWS storage requirements at the terabyte and petabyte level often ask us if they can sidestep the internet and simply send us a disk drive, or even a 747 full of such drives.

I can now say “Yes, you can!” Our new AWS Import/Export service allows you to ship your data to us. This service is now in a limited beta and you can sign up here. We’ll take your storage device, load the data into a designated S3 bucket, and send your hardware back to you. The data load takes place in a secure facility with a high bandwidth, low-latency connection to Amazon S3. Once the data has been loaded into S3, you can process it on EC2, and then store the results anywhere you would like — back into S3, in SimpleDB, or on EBS volumes.

During the limited beta we are set up to accept devices with a USB 2.0 or eSATA connector, formatted as a FAT32, ext2, ext3, or NTFS file system. We are set up to handle devices that weigh less than 50 pounds and fit within an 8U rack. We are also happy to make special arrangements to accommodate larger and heavier devices. Last week after a conference talk one of the attendees asked me “Can we ship you our SAN?” In theory, yes, but we’d need to discuss the specifics beforehand.

It is easy to initiate a data load. Here’s what you do:

  1. Load the data onto your compatible storage device.
  2. Create a manifest file per our specification. The file must include the name of the target S3 bucket, your AWS Access Key, and a return shipping address, and it can also specify content types and S3 Access Control Lists. The newest versions of third-party tools such as Bucket Explorer and S3 Fox can also easily create manifest files for your S3 buckets. (See the sketch after this list for a rough idea of what goes into a manifest.)
  3. Email the manifest file to a designated address with the subject CREATE JOB.
  4. Await a return email with the subject RE: CREATE JOB.
  5. Extract the JOBID value from the email.
  6. Use the JOBID, the manifest file, and your AWS Secret Access Key to generate a signed SIGNATURE file in the root directory of your storage device.
  7. Ship your storage device, along with all necessary power and data cables, to an address that we’ll provide to you. You can use one of the usual shipping companies or a courier service.
  8. Await further status emails.
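
To give you a feel for step 2, here is a rough Python sketch of the kind of information a manifest carries. The field names and the JSON output are purely illustrative, not the real manifest format (that is defined in our specification); only the required pieces of information, the bucket name, Access Key, and return shipping address, come from the list above.

    import json

    # Purely illustrative -- these field names and the JSON output are NOT the
    # real manifest format; consult the AWS Import/Export manifest
    # specification for the actual required fields and syntax.
    manifest = {
        "bucket": "my-import-bucket",     # target S3 bucket (required)
        "accessKeyId": "AKIAEXAMPLEKEY",  # your AWS Access Key (required)
        "returnAddress": {                # where we ship the device back to (required)
            "name": "Jane Doe",
            "street": "123 Example Street",
            "city": "Seattle",
            "stateOrProvince": "WA",
            "postalCode": "98101",
            "country": "USA",
        },
        # Optional extras mentioned above: content types and S3 ACLs.
    }

    with open("manifest.txt", "w") as f:
        json.dump(manifest, f, indent=2)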

Once we get the device we’ll transport it to our data center and initiate the load process by the end of the next business day. A log file is created as part of the process; it will include the date and time of the load, and the S3 key, MD5 checksum, and size (in bytes) of each object. We’ll reject unreadable files and those larger than 5 GB and note them in the log. At the end of the process we’ll send the device back to you at our expense.
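
Since that log lists the S3 key, MD5 checksum, and size of every object, you may want to record the same values for the files on your device before you ship it, so that you can diff the two once the log arrives. Here is a minimal Python sketch; the mount point is an assumption.

    import hashlib
    import os

    def checksum_listing(root):
        """Yield (relative path, MD5 hex digest, size in bytes) for every file under root."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                md5 = hashlib.md5()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1024 * 1024), b""):
                        md5.update(chunk)
                yield (os.path.relpath(path, root),
                       md5.hexdigest(),
                       os.path.getsize(path))

    # Example: record checksums for a device mounted at /mnt/import-disk (assumed path).
    for rel_path, digest, size in checksum_listing("/mnt/import-disk"):
        print(f"{rel_path}\t{digest}\t{size}")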

In keeping with our model of charging for resources only as they are consumed, you will pay a fixed fee per device and a variable fee for each hour of data loading. There’s no charge for data transfer between AWS Import/Export and an S3 bucket in the United States. Normal S3 Request and Storage charges apply.

Right now packages must be shipped from and returned to addresses in the United States. We do expect to be able to accept packages at a location in Europe in the near future.

Let’s talk about security for a minute. You can choose to encrypt your files before you send them to us, although we don’t support encrypted file systems. We track custody of your device from the time it arrives in our mailroom until it is shipped back to you. All personnel involved in the process have undergone extensive background checks.
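
If you do choose to encrypt at the file level before shipping, any standard tool will do the job. Purely as an illustration, here is a sketch using the third-party cryptography package (an assumption on my part; GPG or a similar tool works just as well). Whatever you use, keep the key somewhere other than the device you ship.

    from cryptography.fernet import Fernet

    # Generate a key once and store it safely -- NOT on the device you ship.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    # Encrypt a single file alongside the original; file names are illustrative.
    with open("experiment-results.dat", "rb") as f:
        ciphertext = fernet.encrypt(f.read())

    with open("experiment-results.dat.enc", "wb") as f:
        f.write(ciphertext)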

Also, as you can probably guess from the name of the service, we have plans to let you transfer large amounts of data out of AWS as well. We will provide further information as soon as possible.

Here are some preliminary screen shots of the AWS Import/Export support in Bucket Explorer and S3 Fox:

We have also created an Import/Export Calculator:

If you have a large amount of data stored locally and you want to get it into Amazon S3 on a cost-effective and timely basis, you should definitely sign up for the beta now. You can read more about AWS Import/Export in the Developer Guide.

— Jeff;

Celebrating S3’s Third Birthday With Special Anniversary Pricing

Amazon S3 is now three years old and busier than ever. Just a year ago, there were 18 billion objects in S3. As of today there are 52 billion, a near three-fold increase.

To celebrate S3’s third birthday, we have some special pricing for you. From now until the end of June 2009, uploads to S3 will be charged at the promotional price of just $0.030 per Gigabyte (details here).

— Jeff;

Links for March 9th


Catching up on a variety of interesting topics today. Some of these have been lingering in my inbox for quite a while — others are hot off the wire. In every case though, the result of innovation continues to amaze and impress!

Today I am writing about an Amazon S3 Publishing Plugin for Expression Encoder, benchmark testing of EBS performance in various configurations, a Help Wanted notice for an Actionscript & Flash programmer, and an update on Metropix, which I originally blogged about way back in 2007.

 

Amazon S3 Publishing Plugin for Expression Encoder

Tim Heuer over at Microsoft emailed me about this a while back. Apologies to Tim for not blogging about it sooner, but in fairness he told me just before I got distracted by Maui :)

Quoting from Tim’s blog post:

I’ve been using Amazon’s S3 web services for a while and have really grown to like it a lot. One of the Live Writer extensions I spoke of earlier is a plugin for S3 for Live Writer that Aaron Lerch helped out with as well! I thought I should extend Encoder so that I’d have a one-click publishing point to my S3 account instead of having to use S3Fox all the time (which is an awesome tool btw).

So after getting home from a user group I started cranking one out, figuring out the nuances and just coding something together. A few hours later I came up with what I’m calling 1.0 beta of my plugin.

It’s not a fancy UI, but it doesn’t need to be; it serves a purpose: enable publishing of Encoder output directly to an Amazon S3 bucket in one click. That’s it. Encoding just media? No problem. Adding a template? Not a problem either. You simply need to enter your Amazon S3 account information and enter a bucket. If the bucket isn’t there, it will attempt to create it. You can also list your current buckets if you forgot them.

The plugin can be downloaded from Codeplex by clicking here.

 

Benchmark Testing

AF Design wrote a blog post after they analyzed EC2 disk performance. This was a completely independent test that appeared on the Net without any advance contact between the author and Amazon Web Services — at least, not as far as I know.

I found it very interesting that the testing was thorough, not limited to just one or two runs, and that it covers quite a few RAID configurations, as you can see in the chart.

As you probably know, benchmarking is definitely a black art, and is made even more complex in a virtualized environment, so your results may vary. We generally advise prospective customers to do benchmarks using their own code and workloads.

From the post…

First I wanted to determine what the EBS devices would compare to in the physical world. I ran Bonnie against a few entry level boxes provided by a number of ISPs and found the performance roughly matched a locally attached SATA or SCSI drive when formatted with EXT3. I also found that JFS, XFS and ReiserFS performed slightly better than EXT3 in most tests except block writes.

 

Help Wanted: Programmer for Actionscript & Flash

UpperSports.com emailed me to ask if there is anyone able to help out with some development work. “To complete the audio functionality of our online editor, we are in need of a programmer who is familiar with Actionscript 3, Flash and Flash Media Server (Wowza preferably). It would be advantageous if they are also familiar with Ruby on Rails and Amazon Web Services (EC2, S3, SQS).”

You can email them from their website.

 

Metropix Update

Way back in 2007 I blogged about Metropix, which makes 3D models automatically from floor plans using Amazon AWS.

Max Christian emailed me a while ago with an update…

I thought I’d write to let you know that Metropix has just been acquired by the Daily Mail and General Trust, which is a $2bn publishing company based over here in London. (The same company owns Primelocation and FindaProperty, which are major real estate property portals.)

It’s only four years since we started the business on the not-so-grand sum of £16,000, so this is a really exciting moment for us! Without AWS, we simply wouldn’t have been able to do what we did without taking on external investors. In fact, I just checked our accounts and unbelievably the total we’ve spent on AWS since launch is just £1,826.37, which is absolutely astounding value for money given the pivotal role AWS has played in our rapid expansion. We’re working hard now on taking advantage of EC2’s new Windows support, ready for further expansion as part of a bigger group.

 

That’s it for this post. Enjoy!!

— Mike

A True Bollywood Tale

One of my favorite parts of my job is meeting people and watching them innovate with Cloud technology. So I want to tell you about a recent weekend trip to the Bay Area.

Last weekend I was in Mountain View, CA to attend OpenSocial’s WeekendApps event. This was held at the Googleplex and was designed for people to build social networking applications in a weekend. The event began Friday evening with proposals for various projects, and teams were formed around the most popular ones. Then it was time to work non-stop until Sunday, when applications were judged and a winner was picked. The ground rules were that applications had to use the OpenSocial framework and had to be completed and in production by Sunday.

It was quite an experience even as an observer! To begin with, Twitter was humming with updates from all corners, while others streamed the event on the Net, and still others created drawings or documented the action with photos.

One of the participants summed the weekend up as follows:
Day 1. Wow, I’m at Google.
Day 2. Wow, I’m at Google.
Day 3. Wow, I’m tired!

Sunday evening’s winner was a really cool Orkut application, Bollywood Music, built by the Dhingana.com team. Bollywood Music lets users discover music with their friends, send them musical scraps, dedicate songs or playlists to them, and create and share playlists. One of the most interesting social features — at least in my opinion — is that Bollywood Music mines musical data from all Orkut users and then displays the most popular songs and playlists on the application home page. You can try out the application at Orkut.com.

The team built the application using Amazon S3 to store their static data (images, JavaScript, and style sheets), as well as real-time data about songs. Moving forward, they hope to use Amazon EC2 and SimpleDB as well. (The initial version did not leverage the entire Amazon Web Services platform for other reasons.)

If you have an interesting website or application built using Amazon Web Services, we’d love to hear about it!

— Mike

Bits For Sale – The New Amazon S3 Requester Pays Model

We rolled out a powerful new feature for Amazon S3 in the final hours of 2008.

This new feature, dubbed Requester Pays, works at the level of an S3 bucket. If the bucket’s owner flags it as Requester Pays, then all data transfer and request costs are paid by the party accessing the data.

The Requester Pays model can be used in two ways.

First, by simply marking a bucket as Requester Pays, data owners can provide access to large data sets without incurring charges for data transfer or requests. For example, they could make a 1 GB dataset available at a cost of just 15 cents per month (18 cents if stored in the European instance of S3). Requesters use signed and specially flagged requests to identify themselves to AWS, paying for S3 GET requests and data transfer at the usual rates: 17 cents per GB for data transfer (even less at high volumes) and 1 cent for every 10,000 GET requests. The newest version of the S3 Developer Guide contains the information needed to make use of S3 in this way.
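
To make the mechanics concrete, here is roughly what the two sides look like using the modern boto3 SDK (a much later interface than the one documented in the Developer Guide mentioned above, so treat it as a sketch rather than the official instructions; the bucket and key names are made up):

    import boto3

    s3 = boto3.client("s3")

    # Bucket owner: flag the bucket as Requester Pays.
    s3.put_bucket_request_payment(
        Bucket="example-public-dataset",
        RequestPaymentConfiguration={"Payer": "Requester"},
    )

    # Requester: explicitly accept the charges when fetching an object.
    obj = s3.get_object(
        Bucket="example-public-dataset",
        Key="data/part-0001.csv",
        RequestPayer="requester",
    )
    body = obj["Body"].read()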

 

Second, the Requester Pays feature can be used in conjunction with Amazon DevPay. Content owners charge a markup for access to the data. The price can include a monthly fee, a markup on the data transfer costs, and a markup on the cost of each GET. The newest version of the DevPay Developer Guide has all of the information needed to set this up, including some helpful diagrams. Organizations with large amounts of valuable data can now use DevPay to expose and monetize the data, with payment by the month or by access (or some combination). For example, I could create a database of all dog kennels in the United States, and make it available for $20 per month, with no charge for access. My AWS account would not be charged for data transfer or requests, only for data storage.

I firmly believe that business model innovation is as important as technical innovation. This new feature gives you the ability to create the new, innovative, and very efficient business models that you will need to have in order to succeed in 2009!

— Jeff;

Architecting for the Cloud

Steve from MindTouch emailed me a while back about a really interesting write-up on how they moved their Wiki farm to Amazon EC2. Steve said that, in the spirit of helping others do the same, they did a complete write-up about it.

The article includes an architecture diagram, but more importantly it also drills into implementation details, complete with the configuration settings that they used for HAProxy, Apache (with multi-tenant Deki), Memcache, and Lucene. MindTouch also implemented auto scaling, which is covered briefly.

If you’re thinking about architecting an application for Amazon EC2, there’s nothing like seeing someone else’s implementation, which you can read about here.

Oh, and one of my favorite features of MindTouch’s wiki software is the Save to PDF feature. It made it easy to print out the write-up.

— Mike

Gas-Free S3-Powered Window Shopping: Amazon’s Windowshop.com

Imagine if I could have all the pleasures of window shopping without stepping out of my warm and cozy home, without having to worry about the cold winter, ever-increasing gas prices, or even wasting my precious energy.

Introducing Amazon’s Windowshop.com. You can use the power of your fingers (and arrow keys) to move and browse through best-selling Amazon.com products in different categories. You can even see a cool preview of a movie or listen to a sample MP3 song just by hitting the space bar. You have to see it to believe it.

The tool shows all of the best-sellers and new releases in the Books, Music, Video, and Video Games categories. There is also a very interesting chronological aspect: older content slowly drifts to the right as fresh content is pushed every Tuesday!


Let me tell you why I am excited about Windowshop.com: Amazon’s Windowshop is fully powered by Amazon S3. All of the high-quality media (audio and images) is served straight from Amazon S3 at lightning speed, in real time. Now that’s cool!

— Jinesh

Amazon S3 – Busier Than Ever

Amazon S3 usage has grown very nicely in the last quarter and now stands at 29 billion objects, up from 22 billion just a quarter ago. As one of the S3 engineers told me last week, that’s over 4 objects for every person now on Earth!

Our customers are keeping S3 pretty busy too. To give you an example of what this means in practice, the peak S3 usage for October 1st was over 70,000 storage, retrieval, and deletion requests per second.

All of this usage drives increasing economies of scale, or (in plain English) lower costs. I am happy to say that a new tiered pricing model for Amazon S3 storage will go into effect on November 1st, 2008. The new model features four price tiers, with prices decreasing based on the amount of storage used by each customer. Here is a full breakdown:

Tier          US           EU           Description
0-50 TB       $0.150/GB    $0.180/GB    First 50 TB per month of storage used
50-100 TB     $0.140/GB    $0.170/GB    Next 50 TB per month of storage used
100-500 TB    $0.130/GB    $0.160/GB    Next 400 TB per month of storage used
500+ TB       $0.120/GB    $0.150/GB    Storage used per month over 500 TB
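
To see how the tiers combine, here is a small Python sketch that computes the monthly US storage charge for a given usage level. Whether a TB is counted as 1,000 GB or 1,024 GB for billing is an assumption here (1,024 is used below), so treat the result as approximate:

    # US tier boundaries and prices from the table above.
    US_TIERS = [
        (50 * 1024, 0.150),     # first 50 TB per month, $/GB
        (50 * 1024, 0.140),     # next 50 TB
        (400 * 1024, 0.130),    # next 400 TB
        (float("inf"), 0.120),  # everything over 500 TB
    ]

    def monthly_storage_cost(gb_used, tiers=US_TIERS):
        cost = 0.0
        remaining = gb_used
        for tier_size, price_per_gb in tiers:
            gb_in_tier = min(remaining, tier_size)
            cost += gb_in_tier * price_per_gb
            remaining -= gb_in_tier
            if remaining <= 0:
                break
        return cost

    # Example: 120 TB of storage in the US region.
    print(f"${monthly_storage_cost(120 * 1024):,.2f} per month")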

Customers large, small, and in-between have put S3 to all sorts of uses. Here are a few that you might find interesting:

National Geographic’s topo.com site stores seamless image maps for the entire United States in S3. You’ll need to register in order to see the maps.

Oracle Secure Backup now includes a Cloud Module which supports direct, multi-threaded backup to S3. Read the new white paper to learn more.

— Jeff;

PS – We’ll be updating the AWS Calculator when the new pricing model goes into effect.

Amazon S3 Copy API Ready for Testing

A few weeks ago we asked our developer community for feedback on a proposed Copy feature for Amazon S3. The feedback was both voluminous and helpful to us as we finalized our plans and designed our implementation.

This feature is now available for beta use; you can find full documentation here (be sure to follow the links to the detailed information on the use of this feature via SOAP and REST). Copy requests are billed at the same rate as PUT requests: $0.01 per 1,000 in the US and $0.012 per 1,000 in Europe.

In addition to the obvious use for this feature — creating a new S3 object from an existing one — you can also use it to rename an object within a bucket or to move an object to a new bucket. You can also update the metadata for an object by copying it to itself while supplying new metadata.
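
Here is roughly what those three uses look like with the modern boto3 SDK (a much later interface than the SOAP and REST calls documented at the time, so consider it an illustrative sketch; bucket and key names are made up):

    import boto3

    s3 = boto3.client("s3")

    # 1. "Rename" within a bucket: copy to the new key, then delete the old one.
    s3.copy_object(Bucket="my-bucket", Key="reports/2008-q3.csv",
                   CopySource={"Bucket": "my-bucket", "Key": "reports/q3.csv"})
    s3.delete_object(Bucket="my-bucket", Key="reports/q3.csv")

    # 2. Move the object to a different bucket.
    s3.copy_object(Bucket="archive-bucket", Key="reports/2008-q3.csv",
                   CopySource={"Bucket": "my-bucket", "Key": "reports/2008-q3.csv"})

    # 3. Copy an object onto itself to replace its metadata.
    s3.copy_object(Bucket="my-bucket", Key="reports/2008-q3.csv",
                   CopySource={"Bucket": "my-bucket", "Key": "reports/2008-q3.csv"},
                   MetadataDirective="REPLACE",
                   Metadata={"department": "finance"})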

Still on the drawing board is support for copying between US and Europe, and a possible conditional copy feature. Both of these items surfaced as a result of developer feedback.

Tool and library support for this new feature is already starting to appear; read more about that in this discussion board thread.

— Jeff;