Category: Amazon S3


New: Amazon S3 Reduced Redundancy Storage (RRS)

I’ve got a cool new Amazon S3 feature to tell you about, but I need to start with a definition!

Let’s define durability (with respect to an object stored in S3) as the probability that the object will remain intact and accessible after a period of one year. 100% durability would mean that there’s no possible way for the object to be lost, 90% durability would mean that there’s a 1-in-10 chance, and so forth.

We’ve always said that Amazon S3 provides a “highly durable” storage infrastructure and that objects are stored redundantly across multiple facilities within an S3 region. But we’ve never provided a metric, or explained what level of failure it can withstand without losing any data.

Let’s change that!

Using the definition that I stated above, the durability of an object stored in Amazon S3 is 99.999999999%. If you store 10,000 objects with us, on average we may lose one of them every 10 million years or so. This storage is designed in such a way that we can sustain the concurrent loss of data in two separate storage facilities.

If you are using S3 for permanent storage, I’m sure that you need, and fully appreciate, this level of durability. It is comforting to know that you can simply store your data in S3 without having to worry about backups, scaling, device failures, fires, theft, meteor strikes, earthquakes, or toddlers.

But wait, there’s less!

Not every application actually needs this much durability. In some cases, the object stored in S3 is simply a cloud-based copy of an object that actually lives somewhere else. In other cases, the object can be regenerated or re-derived from other information. Our research has shown that a number of interesting applications simply don’t need eleven 9’s worth of durability.

To accommodate these applications we’re introducing a new concept to S3. Each S3 object now has an associated storage class. All of your existing objects have the STANDARD storage class, and are stored with eleven 9’s of durability. If you don’t need this level of durability, you can use the new REDUCED_REDUNDANCY storage class instead. You can set this on new objects when you store them in S3, or you can copy an object to itself while specifying a different storage class.
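
For the curious, here’s roughly what that looks like in code. This is just a sketch using the AWS SDK for Python (boto3); the bucket and key names are made up:

    import boto3

    s3 = boto3.client("s3")

    # Store a new object with the REDUCED_REDUNDANCY storage class
    # (the bucket and key names here are hypothetical).
    with open("cat.jpg", "rb") as f:
        s3.put_object(
            Bucket="my-bucket",
            Key="thumbnails/cat.jpg",
            Body=f,
            StorageClass="REDUCED_REDUNDANCY",
        )

    # Change the storage class of an existing object by copying it onto itself.
    s3.copy_object(
        Bucket="my-bucket",
        Key="thumbnails/cat.jpg",
        CopySource={"Bucket": "my-bucket", "Key": "thumbnails/cat.jpg"},
        StorageClass="REDUCED_REDUNDANCY",
    )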

The new REDUCED_REDUNDANCY storage class activates a new feature known as Reduced Redundancy Storage, or RRS. Objects stored using RRS have a durability of 99.99%, or four 9’s. If you store 10,000 objects with us, on average we may lose one of them every year. RRS is designed to sustain the loss of data in a single facility.
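
If you’d like to check the arithmetic, here’s the back-of-the-envelope version in Python. The figures are just the ones quoted above:

    # Expected annual losses = number of objects x (1 - annual durability).
    def expected_annual_losses(num_objects, durability):
        return num_objects * (1.0 - durability)

    print(expected_annual_losses(10000, 0.99999999999))  # ~1e-07: about one loss every 10 million years
    print(expected_annual_losses(10000, 0.9999))         # ~1.0:   about one loss per year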

RRS pricing starts at a base tier of $0.10 per Gigabyte per month, 33% cheaper than the more durable storage.

If Amazon S3 detects that an object has been lost, any subsequent requests for that object will return the HTTP 405 (“Method Not Allowed”) status code. Your application can then handle this error in an appropriate fashion. If the object lives elsewhere, you would fetch it, put it back into S3 (using the same key), and then retry the retrieval operation. If the object was designed to be derived from other information, you would redo the processing (perhaps it is an image scaling or transcoding task), put the new object back into S3 (again, using the same key), and retry the retrieval operation.
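
Here’s a sketch of that “restore and retry” pattern, again in Python with boto3. The regenerate callback and the object names are hypothetical:

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def get_with_restore(bucket, key, regenerate):
        # Fetch an RRS object; if S3 reports the data as lost (HTTP 405, as
        # described above), rebuild it from the original source and re-store it.
        # `regenerate` is a hypothetical callback that returns the object's bytes.
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except ClientError as err:
            if err.response["ResponseMetadata"]["HTTPStatusCode"] != 405:
                raise
            body = regenerate()  # e.g. re-fetch the master copy or re-run the transcode
            s3.put_object(Bucket=bucket, Key=key, Body=body,
                          StorageClass="REDUCED_REDUNDANCY")
            return body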

Update (for HTTP protocol geeks only):

I’d like to provide clarification regarding our choice of the HTTP 405 (Method Not Allowed) status code. Although 410 (Gone) may seem more appropriate, the HTTP 1.1 spec says that this condition is expected to be permanent and that clients “SHOULD delete references to the Request-URI”. In other words, the 410 status code indicates that the object has intentionally been removed and will not return. That is not necessarily true when data is lost. The object owner may wish to resolve the data loss by re-uploading the object, in which case it would have been inappropriate for S3 to return a 410 status code. We believe that 405 is most appropriate because other methods (e.g. PUT, POST, and DELETE) remain valid for the object even if the object’s data has gone missing. The object’s name (its URI) remains valid, but the data for the object is gone. The 422 and 424 status codes are specific to WebDAV and don’t apply here.

We expect to see management tools and toolkits add support for RRS in the very near future.

You can use either storage class with Amazon CloudFront, of course.

I anticipate many unanticipated uses for this cool new feature; please feel free to leave me a comment with your ideas.

— Jeff;

PS – check out Amazon CTO Werner Vogels’ take on RRS. His post goes into a bit more detail on how S3 was designed so that it will never lose data — “Core to the design of S3 is that we go to great lengths to never, ever lose a single bit. We use several techniques to ensure the durability of the data our customers trust us with…”

Save Money With Combined AWS Bandwidth Pricing

I never tire of telling our customers that they’ll be saving money by using AWS!

Effective April 1, 2010, we’ll add up the bandwidth you use for Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), Amazon SimpleDB, Amazon Relational Database Service (Amazon RDS), and the Amazon Simple Queue Service (SQS) on a Region-by-Region basis and use that value to set your bandwidth tier for each Region. Depending on your bandwidth usage, this could reduce your overall bandwidth charge, since you will be able to reach the higher volume tiers more quickly if you use multiple services.

We’re also going to make the first gigabyte of outbound data transfer each month free.

You’ll see both of these benefits in a new entry on your AWS Account Activity page. We’ll combine the bandwidth used by the services listed above.

Sound good?

— Jeff;

Amazon S3 Versioning Is Now Ready

Amazon S3’s new Versioning feature has now graduated to production status! Once you have enabled versioning for a particular S3 bucket, you can create a new version of an object by simply uploading it. The old versions continue to exist and remain accessible.

Versioning’s MFA Delete feature has also graduated to production status. Once enabled for an S3 bucket, each version deletion request must include the six-digit code and serial number from your MFA (Multi-Factor Authentication) device.
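
Here’s a minimal sketch of both steps using boto3. The bucket name, object key, MFA device serial, and codes are all placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Enable Versioning and MFA Delete on a bucket. The MFA argument is the
    # device's serial number followed by a space and the current six-digit code.
    s3.put_bucket_versioning(
        Bucket="my-bucket",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
        MFA="arn:aws:iam::123456789012:mfa/my-device 123456",
    )

    # Uploading to the same key now creates a new version instead of overwriting.
    with open("jeff_barr.jpg", "rb") as f:
        s3.put_object(Bucket="my-bucket", Key="jeff_barr.jpg", Body=f)

    # Permanently deleting a specific version also requires the MFA token.
    s3.delete_object(
        Bucket="my-bucket",
        Key="jeff_barr.jpg",
        VersionId="EXAMPLE-VERSION-ID",
        MFA="arn:aws:iam::123456789012:mfa/my-device 654321",
    )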

Read the S3 documentation to learn more about these important new features.

Here’s a roundup of the tools and toolkits that already support S3 Versioning and MFA Delete:

I decided to have some fun with the new versioning feature!

I found some pictures from a few years ago, sorted them into chronological order, turned on versioning for one of my S3 buckets, and uploaded each of the pictures to the same S3 object, creating a series of versions.

The first of the pictures can be seen at right (wasn’t I cute?).

 

Here’s a complete list of the versions for this object. Each one is linked to a particular version of the picture:

I can always get to the latest version of my picture using the URL http://aws-blog.s3.amazonaws.com/jeff_barr.jpg. I can also use a versioned URL to access any version that I would like.
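
If you’d like to enumerate those versioned URLs yourself, here’s a small boto3 sketch (it assumes you have permission to list the bucket):

    import boto3

    s3 = boto3.client("s3")

    # List every version of the picture and print a version-aware URL for each,
    # using the versionId query parameter.
    resp = s3.list_object_versions(Bucket="aws-blog", Prefix="jeff_barr.jpg")
    for v in resp.get("Versions", []):
        print("http://aws-blog.s3.amazonaws.com/jeff_barr.jpg?versionId=%s%s"
              % (v["VersionId"], "  (latest)" if v["IsLatest"] else ""))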

— Jeff;

AWS Import/Export – Support for Raw Drives and Bigger Devices

We’ve made two improvements to AWS Import/Export.

You can now send us a “raw” or internal SATA drive all by itself, with no need for an enclosure. You don’t have to send connectors, cables, or power cords. Raw SATA drives appear to be the most cost-effective way to send large amounts of data from place to place.

If you have a SATA cradle (I use this one at home; others have told me that they like this one), you can connect the drive to your desktop machine without having to open up the enclosure.

Also, you can now send us drives with capacities of up to 4 TB. Customers who need to import or export large amounts of data can now use fewer devices.

Don’t forget that tools like Bucket Explorer, the CloudBerry S3 Explorer, and the S3Fox Explorer make it easy to create your Import and Export jobs.

— Jeff;

New Feature: Amazon S3 now supports Object Versioning

We’ve added beta support for Versioning across all Amazon S3 Regions.

Versioning provides an additional layer of protection for your S3 objects. You can easily recover from unintended user errors or application failures. You can also use Versioning for data retention and archiving. Once you have enabled Versioning for a particular S3 bucket, any operation that would have overwritten an S3 object (PUT, POST, COPY, and DELETE) retains the old version of the object. Here’s a simple diagram of Versioning in action:

Each version of the object is assigned a version id. For example, each version of Robot.png has its own version id:

The actual version ids are long strings; I’ve used v1, v2, and v3 to simplify the picture. You can retrieve the most recent version of an object by making a default GET request or you can retrieve any version (current or former) by making a version-aware request and including a version id. In effect, the complete key for an S3 object in a versioned bucket now consists of the bucket name, the object name, and the version id.

S3’s DELETE operation works in a new way when applied to a versioned object. Once an object has been deleted, subsequent default requests will no longer retrieve it. However, the previous version of the object will be preserved and can be retrieved by using the version id. Only the owner of an S3 bucket can permanently delete a version.
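
To make that concrete, here’s a short boto3 sketch. The bucket name is a placeholder; the key matches the example above:

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "Robot.png"   # placeholder bucket; key from the example

    # A default DELETE in a versioned bucket hides the object but preserves its versions.
    s3.delete_object(Bucket=bucket, Key=key)

    # A default GET would now fail, but older versions are still retrievable by version id.
    versions = s3.list_object_versions(Bucket=bucket, Prefix=key).get("Versions", [])
    if versions:
        data = s3.get_object(Bucket=bucket, Key=key,
                             VersionId=versions[0]["VersionId"])["Body"].read()

    # Only a version-aware DELETE (issued by the bucket owner) removes data for good:
    # s3.delete_object(Bucket=bucket, Key=key, VersionId=versions[0]["VersionId"])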

Normal S3 pricing applies to each version of an object. You can store any number of versions of the same object, so you may want to implement some expiration and deletion logic if you plan to make use of this feature.

Enabling Versioning’s MFA Delete setting on your bucket provides even more protection. Once enabled, you will need to supply two forms of authentication in order to permanently delete a version from your bucket: your AWS account credentials and the six-digit code and serial number from an MFA (Multi-Factor Authentication) device in your possession.

You can read more about Versioning in the S3 documentation.

— Jeff;

Cloud MapReduce from Accenture

Accenture is a Global Solution Provider for AWS. As part of their plan to help their clients extend their IT provisioning capabilities into the cloud, they offer a complete Cloud Computing Suite including the Accenture Cloud Computing Accelerator, the Cloud Computing Assessment Tool, the Cloud Computing Data Processing Solution, and the Accenture Web Scaler.

Huan Liu and Dan Orban of Accenture Technology Labs sent me some information about one of their projects, Cloud MapReduce. Cloud MapReduce implements Google’s MapReduce programming model using Amazon EC2, S3, SQS, and SimpleDB as a cloud operating system.

According to the research report on Cloud MapReduce, the resulting system runs at up to 60 times the speed of Hadoop (this depends on the application and the data, of course). There’s no master node, so there’s no single point of failure or processing bottleneck. Because it takes advantage of high level constructs in the cloud for data (S3) and state (SimpleDB) storage, along with EC2 for processing and SQS for message queuing, the implementation is two orders of magnitude simpler than Hadoop. The research report includes details on the use of each service; they’ve also published some good info about the code architecture.
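
To give a feel for the pattern, here’s a minimal sketch of a stateless worker built from the same services. To be clear, this is not Accenture’s code; the queue URL, bucket name, and task format are hypothetical:

    import json
    import boto3

    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/map-tasks"  # hypothetical
    BUCKET = "cloud-mapreduce-demo"                                           # hypothetical

    # A stateless map worker: pull a task from SQS, read its input split from S3,
    # run a toy map step (word count), write the result back to S3, then delete
    # the message. Any number of EC2 instances can run this loop; no master node.
    while True:
        msgs = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20).get("Messages", [])
        if not msgs:
            continue
        task = json.loads(msgs[0]["Body"])            # e.g. {"input": "splits/0001"}
        text = s3.get_object(Bucket=BUCKET, Key=task["input"])["Body"].read().decode()

        counts = {}
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1

        s3.put_object(Bucket=BUCKET,
                      Key="map-output/" + task["input"].rsplit("/", 1)[-1],
                      Body=json.dumps(counts).encode())
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msgs[0]["ReceiptHandle"])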

Download the code, read the tutorial, and give it a shot!

–Jeff;

Third-Party AWS Tracking Sites

A couple of really cool third-party AWS tracking sites have sprung up lately. Some of these sites make use of AWS data directly and others measure it using their own proprietary methodologies. I don’t have any special insight into the design or operation of these sites, but at first glance they appear to be reasonably accurate.

Cloud Exchange

Tim Lossen’s Cloud Exchange site tracks the price of EC2 Spot Instances over time and displays the accumulated data in graphical form, broken down by EC2 Region, Instance Type, and Operating System. Here’s what it looks like:

Spot History

The Spot History site also tracks the price of EC2 Spot Instances over time. This one doesn’t break the prices down by Region. Here’s what it looks like:

Cloudelay

Marco Slot’s Cloudelay site measures latency from your current location (e.g. your browser) to Amazon S3 and Amazon CloudFront using some clever scripting techniques.

Timetric

Timetric tracks the price of EC2 Spot Instances and displays it in a number of ways, including the spot price as a percentage of the on-demand price and a bar chart. They also provide access to the underlying data for DIY charting.

— Jeff;

AWS Import/Export Goes Global

AWS Import/Export is a fast and reliable alternative to sending large volumes of data across the internet. You can send us a blank storage device and we’ll copy the contents of one or more Amazon S3 buckets to it before shipping it back to you. Or, you can send us a storage device full of data and we’ll copy it to the S3 buckets of your choice.

Until now, this service was limited to US shipping addresses and to S3’s US Standard Region. We’ve lifted both of those restrictions; developers the world over now have access to AWS Import/Export. Here’s what’s new:

  • Storage devices can now be shipped to an AWS address in the EU for use with S3’s EU (Ireland) Region. At this time, devices shipped to our AWS locations in the EU must originate from and be returned to an address within the European Union.
  • Storage devices can be shipped from almost anywhere in the world to a specified AWS address in the US for data loads into and out of buckets in the US Standard Region. Previously, devices could only be shipped from and returned to addresses in the United States.

Customers will be responsible for paying duty and taxes for any shipment which crosses into or out of the United States and must include an AWS Import/Export Declaration Form as part of the job creation request.

Based in New Zealand, ZetaPrints is a dynamic image generator with a built-in web-to-print feature. They migrated to EC2 and S3 when the popularity of their OpenX and Magento offerings soared to new heights very quickly. The ZetaPrints developers used the Import/Export feature to transfer a terabyte of data into S3 when they migrated their system over to EC2 and S3. They told me that:

Moving about 1TB of data to AWS was easier than we thought. All it took was a $100 USB drive shipped to AWS Import/Export and 100 lines of code to copy files from S3 to an EBS volume. We went live only a few days later. It couldn’t be easier.
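
Their actual code isn’t shown here, but the basic idea (mirroring a bucket onto a locally mounted EBS volume) fits in just a few lines. Here’s a boto3 sketch; the bucket name and mount point are made up:

    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "zetaprints-import"     # hypothetical bucket name
    DEST = "/mnt/ebs-volume"         # mount point of the attached EBS volume

    # Walk the bucket with a paginator and copy every object onto the EBS volume.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):      # skip folder placeholder keys
                continue
            local_path = os.path.join(DEST, obj["Key"])
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(BUCKET, obj["Key"], local_path)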

By the way, adding to the fun, ZetaPrints also has a web service of its own, with a very powerful set of image generation and account management functions.

Many of our EU customers face regulatory concerns around the location of their data. Amazon S3 helps to alleviate these concerns by allowing objects to be stored exclusively in our EU (Ireland) Region. The EU (Ireland) Region also offers a high level of consistency for requests. The Region provides read-after-write consistency for PUTs of new objects in your Amazon S3 bucket and eventual consistency for overwrite PUTs and DELETEs. This ensures that you are able to immediately retrieve any new objects that are PUT in your Amazon S3 bucket.

Putting it all together, the new Import/Export flexibility, the stronger consistency model, and the recent price reduction should make Amazon S3 an even more useful and cost-effective global storage solution.

— Jeff;

The New AWS Simple Monthly Calculator

Our customers needed better ways to model their applications and estimate their costs. The flexible nature of on-demand, scalable computing allows you to pick and choose the services you like and pay for only those. To give our customers a better way to estimate their costs, we have redesigned the AWS Simple Monthly Calculator.

The link to the new calculator is http://aws.amazon.com/calculator 

AWS Simple Monthly Calculator


This new calculator incorporates a wide array of pricing calculations across all of our services in all of our regions. It also shows a breakdown of features for each service in each region. Most of our customers use multiple AWS services in multiple regions, so the calculator has a mechanism to select a service in a particular region and add it to the bill.

Users can select a region, enter the usage values for each service in that region, and then click on the “Add to Bill” button to add that service to the bill. Once a service has been added to the bill, the user can modify or update the input fields and the calculator will automatically recalculate. Users can remove a service from the bill by clicking on the red cross button in the bill. Each input field represents a value per month. For example: 40 GB of Amazon S3 data transfer out in a month, or 5 Extra Large Linux instances running for 10 hours in a month. For EC2/RDS/EMR, users can click on the green plus button to add similar types of instances (e.g., web servers, app servers, MySQL DB servers) to their topology.

The calculator also shows common customer samples and their usage. Users can click on the “Disaster Recovery and Backup” sample or the “Web Application” sample and see the usage of each service. We will continue to add new real-world customer samples in the future. If you would like us to add your sample usage, please let us know.

Last month, we announced new EC2 pricing, new Instance Types, and a new AWS service (Amazon RDS). This calculator incorporates all of these new features and services, and it will continue to evolve as we add new features, new services, and new regions.

The calculator is currently in beta and provides an estimate of the monthly bill. The goal is to help our customers and prospects estimate their monthly bill more efficiently. In the future, we would also like to give you recommendations on how you can save more, such as by switching from On-Demand Instances to Reserved Instances or by using the AWS Import/Export service instead of standard S3 data transfer. Stay tuned.

We would love to hear your feedback. Please let us know whether this new version of the calculator is helpful in estimating your AWS costs. You can send your feature requests, comments, suggestions, appreciations, confusions, and frustrations to evangelists at amazon dot com.

— Jinesh

AWS Workshops in Beijing, Bangalore and Chennai

I will be in China and India starting next week. Apart from other meetings and presentations to user groups, this time I will be running 3-hour workshops. These workshops are targeted at architects and technical decision makers, and attendees will get a chance to play with the core AWS infrastructure services.

If you are a System Integrator, Independent Software Vendor, Enterprise Architect, or an entrepreneur, this will be a great opportunity to meet and learn more about AWS.

Seats are limited and prior registration is required:

AWS Workshop in Beijing
Oct 24th : http://sd2china.csdn.net
(in conjunction with CSDN conference)

AWS Workshop in Bangalore
Nov 4th : http://www.btmarch.com/btsummit/edm/Amazon.html
(in conjunction with Business Technology Summit)

AWS Workshop in Chennai
Nov 10th : http://www.awschennai.in/
(in conjunction with AWS Chennai Enthusiasts group)

Details of the Workshop

AMAZON WEB SERVICES DEEP DIVE – CLOUD WORKSHOP

INTRO TO AWS INFRASTRUCTURE SERVICES – 30 MIN
Learn how to create an AWS account, understand the SOAP, REST, and Query APIs, and learn how to use tools like the AWS Management Console

DEEP DIVE INTO AWS INFRASTRUCTURE SERVICES – 60 MIN
Amazon EC2 – Learn how to create, bundle, and launch an AMI, set up Amazon EBS volumes, and use Elastic IP addresses
Amazon S3 buckets, objects, and ACLs, and Amazon CloudFront distributions
Amazon SQS queues
Amazon SimpleDB Domains, Items, Attributes, and Querying
Amazon Elastic MapReduce Job Flows and MapReduce

Exercise/Assignments:
Architect a Web application in the Cloud to be discussed in class

ARCHITECTING FOR THE CLOUD: BEST PRACTICES – 30 MIN
Learn how to build highly scalable applications in the cloud. In this session, you will learn about best practices, tips, tricks, and techniques for leveraging the AWS cloud, a highly scalable infrastructure platform.
 
MIGRATING APPLICATIONS TO THE CLOUD – 30 MIN
Learn a step-by-step approach to migrating your existing applications to the Cloud environment. This blueprint will help enterprise architects perform a cloud assessment, select the right candidate for a proof-of-concept project in the Cloud, and leverage the real benefits of the Cloud, such as auto-scaling and low-cost business continuity. Jinesh will discuss migration plans and reference architectures for various examples, scenarios, and use cases.

Laptop required
Knowledge of Java language preferred

See you at the workshop!

– Jinesh