Category: Amazon S3


Client-Side Data Encryption for Amazon S3 Using the AWS SDK for Java

The newest version of the AWS SDK for Java has a very convenient client-side encryption feature. Once enabled, the SDK will automatically encrypt data before sending it to Amazon S3, and decrypt it before returning it to your application. You have full control of the keys used to encrypt and decrypt your data and the keys are never transmitted over the wire.

This feature is implemented using a technique known as envelope encryption. Here’s a diagram that should help to illustrate the concept:

Your calls to the AWS SDK for Java include a master key. The SDK generates a one-time envelope key, uses it to encrypt your data before it leaves the client, and then encrypts the envelope key itself with your master key. The encrypted envelope key is stored alongside the encrypted data, along with a description of the master key.

On retrieval, the master key description stored with the object is compared to the description of the master key supplied by your application. If they do not match, the client application is asked to supply the original master key. You can use this feature to integrate your application with an existing private key management system.

This functionality is implemented within the AmazonS3EncryptionClient class. This class is a subclass of the original AmazonS3Client with additional parameters and methods to control the encryption and decryption process.

The SDK provides a number of configurable options, including the ability to use either asymmetric or symmetric encryption. You can also choose to store the encrypted envelope key as S3 object metadata or as a separate S3 object.
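Here’s a minimal sketch of how this might look with a symmetric master key, also showing the option to store the encrypted envelope key in a separate instruction file rather than in object metadata. The credentials, bucket, and object names are placeholders, and the class names reflect the SDK at the time of writing, so treat it as illustrative:

    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3EncryptionClient;
    import com.amazonaws.services.s3.model.CryptoConfiguration;
    import com.amazonaws.services.s3.model.CryptoStorageMode;
    import com.amazonaws.services.s3.model.EncryptionMaterials;

    public class EncryptedUpload {
        public static void main(String[] args) throws Exception {
            // Generate (or load) a symmetric master key; an asymmetric KeyPair works as well.
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            SecretKey masterKey = generator.generateKey();

            // Store the encrypted envelope key in an instruction file instead of object metadata.
            CryptoConfiguration config =
                new CryptoConfiguration().withStorageMode(CryptoStorageMode.InstructionFile);

            AmazonS3EncryptionClient s3 = new AmazonS3EncryptionClient(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"),
                new EncryptionMaterials(masterKey),
                config);

            // Data is encrypted on the client before it is sent to S3,
            // and decrypted automatically when it is retrieved.
            s3.putObject("my-bucket", "secret-report.txt", new java.io.File("secret-report.txt"));
        }
    }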

This feature is available now and you can start using it today. We’ve created a new article, AWS SDK for Java and Amazon S3 Encryption, with complete information, including code.

— Jeff;

Now Open: AWS Region in Tokyo

I have made many visits to Japan over the last several years to speak at conferences and to meet with developers. I really enjoy the people, the strong sense of community, and the cuisine.

Over the years I have learned that there’s really no substitute for sitting down, face to face, with customers and potential customers. You can learn things in a single meeting that might not be obvious after a dozen emails. You can also get a sense for the environment in which they (and their users or customers) have to operate. For example, developers in Japan have told me that latency and in-country data storage are of great importance to them.

Long story short, we’ve just opened up an AWS Region in Japan, Tokyo to be precise. The new region supports Amazon EC2 (including Elastic IP Addresses, Amazon CloudWatch, Elastic Block Store, Elastic Load Balancing, VM Import, and Auto Scaling), Amazon S3, Amazon SimpleDB, the Amazon Relational Database Service, the Amazon Simple Queue Service, the Amazon Simple Notification Service, Amazon Route 53, and Amazon CloudFront. All of the usual EC2 instance types are available with the exception of the Cluster Compute and Cluster GPU instance types. The page for each service includes full pricing information for the Region.

Although I can’t share the exact location of the Region with you, I can tell you that private beta testers have been putting it to the test and have reported single-digit millisecond latency (roughly 1-10 ms) from locations in and around Tokyo. They were very pleased with the observed latency and performance.

Existing toolkits and tools can make use of the new Tokyo Region with a simple change of endpoints. The documentation for each service lists all of its endpoints.
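As a rough illustration (assuming the AWS SDK for Java and that the Tokyo Region uses the ap-northeast-1 endpoints; the credentials and bucket name are placeholders), switching an existing client to the new Region is just a matter of changing the endpoint:

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;

    public class TokyoEndpoint {
        public static void main(String[] args) {
            // Create the client as usual, then point it at the Tokyo S3 endpoint.
            AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
            s3.setEndpoint("s3-ap-northeast-1.amazonaws.com");

            // Subsequent calls go to the Tokyo Region; create a bucket there explicitly.
            s3.createBucket("my-tokyo-bucket", "ap-northeast-1");
        }
    }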

This offering goes beyond the services themselves. We also have the following resources available:

Put it all together and developers in Japan can now build applications that respond very quickly and that store data within the country.

 

The JAWS-UG (Japan AWS User Group) is another important resource. The group is headquartered in Tokyo, with regional branches in Osaka and other cities. I have spoken at JAWS meetings in Tokyo and Osaka and they are always a lot of fun. I start the meeting with an AWS update. The rest of the meeting is devoted to short “lightning” talks related to AWS or to a product built with AWS. For example, the developer of the Cacoo drawing application spoke at the initial JAWS event in Osaka in late February. Cacoo runs on AWS and features real-time collaborative drawing.

We’ve been working with some of our customers to bring their apps to the new Region ahead of the official launch. Here is a sampling:

Zynga is now running a number of their applications here. In fact (I promise I am not making this up) I saw a middle-aged man playing Farmville on his Android phone on the subway when I was in Japan last month. He was moving sheep and fences around with rapid-fire precision!

 

The enStratus cloud management and governance tools support the new region.

enStratus supports role-based access, management of encryption keys, intrusion detection and alerting, authentication, audit logging, and reporting.

All of the enStratus AMIs are available. The tools feature a fully localized user interface (Cloud Manager, Cluster Manager, User Manager, and Report) that can display text in English, Japanese, Korean, Traditional Chinese, and French.

enStratus also provides local currency support and can display estimated operational costs in JPY (Japanese Yen) and a number of other currencies.

 

Sekai Camera is a very cool augmented reality application for iPhones and Android devices. It uses the built-in camera on each device to display a tagged, augmented version of what the camera is looking at. Users can leave “air tags” at any geographical location. The application is built on AWS and makes use of a number of services including EC2, S3, SimpleDB, SQS, and Elastic Load Balancing. Moving the application to the Tokyo Region will make it even more responsive and interactive.

 

G-Mode Games is running a multi-user version of Tetris in the new Region. The game is available for the iPhone and the iPod and allows you to play against another person.

 

Cloudworks is a management tool for AWS built in Japan, and with a Japanese language user interface. It includes a daily usage report, scheduled jobs, and a history of all user actions. It also supports AWS Identity and Access Management (IAM) and copying of AMIs from region to region.

 

Browser 3Gokushi is a well-established RPG (Role-Playing Game) that is now running in the new region.

 

Here’s some additional support that came in after the original post:

Here are some of the jobs that we have open in Japan:

— Jeff;

Note: Tetris ® and © 1985~2011 Tetris Holding. Tetris logos, Tetris theme song and Tetriminos are trademarks of Tetris Holding. The Tetris trade dress is owned by Tetris Holding. Licensed to The Tetris Company. Game Design by Alexey Pajitnov. Original Logo Design by Roger Dean. All Rights Reserved. Sub-licensed to Electronic Arts Inc. and G-mode, Inc.

Upcoming Event: AWS Tech Summit, London

I’m very pleased to invite you all to join the AWS team in London, for our first Tech Summit of 2011. We’ll take a quick, high level tour of the Amazon Web Services cloud platform before diving into the technical detail of how to build highly available, fault tolerant systems, host databases and deploy Java applications with Elastic Beanstalk.

We’re also delighted to be joined by three expert customers who will be discussing their own, real world use of our services:

So if you’re a developer, architect, sysadmin or DBA, we look forward to welcoming you to the Congress Centre in London on the 17th of March.

We had some great feedback from our last summit in November, and this event looks set to be our best yet.

The event is free, but you’ll need to register.

~ Matt

Host Your Static Website on Amazon S3

We’ve added some new features to Amazon S3 to make it even better at hosting static websites.

Customers have been hosting their images and video for their websites on Amazon S3 for a long time. However, it was not that easy to host your entire website on S3. Why? If a user enters a site address (www.example.com) and the CNAME in the site’s DNS record resolves to the root of an S3 bucket (www.example.com.s3.amazonaws.com), Amazon S3 would list the contents of the bucket in XML form. In order to work around this, customers would host their home page on an Amazon EC2 instance. This is no longer necessary.

You can now host an entire website on Amazon S3.

You can now configure and access any of your S3 buckets as a “website.” When a request is made to the root of a bucket configured as a website, Amazon S3 returns the index document that you specify. Not only that, if an error occurs your users receive an HTML error document instead of an XML error message. You can also provide your own custom error documents for use when a 4xx-class error occurs.

Here’s more detail on the new features…

Website Endpoints
To access this website functionality, Amazon S3 exposes a new website endpoint for each region (US Standard, US West, EU, or Asia Pacific). For example, s3-website-ap-southeast-1.amazonaws.com is the endpoint for the Asia Pacific Region. Existing buckets and endpoints continue to work the same way they always have.

Root and Index Documents
When you configure your bucket as a website, you can specify the index document you want returned for requests made to the root of your website or for any subdirectory.  For example, a GET request made to the following URI (either direct or via a CNAME):

mywebsitedomain.s3-website-us-east-1.amazonaws.com/images/subdirectory/

will return the following S3 object:

mywebsitedomain.s3.amazonaws.com/images/subdirectory/index.html

Error Document
When you access a website-configured bucket through the new website endpoint, and an error occurs, Amazon S3 now returns a new HTML error page instead of the current XML error. Also, you can now specify your own custom error page when a 4XX error occurs.

You can use the S3 tab of the AWS Management Console to enable your bucket as a website.
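If you prefer to do this from code, here’s a minimal sketch using the AWS SDK for Java; the bucket name and document names are placeholders:

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.BucketWebsiteConfiguration;

    public class EnableWebsite {
        public static void main(String[] args) {
            AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            // Serve index.html for requests to the root (and to subdirectories),
            // and error.html for 4xx-class errors.
            s3.setBucketWebsiteConfiguration("mywebsitedomain",
                new BucketWebsiteConfiguration("index.html", "error.html"));

            // The site is then reachable at the region's website endpoint, e.g.
            // http://mywebsitedomain.s3-website-us-east-1.amazonaws.com/
        }
    }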

The CloudBerry S3 Explorer includes support for this cool new feature:

The newest version of Bucket Explorer also supports website hosting:

And (added post-release) the S3 Browser also supports it:

Also added post-release, CloudBuddy Personal has added support, as described in this blog post:

The AWS Java, .NET, and PHP SDKs support this new feature; for more information, consult the Amazon S3 Developer Guide. As always, we also encourage developers of libraries and tools to add support for this as well. If you are such a developer, leave me a comment or send me some email (awseditor@amazon.com) once you are ready to go.

I’m pretty excited by this new feature and hope that you are too. I think that it will be pretty cool to see website owners simply and inexpensively gain world class performance by hosting their entire website on Amazon S3. In fact, Amazon CTO Werner Vogels is already doing this! Check out his post, New AWS feature: Run your website completely from Amazon S3, for more information.

— Jeff;

 

Amazon S3 – Bigger and Busier Than Ever

The number of objects stored in Amazon S3 continues to grow:

Here are the stats, measured at the end of the fourth quarter of each year:

  • 2006 – 2.9 billion objects
  • 2007 – 14 billion objects
  • 2008 – 40 billion objects
  • 2009 – 102 billion objects
  • 2010 – 262 billion objects

The peak request rate for S3 is now in excess of 200,000 requests per second.

If you want to work on game-changing, world-scale services like this, you should think about applying for one of the open positions on the S3 team:

— Jeff;

 

Note: The original graph included an extraneous (and somewhat confusing) data point for Q3 of 2010. I have updated the graph for clarity.

New Webinar: High Availability Websites

As part of a new monthly series of hands-on webinars, I’ll be giving a technical review of building, managing, and maintaining high availability websites and web applications using Amazon’s cloud computing platform.

Hosting websites and web applications is a very common use of our services, and in this webinar we’ll take a hands-on approach to websites of all sizes, from personal blogs and static sites to complex multi-tier web apps.

Join us on January 28 at 10:00 AM (GMT) for this 60 minute, technical web-based seminar, where we’ll aim to cover:

  • Hosting a static website on S3
  • Building highly available, fault tolerant websites on EC2
  • Adding multiple tiers for caching, reverse proxies and load balancing
  • Autoscaling and monitoring your website

Using real world case studies and tried-and-tested examples, we’ll explore key concepts and best practices for working with websites and on-demand infrastructure.

The session is free, but you’ll need to register!

See you there.

~ Matt

 

AWS Import/Export Now in Singapore

You can now use the AWS Import/Export service to import and export data into and out of Amazon S3 buckets in the Asia Pacific (Singapore) Region via portable storage devices with eSATA, USB 2.0, and SATA interfaces.

You can ship us a device loaded with data and we’ll copy it to the S3 bucket of your choice. Or you can send us an empty device and we’ll copy the contents of one or more buckets to it. Either way, we’ll return the device to you. We routinely work with devices that store up to 8 TB and can accommodate larger devices by special arrangement. We can handle data stored on NTFS, ext2, ext3, and FAT32 file systems.

Our customers in the US and Europe have used this service to take on large-scale data migration, content distribution, backup, and disaster recovery challenges.

For example, Malaysia-based AMP Radio Networks runs a web platform for 9 FM radio stations. They host this platform and the associated audio streaming on EC2 and make use of CloudWatch, Auto Scaling, and CloudFront. AWS Import/Export allows AMP Radio Networks to transfer huge amounts of data directly from their facilities to AWS. They save time and money and can make new content available more quickly than they could if they had to upload it to the cloud in the traditional way.

— Jeff;

Amazon S3 – Object Size Limit Now 5 TB

A number of our customers want to store very large files in Amazon S3 — scientific or medical data, high resolution video content, backup files, and so forth. Until now, they have had to store and reference the files as separate chunks of 5 gigabytes (GB) or less. So, when a customer wanted to access a large file or share it with others, they would either have to use several URIs in Amazon S3 or stitch the file back together using an intermediate server or within an application.

No more.

We’ve raised the limit by three orders of magnitude. Individual Amazon S3 objects can now range in size from 1 byte all the way to 5 terabytes (TB). Now customers can store extremely large files as single objects, which greatly simplifies their storage experience. Amazon S3 does the bookkeeping behind the scenes for our customers, so you can now GET that large object just like you would any other Amazon S3 object.

In order to store larger objects you would use the new Multipart Upload API that I blogged about last month to upload the object in parts. This opens up some really interesting use cases. For example, you could stream terabytes of data off of a genomic sequencer as it is being created, store the final data set as a single object and then analyze any subset of the data in EC2 using a ranged GET. You could also use a cluster of EC2 Cluster GPU instances to render a number of frames of a movie in parallel, accumulating the frames in a single S3 object even though each one is of variable (and unknown at the start of rendering) size.
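As a rough sketch of the first use case, a ranged GET against a very large object might look like this with the AWS SDK for Java (the bucket, key, and byte range are placeholders):

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.GetObjectRequest;

    public class RangedGet {
        public static void main(String[] args) {
            AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            // Fetch only the first 100 MB of a multi-terabyte object.
            GetObjectRequest request = new GetObjectRequest("genome-data", "run-42/reads.bam")
                .withRange(0, 100L * 1024 * 1024 - 1);

            s3.getObject(request, new File("reads-first-100mb.bam"));
        }
    }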

The limit has already been raised, so the race is on to upload the first 5 terabyte object!

— Jeff;

Updates to the AWS SDKs

We’ve made some important updates to the AWS SDK for Java, the AWS SDK for PHP, and the AWS SDK for .NET. The newest versions of the respective SDKs are available now.

AWS SDK for Java

The AWS SDK for Java now supports the new Amazon S3 Multipart Upload feature in two different ways. First, you can use the new APIs — InitiateMultipartUpload, UploadPart, CompleteMultipartUpload, and so forth. Second, you can use the SDK’s new TransferManager class. This class implements an asynchronous, higher-level interface for uploading data to Amazon S3. The TransferManager will use multipart uploads if the object to be uploaded is larger than a configurable threshold. You can simply initiate the transfer (using the upload method) and proceed; your application can track the status of the transfer by polling the Upload object that the upload method returns.
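Here’s a minimal sketch of the TransferManager approach; the credentials, bucket, key, and file are placeholders:

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.Upload;

    public class TransferManagerUpload {
        public static void main(String[] args) throws Exception {
            TransferManager tm = new TransferManager(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            // Starts the upload asynchronously; large files are split into parts automatically.
            Upload upload = tm.upload("my-bucket", "big-video.mov", new File("big-video.mov"));

            // Poll for status while the transfer runs on background threads.
            while (!upload.isDone()) {
                System.out.println("Upload state: " + upload.getState());
                Thread.sleep(1000);
            }

            tm.shutdownNow();
        }
    }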

The SDK’s PutObject method can now provide status updates via a new ProgressListener interface. This can be used to implement a status bar or for other tracking purposes.
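A rough sketch of what that might look like is shown below; the bucket, key, and file names are placeholders, and the listener interface has moved between packages in later SDK releases, so treat this as illustrative:

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.ProgressEvent;
    import com.amazonaws.services.s3.model.ProgressListener;
    import com.amazonaws.services.s3.model.PutObjectRequest;

    public class PutWithProgress {
        public static void main(String[] args) {
            AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            PutObjectRequest request = new PutObjectRequest("my-bucket", "report.pdf", new File("report.pdf"))
                .withProgressListener(new ProgressListener() {
                    public void progressChanged(ProgressEvent event) {
                        // Called as bytes are sent; feed this into a status bar or a log.
                        System.out.println("Transferred " + event.getBytesTransfered() + " more bytes");
                    }
                });

            s3.putObject(request);
        }
    }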

We’ve also fixed a couple of bugs.

AWS SDK for PHP

The AWS SDK for PHP now supports even more services. We’ve added support for Elastic Load Balancing, the Relational Database Service, and the Virtual Private Cloud.

We have also added support for S3 Multipart Upload and for CloudFront Custom Origins, and you can now stream to (when writing) or from (when reading) an open file when transferring an S3 object. You can also seek to a specific file position before initiating a streaming transfer.

The 1000-item limit has been removed from the convenience functions; get_bucket_filesize, get_object_list, delete_all_objects, delete_all_object_versions, and delete_bucket will now operate on all of the entries in a bucket.

We’ve also fixed a number of bugs.

AWS SDK for .NET

The AWS SDK for .NET now supports the Amazon S3 Multipart Upload feature using the new APIs (InitiateMultipartUpload, UploadPart, CompleteMultipartUpload, and so forth) as well as a new TransferUtility class that automatically determines when to upload objects using the Multipart Upload feature.

We’ve also added support for CloudFront Custom Origins and fixed a few bugs.

These SDKs (and a lot of other things) are produced by the AWS Developer Resource team. They are hiring and have the following open positions:

— Jeff;

 

Amazon S3: Multipart Upload

Can I ask you some questions?

  • Have you ever been forced to repeatedly try to upload a file across an unreliable network connection? In most cases there’s no easy way to pick up from where you left off and you need to restart the upload from the beginning.
  • Are you frustrated because your company has a great connection that you can’t manage to fully exploit when moving a single large file? Limitations of the TCP/IP protocol make it very difficult for a single application to saturate a network connection.

In order to make it faster and easier to upload larger (> 100 MB) objects, we’ve just introduced a new multipart upload feature.

You can now break your larger objects into chunks and upload a number of chunks in parallel. If the upload of a chunk fails, you can simply restart it. You’ll be able to improve your overall upload speed by taking advantage of parallelism. In situations where your application is receiving (or generating) a stream of data of indeterminate length, you can initiate the upload before you have all of the data.

Using this new feature, you can break a 5 GB upload (the current limit on the size of an S3 object) into as many as 1024 separate parts and upload each one independently, as long as each part other than the last has a size of 5 megabytes (MB) or more. If the upload of a part fails, it can be restarted without affecting any of the other parts. Once you have uploaded all of the parts, you ask S3 to assemble the full object with one final call.

Here’s what your application needs to do:

  1. Separate the source object into multiple parts. This might be a logical separation where you simply decide how many parts to use and how big they’ll be, or an actual physical separation accomplished using the Linux split command or similar (e.g. the hk-split command for Windows).
  2. Initiate the multipart upload and receive an upload id in return. This request to S3 must include all of the request headers that would usually accompany an S3 PUT operation (Content-Type, Cache-Control, and so forth).
  3. Upload each part (a contiguous portion of an object’s data) accompanied by the upload id and a part number (1-10,000 inclusive). The part numbers need not be contiguous but the order of the parts determines the position of the part within the object. S3 will return an ETag in response to each upload.
  4. Finalize the upload by providing the upload id and the part number / ETag pairs for each part of the object.

You can implement the third step in several different ways. You could iterate over the parts and upload one at a time (this would be great for situations where your internet connection is intermittent or unreliable). Or, you can upload many parts in parallel (great when you have plenty of bandwidth, perhaps with higher than average latency to the S3 endpoint of your choice). If you choose to go the parallel route, you can use the list parts operation to track the status of your upload.
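Here’s a minimal sketch of those steps using the low-level API in the AWS SDK for Java, uploading a local file one part at a time. The bucket, key, and part size are placeholders, and the splitting called for in step 1 is purely logical here: each part is simply a byte range of the original file.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
    import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
    import com.amazonaws.services.s3.model.PartETag;
    import com.amazonaws.services.s3.model.UploadPartRequest;

    public class MultipartUploadSketch {
        public static void main(String[] args) {
            AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            String bucket = "my-bucket";
            String key = "big-backup.tar";
            File file = new File("big-backup.tar");
            long partSize = 5L * 1024 * 1024; // 5 MB minimum for all parts except the last

            // Step 2: initiate the upload and receive an upload id.
            String uploadId = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

            // Step 3: upload each part, collecting the ETag returned for each one.
            List<PartETag> partETags = new ArrayList<PartETag>();
            long offset = 0;
            for (int partNumber = 1; offset < file.length(); partNumber++) {
                long size = Math.min(partSize, file.length() - offset);
                UploadPartRequest part = new UploadPartRequest()
                    .withBucketName(bucket)
                    .withKey(key)
                    .withUploadId(uploadId)
                    .withPartNumber(partNumber)
                    .withFile(file)
                    .withFileOffset(offset)
                    .withPartSize(size);
                partETags.add(s3.uploadPart(part).getPartETag());
                offset += size;
            }

            // Step 4: finalize by handing S3 the upload id and the part number / ETag pairs.
            s3.completeMultipartUpload(
                new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
        }
    }

To go the parallel route instead, you could submit each UploadPartRequest to a thread pool and collect the resulting ETags before making the final call.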

Over time we expect much of the chunking, multi-threading, and restarting logic to be embedded into tools and libraries. If you are a tool or library developer and have done this, please feel free to post a comment or to send me some email.

Update: Bucket Explorer now supports S3 Multipart Upload!

Update 2: So does CloudBerry S3 Explorer.

Update 3: And now S3 Browser!

Update 4 (2017): Removed link to the now-defunct Bucket Explorer.

— Jeff;