By Craig Noeldner and Mike Culver, Amazon Web Services
Scenario: Imagine you have a small web site with big potential. You’re currently using a reasonably priced web hosting provider that offers good value for the amount of traffic you normally receive. Perhaps you’ve gone one step further and are hosting your site on a dedicated server. However, your site has caught the attention of the blogosphere, and you’re about to get much more traffic than your current hosting setup can handle.
What are you going to do?
Knowing how to scale your web site can mean the difference between watching your idea take off or take a dive. A common technique for scaling a web site is to use a different server to host media files like images, videos, and audio files. This distributes the traffic and bandwidth load between hosts and allows the primary web server to focus on delivering web pages and server-side processing, rather than serving up 5MB audio files (or even 100MB videos).
If you don’t want to set up, configure, and maintain a few extra servers just for hosting your media files, then use Amazon S3. Amazon S3 is storage for the Internet and gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.
This tutorial walks through the steps necessary for hosting media files for your web site using Amazon S3. We’ll use a domain we’ve already registered, webscalecomputing.info, to set up a new sub-domain, media.webscalecomputing.info, that will host the images, videos, and audio files in Amazon S3.
While we won’t go into any programming details for using Amazon S3, you’ll need to have a basic understanding of web networking and DNS to read this article. (Or, you’ll need enough background to translate the concepts to your own hosting provider.)
More on Amazon S3
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. Generally, software developers use Amazon S3 in their applications that need the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.
You can often improve your web site’s performance by moving your media files off your main web server. This could be as simple as creating a sub-domain that points to a host that serves your media files. Of course, you still have to worry about the typical heavy lifting for any type of hosting, such as:
- How much traffic will this setup accommodate? What happens if I get more traffic than it can handle?
- What happens if the host goes down?
- How do I back up the files so they’re not lost?
- How much am I paying for idle capacity?
Amazon S3 provides answers to those questions, without the need for worrying about the pesky details of, well, implementing them.
The web services interface is simple enough that you can retrieve data using a URL, so it’s well-suited for basic web hosting tasks, like serving up media files.
The pricing for Amazon S3 is on a pay-as-you-go basis, so there is no minimum fee. This means you don’t have to invest in a large amount of hosting infrastructure or services in order to ensure that your web site handles the occasional traffic spike.
Use the AWS Simple Monthly Calculator to estimate your monthly bill.
Amazon S3 in Action
Blue Origin is one small company with a big idea that successfully scaled its web site using Amazon S3. On January 2, 2007, the company posted information and videos on its web site about a test launch of a new vertical-takeoff, vertical-landing vehicle. Within the next day, the news was covered by both Slashdot and Boing Boing, sending a tremendous amount of traffic to its web site. With its media files stored in Amazon S3, the site was able to scale instantly and handle 3.5 million requests and 758 GB of bandwidth in a single day.
Had the company hosted the web site completely on one of its internal servers, the traffic on January 4 would have overwhelmed its system capacity. Had it used a basic hosting package from a popular provider, it would have overwhelmed that service or, even worse, exceeded the maximum allowed bandwidth for the month and incurred massive overage fees.
Blue Origin’s total charge for Amazon S3 in January? Just over $300.
SmugMug, www.smugmug.com, is another company that’s using Amazon S3 for hosting its media files. After 12 months, they’ve saved almost $1M.
Now, let’s go through the steps of hosting your media files on Amazon S3, like Blue Origin.
Signing up for Amazon S3
If you haven’t already, sign up for Amazon S3 at http://aws.amazon.com/s3. After signing up for Amazon S3, you’ll have two access identifiers needed for uploading your media files:
- Access Key ID
- Secret Access Key
The Access Key ID is a public identifier, like a user name, that specifies a particular Amazon S3 account. The Secret Access Key is the private identifier, like a password, that ensures you’re the one making a request.
Important: Your Secret Access Key is a secret, and should be known only by you and AWS. You should never e-mail your Secret Access Key to anyone. It is important to keep your Secret Access Key confidential to protect your account.
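One way to keep the Secret Access Key out of scripts and e-mail is to read both identifiers from environment variables at run time. The sketch below assumes the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which are the names the AWS SDKs and command-line tools conventionally read. The helper names load_credentials and masked are hypothetical and used only for illustration.

```python
import os


def load_credentials(env=None):
    """Return (access_key_id, secret_access_key) read from the environment."""
    env = os.environ if env is None else env
    try:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        # Report which variable is absent without printing any secret value.
        raise RuntimeError(
            f"missing environment variable: {missing.args[0]}"
        ) from None


def masked(secret):
    """Hide all but the last four characters, for use in log messages."""
    return "*" * max(len(secret) - 4, 0) + secret[-4:]
```

Passing a plain dictionary as env makes the function easy to try without touching the real environment. If a log message needs to mention the key at all, masked keeps the full value from being written out.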
Uploading Your Media Files
Without going into too many details, Amazon S3 uses the concepts of buckets and objects to store data. A bucket organizes a collection of objects, much as a folder contains a list of files.
There are many tools available for working with Amazon S3 without having to write a software application. For this tutorial, we’ll use a plug-in for the Firefox browser, called S3Fox (https://addons.mozilla.org/en-US/firefox/addon/3247). You can also use one of the many code samples and tools available through the Amazon S3 Resource Center (http://aws.amazon.com/resources) or use a product built on Amazon S3 in the Solutions Catalog (http://solutions.amazonwebservices.com).
First, create a bucket in your Amazon S3 account whose name matches the domain you’ll use to host your media files. For our web site, we’ll create a bucket called “media.webscalecomputing.info”.
Important: Use only lower-case letters in the names of buckets that will be addressed through DNS. DNS treats host names as case-insensitive, and Amazon S3 expects the bucket name in a host name to be lower case.
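A quick check before creating the bucket can catch a name that would not work as a host name. The sketch below is a conservative approximation of the DNS-style naming rules as we understand them from the current Amazon S3 documentation (3 to 63 characters; lower-case letters, digits, hyphens, and periods; not shaped like an IP address). The function name is hypothetical, and the service itself remains the final authority on what it accepts.

```python
import re

# One DNS label: starts and ends with a lower-case letter or digit.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")
_IPV4_SHAPE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_dns_style_bucket_name(name):
    """Return True if `name` looks usable as both a bucket and a host name."""
    if not 3 <= len(name) <= 63:
        return False
    if _IPV4_SHAPE.fullmatch(name):
        return False
    # Every dot-separated label must be non-empty and well formed.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

With this check, "media.webscalecomputing.info" passes, while "Media.WebScaleComputing.info" is rejected because of its upper-case letters.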
Why use this specific bucket name? Amazon S3 has a virtual hosting feature: when a request arrives for a host name such as media.webscalecomputing.info, Amazon S3 serves content from the bucket with the same name. We’ll talk more about this feature in the next section when we configure our domain.
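To make the naming requirement concrete, here is a simplified model of how a virtual-hosted request could be mapped to a bucket from its Host header. This is our own illustration of the documented behavior, not Amazon S3's implementation, and bucket_from_host is a hypothetical name.

```python
S3_ENDPOINT = "s3.amazonaws.com"


def bucket_from_host(host_header):
    """Return the bucket a Host header refers to, or None for path-style."""
    # Drop any port, normalize case, and ignore a trailing root dot.
    host = host_header.split(":")[0].lower().rstrip(".")
    if host == S3_ENDPOINT:
        return None  # path-style request: the bucket is in the URL path
    suffix = "." + S3_ENDPOINT
    if host.endswith(suffix):
        return host[: -len(suffix)]  # <bucket>.s3.amazonaws.com
    return host  # custom domain: the host name itself is the bucket name
```

Under this model, a request for media.webscalecomputing.info only finds content if a bucket with exactly that name exists, which is why the bucket is named after the sub-domain.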
Next, add your media files to the new bucket in Amazon S3. Using the Firefox plug-in, it’s as simple as selecting the files on your local system, then clicking the transfer button.
Amazon S3 has a rich set of access privileges for both buckets and objects, so make sure that permissions on both the bucket and your objects allow everyone read access. The Firefox plug-in we’re using sets this for us through a dialog box.
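The same upload can be scripted instead of done through the browser plug-in. The sketch below assumes the third-party boto3 library and credentials already configured for it, neither of which this tutorial uses. The helper names upload_args and upload_public are hypothetical. It grants read access with the "public-read" canned ACL and sets a content type so browsers handle the file correctly.

```python
import mimetypes
import os


def upload_args(path):
    """Build per-object settings: public read access and a content type."""
    content_type, _encoding = mimetypes.guess_type(path)
    return {
        "ACL": "public-read",
        "ContentType": content_type or "application/octet-stream",
    }


def upload_public(path, bucket="media.webscalecomputing.info"):
    """Upload one local file under its base name and return the object name."""
    import boto3  # imported here so upload_args works without boto3 installed

    key = os.path.basename(path)
    boto3.client("s3").upload_file(path, bucket, key, ExtraArgs=upload_args(path))
    return key
```

Newer Amazon S3 buckets may block public ACLs by default, in which case the upload is rejected until the bucket's settings permit public access.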
All the media files are now accessible through a URL that points to Amazon S3. The basic URL syntax for S3 is http://<bucket_name>.s3.amazonaws.com/<object_name>, so each file we uploaded is available at http://media.webscalecomputing.info.s3.amazonaws.com/<object_name>.
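As a small sketch of that URL syntax, the function below builds the address for a given bucket and object. The function name and the object names used with it ("logo.png", "audio/intro track.mp3") are hypothetical, not the files from this tutorial.

```python
from urllib.parse import quote


def s3_url(bucket, object_name):
    """Build http://<bucket_name>.s3.amazonaws.com/<object_name>."""
    # quote() percent-encodes characters such as spaces but leaves "/" alone.
    return f"http://{bucket}.s3.amazonaws.com/{quote(object_name)}"


print(s3_url("media.webscalecomputing.info", "logo.png"))
# http://media.webscalecomputing.info.s3.amazonaws.com/logo.png
```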
The simplest way to use Amazon S3 for media hosting is to update our web pages to point to these files.
However, when people download our files, we want them to look like they’re coming from our domain, and not s3.amazonaws.com. If someone chooses to download our audio file, we want users to think it’s coming from our site. We’ll now set up our domain hosting so that the files are available through a URL under http://media.webscalecomputing.info/.
Setting up Your Domain
Since we already host our web site on www.webscalecomputing.info, we now want to create a sub-domain that we’ll point to the files located in Amazon S3. This is done by using a CNAME entry on our hosting provider.
Most popular web hosting companies will let you create a new CNAME record for your domain. For our hosting company, creating a new CNAME record consisted of logging into our account, then navigating through a few DNS configuration pages until we ended up at one that allows us to create a CNAME record.
To create the CNAME record, we specify an alias, “media”, and the domain it points to, “media.webscalecomputing.info.s3.amazonaws.com”.
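Hosting providers present this step differently, but behind the form it usually amounts to a single zone-file line. The sketch below derives the alias and target from the bucket name and prints the record in BIND-style syntax. The function name is hypothetical, and your provider's control panel may expect the two values in separate fields instead.

```python
def cname_record(bucket, zone):
    """Return a BIND-style zone-file line aliasing `bucket` to Amazon S3."""
    suffix = "." + zone
    if not bucket.endswith(suffix):
        raise ValueError(f"{bucket!r} is not a sub-domain of {zone!r}")
    alias = bucket[: -len(suffix)]  # "media" for our bucket
    target = f"{bucket}.s3.amazonaws.com."  # trailing dot: fully qualified
    return f"{alias} IN CNAME {target}"


print(cname_record("media.webscalecomputing.info", "webscalecomputing.info"))
# media IN CNAME media.webscalecomputing.info.s3.amazonaws.com.
```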
With the CNAME record in place, the media files are available through URLs of the form http://media.webscalecomputing.info/<object_name>.
Our web page can now reference the media files.
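If existing pages reference media files by a local path, the references can be rewritten in bulk. The sketch below assumes, purely for illustration, that the pages used a local "/media/" prefix and that object names in the bucket match the rest of the path. The function name is hypothetical.

```python
import re

MEDIA_HOST = "http://media.webscalecomputing.info"


def point_at_media_host(html, local_prefix="/media/"):
    """Rewrite src/href attributes under `local_prefix` to the media host."""
    pattern = re.compile(
        r'(src|href)="' + re.escape(local_prefix) + r'([^"]+)"'
    )
    return pattern.sub(
        lambda m: f'{m.group(1)}="{MEDIA_HOST}/{m.group(2)}"', html
    )


print(point_at_media_host('<img src="/media/logo.png">'))
# <img src="http://media.webscalecomputing.info/logo.png">
```

A regular expression is enough for simple, consistently quoted attributes. Pages with more varied markup would be better served by a real HTML parser.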
Automatically Copying Files to Amazon S3
There are more ways to use Amazon S3, including automatically copying files to Amazon S3. The Resource Center in the AWS Developer Connection web site has technical documentation, code samples, and other resources you can use to learn more about Amazon S3 and build your own applications that use the service. The exact tutorial to read depends on the language you’re using.
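As one possible shape for such a script, the sketch below walks a local directory and uploads any file not already in the bucket. It assumes the third-party boto3 library and configured credentials, which this tutorial does not otherwise use. The function names are hypothetical. It compares object names only, so a file that changed locally after it was uploaded is not copied again.

```python
import os


def plan_uploads(local_dir, already_uploaded):
    """Return sorted (local_path, object_name) pairs missing from the bucket."""
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            # Object names use forward slashes regardless of the local OS.
            key = os.path.relpath(path, local_dir).replace(os.sep, "/")
            if key not in already_uploaded:
                plan.append((path, key))
    return sorted(plan, key=lambda pair: pair[1])


def sync(local_dir, bucket="media.webscalecomputing.info"):
    """Upload files that the bucket does not have yet."""
    import boto3  # imported lazily so plan_uploads works without boto3

    s3 = boto3.client("s3")
    existing = set()
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        existing.update(obj["Key"] for obj in page.get("Contents", []))
    for path, key in plan_uploads(local_dir, existing):
        s3.upload_file(path, bucket, key, ExtraArgs={"ACL": "public-read"})
```

Running sync from a scheduled job would keep the bucket up to date as new media files are added to the directory.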
Learning More About Amazon S3
Why not host the entire web site in Amazon S3 and just use a domain provider to set up the appropriate CNAME records? Although it’s certainly possible, you may want to have a web server running to perform server-side processing on a script or to access a database. Amazon S3 is a storage solution, so it does not perform any server-side processing (but check out Amazon EC2 for information on scalable, virtual computing).
Of course, you don’t have to have a web site to use Amazon S3. Like Jeremy Zawodny, we use Amazon S3 to back up our home computers. (Craig pays just over $1 a month to back up his important files.)
Here are a few links for learning more about Amazon S3:
- Amazon S3 web page – http://aws.amazon.com/s3
- Developer Connection web site – http://developer.amazonwebservices.com
- Resource Center for Amazon S3 – http://developer.amazonwebservices.com/s3/resources
- Developer Forums – http://developer.amazonwebservices.com/s3/forums
- Solutions Built on Amazon S3 – http://solutions.amazonwebservices.com/connect/kbcategory.jspa?categoryID=66