Processing Images with Amazon Web Services

John Fronckowiak and Tom Myer team up to provide the steps for creating a simple thumbnail service built on Amazon Web Services.

Details

Submitted By: John Fronckowiak
AWS Products Used: Amazon EC2, Amazon S3, Amazon SQS, Amazon SimpleDB
Language(s): PHP
Created On: June 26, 2008 8:56 PM GMT
Last Updated: September 21, 2008 9:58 PM GMT
By John Fronckowiak (john@idcc.net), President of IDC Consulting, Inc. and Clinical Assistant Professor, Medaille College; and Tom Myer (tom@tripledogs.com), founder and owner of Triple Dog Dare Media.

Automating Images

Processing images can be a tedious, error-prone, and repetitive task. It may involve many moving parts, and bandwidth or processor time that you don’t have (or can’t easily afford). Setting up a system that allows you to process images provided by web users could bog you down if you don’t have enough disk space or enough CPU to handle the demand.

We decided to create a straightforward solution that would allow users to upload an image and then process that image with Amazon Web Services (AWS). For simplicity's sake, the only processing step is to create a thumbnail of the image; however, the same approach would work for any number of processing steps.

The solution we created involves these pieces:

  • An Amazon Elastic Compute Cloud (EC2) instance (in beta at the time of writing) running Apache and PHP, including PEAR and GD to support our processing needs
  • An Amazon Simple Storage Service (S3) account to hold uploaded images
  • An Amazon SimpleDB™ account to hold metadata about those images
  • An Amazon Simple Queue Service (SQS) account to send and receive messages about those images

What Are Amazon Web Services?

Unless you’ve been living in a cave for the past decade, you know that Amazon.com is one of the world’s leading Internet retail brands. Amazon started out selling books on the Web, then moved on to sell music, movies, power tools, and more.

Amazon isn’t just about the front end; some exciting things are going on in terms of backend power. The most significant is access to AWS. These services are easy to use and relatively inexpensive, providing developers and business owners with innovative fixes for common headaches (such as backup solutions).

Why use Amazon S3? You could spend money on servers or external hard disks; spend more money to connect those machines to the network (or pay colocation fees); and then spend even more money to secure, maintain, and update those servers. Or you can pay a fraction of that money for a high-bandwidth, high-reliability service provided by Amazon.

Amazon S3 provides developers with simple, Internet-based storage that can be used with Amazon EC2 servers. Customized Amazon EC2 server configurations, called Amazon Machine Images (AMIs), are stored in Amazon S3, so developers who want to use Amazon EC2 must also have an Amazon S3 account. Like Amazon EC2, Amazon S3 is fee-based, with prices based on the amount of storage used and the data transferred in and out.

Amazon SimpleDB provides lightweight database functionality and lets developers store metadata in simple, flat schemas that resemble spreadsheets, with rows (items) and columns (attributes). In this article, we'll show you how we set up a very simple set of metadata for each uploaded image so that you can keep track of each job.

Amazon SQS lets developers send and receive messages between different Amazon web services and applications. For example, one application could send a status message that is processed by another application, which could in turn send its own messages to other applications. In this article, we'll show you how to use messages to queue up jobs on Amazon EC2.

How to Sign Up

Before you can use AWS, you'll need to sign up. The process takes 5 to 10 minutes at most and requires that you fill out a simple form and register a credit card to pay for services. You'll need to activate each web service that you want to work with.

Simply go to http://aws.amazon.com to sign up. To use the code in this article, you need an Access Key ID and a Secret Access Key from Amazon; you'll receive them by email as soon as you complete your registration.

Application Overview

The image-processing application operates in two phases: first it receives and stores processing requests, and then it processes them. The application makes use of Amazon EC2, S3, SimpleDB, and SQS.

Amazon EC2 hosts the actual PHP web application, which interacts directly with the other services. When an image is uploaded for processing, it is stored temporarily in Amazon S3. Information about the processing request is stored in the Amazon SimpleDB database, and a processing request is queued in Amazon SQS.

The application periodically wakes up to check the Amazon SQS queue. For each pending request, it retrieves the request's details from the Amazon SimpleDB database.

The corresponding image is retrieved from Amazon S3 storage and processed accordingly. The processed image is stored back in Amazon S3, and the requester receives an email with a link for retrieving it.
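To make the two phases concrete, here is a minimal sketch that replaces each service with an in-memory PHP array: $storage stands in for Amazon S3, $jobs for Amazon SimpleDB, and $queue for Amazon SQS. The function names submitJob() and processNextJob() and the "done" status are hypothetical, and nothing here calls AWS; the sketch only shows how one shared random key ties the three stores together.

```php
<?php
// In-memory stand-ins (hypothetical) for the three AWS services
$storage = array(); // "S3": bucket name => array(object name => file contents)
$jobs    = array(); // "SimpleDB": item name => attributes
$queue   = array(); // "SQS": message bodies, oldest first

// Phase 1: receive a request and record it in all three stores
function submitJob(&$storage, &$jobs, &$queue, $fileName, $contents, $email)
{
    $key = md5(uniqid(rand(), true));           // same key scheme as upload.php
    $storage[$key] = array($fileName => $contents);
    $jobs[$key] = array('email' => $email, 'path' => $fileName,
                        'task' => 'thumbnail', 'status' => 'not started');
    $queue[] = $key;                             // the message body is just the key
    return $key;
}

// Phase 2: take one message, look up the job, and "process" the file
function processNextJob(&$storage, &$jobs, &$queue)
{
    if (count($queue) == 0) {
        return null;                             // nothing to do until the next run
    }
    $key  = array_shift($queue);
    $path = $jobs[$key]['path'];
    // Placeholder for the real thumbnail step: mark the stored file as processed
    $storage[$key][$path] = 'thumb:' . $storage[$key][$path];
    $jobs[$key]['status'] = 'done';
    return $key;
}

$key = submitJob($storage, $jobs, $queue, 'cat.jpg', 'raw-bytes', 'me@example.com');
processNextJob($storage, $jobs, $queue);
echo $jobs[$key]['status'], "\n"; // done
```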

Creating the Application

As stated earlier, the application is built on top of an Amazon EC2 instance. The application uses Amazon S3 for storage, Amazon SimpleDB for metadata tracking, and Amazon SQS for messaging between components. The Amazon EC2 instance is loaded with certain tools (such as Apache, PHP, PEAR, and GD) to make everything work.

Without further ado, let’s dive into the details of each step in the process and then take a look at the code.

Setting up the Amazon EC2 Instance

The remainder of this article assumes that you have signed up for Amazon EC2 and S3 services and obtained the necessary private and public keys needed to use these services. If you haven't done this yet, please read the Amazon EC2 Getting Started Guide, which provides a step-by-step guide to getting access to Amazon EC2 and S3 services.

We chose to use the Elasticfox Firefox plugin to manage the Amazon EC2 instance. We began with the Amazon Machine Image (AMI) ami-25b6534c, a basic Fedora Core Linux-Apache-MySQL-PHP (LAMP) image. To get this image up to speed, we needed to perform some updates, such as updating PHP and PEAR and enabling the GD library in PHP.

The Les RPM de Remi repository offers a good collection of Fedora Core 4 RPMs that are compatible with our AMI. To install the Remi release package, use the following commands:

# wget http://remi.collet.free.fr/rpms/remi-release-4.rpm
# rpm -Uvh remi-release-4.rpm

Install the remi yum repository by using the following commands:

# cd /etc/yum.repos.d
# wget http://remi.collet.free.fr/rpms/remi-fedora.repo

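Update PHP, install PEAR and the GD library, and add the two PEAR packages that the scripts load (Crypt_HMAC and HTTP_Request) by using the following commands:

# yum --enablerepo=remi install php
# yum --enablerepo=remi install php-gd
# yum --enablerepo=remi install php-pear
# pear install Crypt_HMAC HTTP_Request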

Finally, you’ll need to restart the Apache web server:

# apachectl restart

Setting up the Upload Form

We set up a very simple index.php file in the web root of the Amazon EC2 instance. This index.php file contains a straightforward HTML form that lets the user choose a file and enter his or her name and email address. For now, the only task available is to create a thumbnail of the uploaded image.

<html>
<body>
<h1>Thumbnailer</h1>
<form action="upload.php" method="post" enctype="multipart/form-data">
<input type="hidden" name="MAX_FILE_SIZE" value="1000000" />
<p><label for="myfile">Choose image to upload</label>
<input name="userfile" id="myfile" type="file" /></p>
<p><label for="name">Your Name</label>
<input type='text' name='fullname' id='name'/>
</p>
<p><label for="email">Your Email</label>
<input type='text' name='emailaddress' id='email'/>
</p>
<p><label for='thumbnail'><input type='checkbox' name='task' id='thumbnail' value='thumbnail' checked='checked'/>Create Thumbnail</label></p>
<p><input type="submit" value="Upload File" /></p>
</form>
</body>
</html>

As you can see, the form isn’t complicated at all. The form action is set to upload.php, which is where we do a lot of the heavy lifting in this application.

Working with the Uploaded Files

After a file is uploaded, it must go through the following steps, in this order:

  1. Create a random string that the application will use to uniquely identify the uploaded image.
  2. Create an Amazon S3 bucket to hold the image, using the unique identifier as the bucket name. While you're at it, define the names you'll need for the Amazon SimpleDB domain and the Amazon SQS queue.
  3. Upload the file into the prepared bucket.
  4. Connect to Amazon SimpleDB.
  5. Create an Amazon SimpleDB domain.
  6. Create an entry in Amazon SimpleDB to track the uploaded image, using the random string as a key.
  7. Create an Amazon SQS queue and send a message (again, using the random string).
  8. Send an email confirming receipt of the photo to the person who uploaded the image.
  9. Redirect the user to a thank you/confirmation page.

Before you can do any of that, you need to use require_once() to load the helper libraries, and you need to define constants that hold your AWS Access Key ID and Secret Access Key.

require_once 'Crypt/HMAC.php';    
require_once 'HTTP/Request.php';
require_once('class.s3.php');
require_once('simpledb.class.php');
require_once('sqs.client.php');

define('AWS_ACCESS_KEY_ID',  'change-this');
define('AWS_SECRET_ACCESS_KEY', 'change-this-too');

Three classes will help you work with Amazon S3, SimpleDB, and SQS. Each class contains methods that will help you create, update, manage, and delete information on each of the services. Instead of writing each of these functions by hand, we built upon existing classes.

  • The file class.s3.php is an Amazon S3 class file written by John Fronckowiak (one of the co-authors of this article) for an earlier Amazon S3 article.
  • The file simpledb.class.php is an Amazon SimpleDB class file written by Alex Bosworth.
  • The file sqs.client.php is an Amazon SQS class file created by Amazon.

The first step is to create a random string. This random string is important: You’re going to use it as an Amazon S3 bucket name, the key for your Amazon SimpleDB entry, and the body of the message sent through Amazon SQS.

To create this random string, you’ll use some built-in PHP functions, namely md5(), uniqid(), and rand():

$random = md5(uniqid(rand(), true));
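An MD5 digest is always 32 lowercase hexadecimal characters, so the random string is a valid DNS-style Amazon S3 bucket name (lowercase letters and digits only, well within the 63-character limit). That fixed shape also makes it easy to validate the key when it comes back later in an SQS message or a URL. The isJobKey() helper below is a hypothetical sketch, not part of the authors' code.

```php
<?php
// Hypothetical helper: does this string look like a key generated by
// md5(uniqid(rand(), true)), i.e. exactly 32 lowercase hex characters?
function isJobKey($value)
{
    return is_string($value) && preg_match('/^[0-9a-f]{32}$/', $value) === 1;
}

$random = md5(uniqid(rand(), true));
var_dump(strlen($random));              // int(32)
var_dump(isJobKey($random));            // bool(true)
var_dump(isJobKey('../../etc/passwd')); // bool(false)
```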

Now that the random string is saved in the variable $random, you can define the remaining constants:

define('BUCKET',$random);
define('DOMAIN','photo_jobs');
define('SQS_Q',     'photo_q');
define('SQS_ENDPOINT',     'http://queue.amazonaws.com');

After you have a bucket name, you can connect to Amazon S3, create your bucket, and upload the file into it. Before the upload, the file is moved into the /tmp directory of your Amazon EC2 instance. You don't strictly need to keep a copy there, but it is prudent to do so.

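// Save a local copy of the upload in /tmp, stopping if PHP rejected the file
$localFile = "/tmp/".$_FILES['userfile']['name'];
if (!move_uploaded_file($_FILES['userfile']['tmp_name'], $localFile)) {
    die('The uploaded file could not be saved.');
}
chmod($localFile, 0644); // readable copy; a world-writable 0777 is not needed
// Create the job's bucket and upload the file into it
$s3 = new S3();
$s3->createBucket(BUCKET);
$attempt = $s3->putObject(BUCKET, $_FILES['userfile']['name'], $localFile, true);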
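Neither the form nor upload.php checks that the file is a JPEG or PNG image, the only formats the createthumb() function shown later can read. A minimal pre-check, sketched here as a hypothetical isAcceptableUpload() helper, could inspect the error code PHP records in $_FILES and the file extension before anything is sent to Amazon S3. The 1,000,000-byte default mirrors the form's MAX_FILE_SIZE field.

```php
<?php
// Hypothetical pre-check for one entry of $_FILES (e.g. $_FILES['userfile'])
function isAcceptableUpload(array $file, $maxBytes = 1000000)
{
    if (!isset($file['error']) || $file['error'] !== UPLOAD_ERR_OK) {
        return false;                      // missing, partial, or oversized upload
    }
    if ($file['size'] <= 0 || $file['size'] > $maxBytes) {
        return false;
    }
    // Only the formats the thumbnailer knows how to read
    $ext = strtolower(pathinfo($file['name'], PATHINFO_EXTENSION));
    return in_array($ext, array('jpg', 'jpeg', 'png'), true);
}

$sample = array('name' => 'Holiday.JPG', 'size' => 52000, 'error' => UPLOAD_ERR_OK);
var_dump(isAcceptableUpload($sample)); // bool(true)
```

In upload.php, a check like this would run before the bucket is created, showing an error page instead of continuing.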

Now that the file is in Amazon S3, it's time to connect to Amazon SimpleDB and create the domain that will hold your Amazon SimpleDB data. Note that you are reusing the constants that contain your Access Key ID and Secret Access Key.

$sd = new SimpleDb(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY);
$sd->createDomain(DOMAIN);

After the Amazon SimpleDB domain has been created, you add the form data as attributes of a new item in that domain, using the random string as the item name (the job's unique key). The result is a record of the file's path in Amazon S3, the person's name and email address, the task (in this case, creating a thumbnail), and a preliminary status of “not started.”

// Collect every attribute first, then write the item in a single call
$data = array();
$data["fullname"] = array($_POST['fullname']);
$data["email"]    = array($_POST['emailaddress']);
$data["status"]   = array('not started');
$data["path"]     = array($_FILES['userfile']['name']);
$data["task"]     = array($_POST['task']);
$sd->putAttributes(DOMAIN, $random, $data);
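The attribute array can also be built by a small function, which keeps the form handling in one place and makes it testable without AWS. The hypothetical buildJobRecord() below produces the same structure the code passes to putAttributes(): each attribute name maps to a list of values. Trimming the input and the fallback task are assumptions added for the sketch.

```php
<?php
// Hypothetical helper: build the SimpleDB attribute array for one job.
// Every attribute maps to a list of values, because SimpleDB items
// allow several values per attribute name.
function buildJobRecord(array $post, $path)
{
    return array(
        'fullname' => array(trim($post['fullname'])),
        'email'    => array(trim($post['emailaddress'])),
        'status'   => array('not started'),
        'path'     => array($path),
        // Assumption: fall back to the only task the form offers
        'task'     => array(isset($post['task']) ? $post['task'] : 'thumbnail'),
    );
}

$data = buildJobRecord(
    array('fullname' => ' Ada Lovelace ', 'emailaddress' => 'ada@example.com', 'task' => 'thumbnail'),
    'cat.jpg'
);
// $data could then be written with $sd->putAttributes(DOMAIN, $random, $data);
```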

After the Amazon SimpleDB record has been created, it's time to create an Amazon SQS queue and send a message. The message itself is extremely simple: it contains only the random string generated earlier. Later in this article, you'll create a process.php file that grabs the next message off the queue and processes the Amazon SimpleDB entry whose item name matches the body of the message.

For now, create the queue and send a message.

$q = new SQSClient(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, SQS_ENDPOINT);
// Any exception thrown by CreateQueue() propagates to the caller
$result = $q->CreateQueue(SQS_Q);

$messageId = $q->SendMessage(urlencode($random));

You’re almost finished with this step. All that remains is to send an email to the person who uploaded the file, and redirect that user to a thank you page:

$to = $_POST['emailaddress'];
$from = 'test@example.com';
$subj = "Your image has been uploaded!";
$msg = "You will receive an email once the file has been processed.\r\n";
$msg .= "MessageID: ". $messageId. "\r\n";
$msg .= "Bucket: ". $random;
mail($to, $subj,$msg, "From:$from\r\n");

header('Location: thanks.php');
exit; // stop the script once the redirect header is sent
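The upload script hands $_POST['emailaddress'] straight to mail(). A cautious version would validate the address first, since a value containing line breaks could be abused to inject extra mail headers. This sketch uses PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL (available since PHP 5.2); the isSafeRecipient() name is hypothetical.

```php
<?php
// Hypothetical guard: accept only a single, well-formed address
function isSafeRecipient($address)
{
    if (!is_string($address) || preg_match('/[\r\n]/', $address)) {
        return false;                 // reject header-injection attempts outright
    }
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isSafeRecipient('user@example.com'));                      // bool(true)
var_dump(isSafeRecipient("user@example.com\r\nBcc: x@example.org")); // bool(false)
```

In upload.php, you would call isSafeRecipient($_POST['emailaddress']) before creating the bucket and refuse the request if it returns false.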

The thank you page (thanks.php) is very simple, consisting only of HTML, but it could contain other items if you so desired:

<h1>Thanks!</h1>
   <p>You will receive an email soon with confirmation information. Shortly after that, you will receive another email when your file has been processed.</p>

Processing Files

A second file, aptly named process.php, processes the queued jobs. To run it regularly, we set up a crontab entry that runs the file every 3 minutes. Each time the file runs, it pulls one Amazon SQS message off the queue and uses the body of the message to look up an Amazon SimpleDB entry. It then uses the information in that entry to decide which file to pull from which bucket, processes the file, and sends an email to the user.

Start the file by setting up the proper require_once() lines and defining some constants:

require_once 'Crypt/HMAC.php'; 
require_once 'HTTP/Request.php'; 
require_once('class.s3.php');
require_once('simpledb.class.php');
require_once('sqs.client.php');
define('AWS_ACCESS_KEY_ID', 'change-this');
define('AWS_SECRET_ACCESS_KEY', 'change-this-too');
define('DOMAIN','photo_jobs'); //sdb
define('SQS_Q', 'photo_q'); // must match the queue name used in upload.php
define('SQS_ENDPOINT', 'http://queue.amazonaws.com');

The first step is to connect to Amazon SQS and pull the next message off the queue. From that message, you extract the message body and the receipt handle. Notice that all the remaining code is wrapped in an if(count()) statement: if there is a message, processing continues; otherwise, the script does nothing until cron runs it again.

$q = new SQSClient(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, SQS_ENDPOINT, SQS_Q);

$nextMessage = $q->ReceiveMessage(1);
if (count($nextMessage)) {
    foreach ($nextMessage as $message) {
        $BUCKET = urldecode($message->Body);  // the random job key from upload.php
        $handle = $message->ReceiptHandle;    // needed later to delete the message
    }
    //more here
}

Now that you've successfully retrieved the message body (which contains the random string generated in the upload.php file), you can connect to Amazon SimpleDB:

$sd = new SimpleDb(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY);

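You can now loop through the available domains in Amazon SimpleDB, find the one you created to hold the photo jobs, and then look for an item name that matches the body you pulled out of the Amazon SQS message. When you find a match, save its attributes into an object called $attr. (Because you already know both the domain name and the item name, calling $sd->getAttributes(DOMAIN, $BUCKET) directly would also work and saves a query; the loop shows how to walk the domain listing.)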

$domains = $sd->listDomains();
foreach ($domains->ListDomainsResult as $domainList) {
    foreach ($domainList as $id => $d_name) {
        if ($d_name == DOMAIN) {
            $mydomain = $sd->query($d_name);
            foreach ($mydomain->QueryResult as $items) {
                foreach ($items as $itemid => $_name) {
                    if ($_name == $BUCKET) {
                        $attr = $sd->getAttributes($d_name, $_name);
                    }
                }
            }
        }
    }
}

It’s now a simple process of looping through the $attr object and pulling out what you need to keep working: an email address, a task, and a file path. You already know which bucket to work with; it will match the value pulled out of the Amazon SQS body.

foreach ($attr->GetAttributesResult as $attribute) {
    foreach ($attribute as $array) {
        if ($array->Name == "email") {
            $EMAIL = $array->Value;
        }
        if ($array->Name == "path") {
            $OBJECT = $array->Value;
        }
        if ($array->Name == "task") {
            $TASK = $array->Value;
        }
    }
}
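The nested loops above pick out the email, path, and task attributes by name. The same idea can be written as a small function that turns the list of Name/Value pairs into an associative array, so any attribute can be looked up directly. The sketch assumes, as the loop does, that each entry is an object with Name and Value properties; attributesToArray() is a hypothetical name, and stdClass objects stand in for the SimpleDb class's result objects.

```php
<?php
// Hypothetical helper: flatten a list of objects with ->Name and ->Value
// into array(name => value). If a name repeats, the last value wins,
// just as the loop in process.php overwrites its variables.
function attributesToArray($pairs)
{
    $result = array();
    foreach ($pairs as $pair) {
        // Cast to plain strings (harmless for strings; needed if the
        // values are SimpleXMLElement objects)
        $result[(string) $pair->Name] = (string) $pair->Value;
    }
    return $result;
}

// Stand-in data shaped like the attribute entries the loop iterates over
$pairs = array();
foreach (array('email' => 'me@example.com', 'path' => 'cat.jpg', 'task' => 'thumbnail') as $n => $v) {
    $pair = new stdClass();
    $pair->Name = $n;
    $pair->Value = $v;
    $pairs[] = $pair;
}

$job = attributesToArray($pairs);
$EMAIL  = $job['email'];
$OBJECT = $job['path'];
$TASK   = $job['task'];
```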

The next steps are to connect to Amazon S3, download the file by using $OBJECT and $BUCKET, create a thumbnail with a custom function, and upload the thumbnail back to Amazon S3, overwriting the previously uploaded file:

$s3 = new S3();
$s3->downloadObject($BUCKET,$OBJECT,"/tmp/".$OBJECT);
createthumb('/tmp/'.$OBJECT, '/tmp/thumb_'.$OBJECT, 100,100);
$upload = $s3->putObject( $BUCKET, $OBJECT, '/tmp/thumb_'.$OBJECT, true);

The createthumb() function is a simple function that takes four arguments: file to process, output file name, width of output file, and height of output file. The function makes use of GD image-manipulation tools and is adapted from Christian Heilmann’s function available at http://icant.co.uk/articles/phpthumbnails/.

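function createthumb($name, $filename, $new_w, $new_h) {
    // Choose the GD loader from the file extension; pathinfo() and
    // strtolower() also handle names such as "my.photo.JPG"
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if ($ext == 'jpg' || $ext == 'jpeg') {
        $src_img = imagecreatefromjpeg($name);
    } elseif ($ext == 'png') {
        $src_img = imagecreatefrompng($name);
    } else {
        return false; // unsupported image type
    }
    $old_x = imagesx($src_img);
    $old_y = imagesy($src_img);
    // Fit the longer side to the target size
    if ($old_x > $old_y) {
        $thumb_w = $new_w;
        $thumb_h = $old_y * ($new_h / $old_x);
    } elseif ($old_x < $old_y) {
        $thumb_w = $old_x * ($new_w / $old_y);
        $thumb_h = $new_h;
    } else {
        $thumb_w = $new_w;
        $thumb_h = $new_h;
    }
    // GD expects whole-pixel dimensions
    $thumb_w = max(1, (int) round($thumb_w));
    $thumb_h = max(1, (int) round($thumb_h));
    $dst_img = imagecreatetruecolor($thumb_w, $thumb_h);
    imagecopyresampled($dst_img, $src_img, 0, 0, 0, 0, $thumb_w, $thumb_h, $old_x, $old_y);
    // Save the thumbnail in the same format as the source image
    if ($ext == 'png') {
        imagepng($dst_img, $filename);
    } else {
        imagejpeg($dst_img, $filename);
    }
    imagedestroy($dst_img);
    imagedestroy($src_img);
    return true;
}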
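The sizing rule in createthumb() is easiest to check in isolation. The sketch below pulls it into a hypothetical thumbDimensions() function that returns whole-pixel sizes (GD's imagecreatetruecolor() expects integers) and can be tested without GD installed. It keeps the same formulas: the longer side becomes the target size, and with a square target such as 100x100 the shorter side is scaled by the same ratio.

```php
<?php
// Hypothetical extraction of createthumb()'s sizing rule.
// Returns array(width, height) in whole pixels.
function thumbDimensions($old_x, $old_y, $new_w, $new_h)
{
    if ($old_x > $old_y) {            // landscape: fit the width
        $w = $new_w;
        $h = $old_y * ($new_h / $old_x);
    } elseif ($old_x < $old_y) {      // portrait: fit the height
        $w = $old_x * ($new_w / $old_y);
        $h = $new_h;
    } else {                          // square
        $w = $new_w;
        $h = $new_h;
    }
    // Round, and never return a zero-pixel side
    return array(max(1, (int) round($w)), max(1, (int) round($h)));
}

list($w, $h) = thumbDimensions(800, 600, 100, 100);
echo "$w x $h\n"; // 100 x 75
```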

The next-to-last step in the process.php file sends the owner of the file an email. Inside that email is a URL that contains a link to a file called retrieve.php.

$URL = "http://your-ec2-address.amazonaws.com/retrieve.php?b=".$BUCKET."&o=".urlencode($OBJECT);
$to = $EMAIL;
$from = 'test@example.com';
$subj = "Image thumbnail is ready!";
$msg = "Your file (". $OBJECT . ") is ready. Please go to:\r\n $URL\r\n to retrieve the image.\r\n";
mail($to, $subj,$msg, "From:$from\r\n");
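Building the query string by hand works here because the bucket name is plain hexadecimal and the object name is passed through urlencode(). PHP's built-in http_build_query() applies the same encoding to every parameter. This sketch keeps the article's placeholder host, your-ec2-address.amazonaws.com, which you would replace with your instance's public DNS name; retrieveUrl() is a hypothetical name.

```php
<?php
// Hypothetical helper: build the link emailed to the user.
// http_build_query() URL-encodes each value (spaces become "+").
function retrieveUrl($host, $bucket, $object)
{
    return 'http://' . $host . '/retrieve.php?' .
        http_build_query(array('b' => $bucket, 'o' => $object));
}

echo retrieveUrl('your-ec2-address.amazonaws.com', 'd41d8cd98f00b204e9800998ecf8427e', 'my photo.jpg'), "\n";
// http://your-ec2-address.amazonaws.com/retrieve.php?b=d41d8cd98f00b204e9800998ecf8427e&o=my+photo.jpg
```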

The final steps in the process are to delete the Amazon SQS message that you just processed and close out the if(count()) branch.

  $q->DeleteMessage($handle);
}//end if(count($nextMessage))
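Deleting the message is what marks the job as finished. In Amazon SQS, receiving a message does not remove it from the queue; the message is only hidden for the queue's visibility timeout (30 seconds by default) and becomes visible again if it is not deleted in time, so a job whose processing failed is picked up again on a later cron run. The in-memory class below is a simplified model of that behavior, not the SQS API or the SQSClient class; the ToyQueue name, the reuse of the message id as its receipt handle, and the explicit $now clock parameter are assumptions that keep the example deterministic.

```php
<?php
// Simplified, hypothetical model of SQS receive/delete semantics
class ToyQueue
{
    private $messages = array(); // id => array('body' => ..., 'visibleAt' => ...)
    private $nextId = 1;
    private $timeout;

    public function __construct($visibilityTimeout = 30)
    {
        $this->timeout = $visibilityTimeout;
    }

    public function send($body)
    {
        $this->messages[$this->nextId++] = array('body' => $body, 'visibleAt' => 0);
    }

    // Returns array(receiptHandle, body), or null if nothing is visible.
    // Receiving hides the message until $now + timeout instead of removing it.
    // (Real receipt handles change on every receive; here the id is reused.)
    public function receive($now)
    {
        foreach ($this->messages as $id => $m) {
            if ($m['visibleAt'] <= $now) {
                $this->messages[$id]['visibleAt'] = $now + $this->timeout;
                return array($id, $m['body']);
            }
        }
        return null;
    }

    // Only an explicit delete removes the message for good
    public function delete($receiptHandle)
    {
        unset($this->messages[$receiptHandle]);
    }
}

$toy = new ToyQueue(30);
$toy->send('d41d8cd98f00b204e9800998ecf8427e');
list($handle, $body) = $toy->receive(0);  // a cron run picks up the job...
var_dump($toy->receive(10));              // NULL: still hidden
list($handle, $body) = $toy->receive(45); // ...and it reappears if never deleted
$toy->delete($handle);
var_dump($toy->receive(100));             // NULL: gone for good
```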

Retrieving the Processed Files

The email sent to the owner of the file contains a URL that points to a file called retrieve.php. The main job of retrieve.php is to take the incoming GET parameters for bucket and object and display the file in the browser.

Along the way, you're going to save the bucket and object names in the PHP session ($_SESSION) so that you can safely reuse them on a cleanup page.

session_start();
require_once 'Crypt/HMAC.php'; 
require_once 'HTTP/Request.php'; 
require_once('class.s3.php');
require_once('simpledb.class.php');


define('DOMAIN','photo_jobs'); //sdb
define('AWS_ACCESS_KEY_ID', 'change-this');
define('AWS_SECRET_ACCESS_KEY', 'change-this-too');
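$BUCKET = isset($_GET['b']) ? $_GET['b'] : '';
$OBJECT = isset($_GET['o']) ? $_GET['o'] : '';

// The bucket name is always the 32-character MD5 key created in upload.php
if (!preg_match('/^[0-9a-f]{32}$/', $BUCKET) || $OBJECT === '') {
    die('Invalid request.');
}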


//save in session for later cleanup use
$_SESSION['o'] = $OBJECT;
$_SESSION['b'] = $BUCKET;
$s3 = new S3();
$file = $s3->getObject($BUCKET,$OBJECT);

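// Escape request values before writing them into the page
$src = 'http://'.$BUCKET.'.s3.amazonaws.com/'.rawurlencode($OBJECT);
echo '<img src="'.htmlspecialchars($src).'"/>';
echo "<p>Download the image to your desktop and then
<a href='./cleanup.php'>finalize the clean-up process</a>.</p>";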

Finally, retrieve.php removes the job's item from Amazon SimpleDB by deleting all of its attributes; the bucket and object in Amazon S3 are removed by the cleanup page.

$sd = new SimpleDb(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY);
$names = array('fullname', 'email', 'status', 'path', 'task');
$sd->deleteAttributes(DOMAIN,$BUCKET, $names);

Final Cleanup

When the user clicks the “finalize the clean-up process” link, the cleanup.php file opens. This file reads the two PHP session variables (which hold the bucket and object names) and passes those values to functions that delete the object and then the bucket.

session_start();
require_once 'Crypt/HMAC.php';
require_once 'HTTP/Request.php';
require_once('class.s3.php');
define('AWS_ACCESS_KEY_ID', 'change-this');
define('AWS_SECRET_ACCESS_KEY', 'change-this-too');


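//1. Read the bucket and object names saved in the session by retrieve.php
if (empty($_SESSION['b']) || empty($_SESSION['o'])) {
    die('Nothing to clean up.');
}
$BUCKET = $_SESSION['b'];
$OBJECT = $_SESSION['o'];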


//2. Delete Bucket and Object from S3
$s3 = new S3();
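$s3->deleteObject($BUCKET, $OBJECT); // bucket first, as in the other S3 class calls
$s3->deleteBucket($BUCKET);          // S3 deletes only empty buckets, so this comes second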


echo "<h1>Thank you</h1>";
echo "<p>Your files have now been cleaned up.</p>";

Conclusion

This article demonstrated how developers can use Amazon EC2, S3, SimpleDB, and SQS to build integrated, scalable web applications with traditional web application development techniques. There are many ways our image-processing application could be improved and expanded, and more features can be added. With that in mind, we've made the entire source code available in a Google Code repository.

Learning More About AWS

This article highlights a few aspects of working with AWS. Here are a few more resources available to developers to help you learn more.

About the Authors

Tom Myer (tom@tripledogs.com) is the founder and owner of Triple Dog Dare Media, providing consulting and technical writing from his headquarters in Austin, TX. He is the author of the forthcoming WROX book Professional CodeIgniter as well as dozens of articles on PHP, web development, and small business topics.

John Fronckowiak (john@idcc.net) is the President of IDC Consulting, Inc., providing consulting and technical writing. John is also a Clinical Assistant Professor in Information Systems at the Adult Learning Program of Medaille College (http://www.medaille.edu/alp). He is also the author of several books about programming, database design and development, and networking.

©2014, Amazon Web Services, Inc. or its affiliates. All rights reserved.