AWS Developer Blog

Streaming Amazon S3 Objects From a Web Server

by Michael Dowling | in PHP

Have you ever needed a memory-efficient way to stream an Amazon S3 object directly from your web server to a browser? Perhaps your website has its own authorization system and you want to limit access to a file to only users who have purchased it. Or maybe you need to perform a specific action each time a file is accessed (e.g., add an image to a user’s "recently viewed" list).

Using PHP’s readfile function and the Amazon S3 stream wrapper provides a simple way to efficiently stream data from Amazon S3 to your users while proxying the bytes sent over the wire through a web server.

Register the Amazon S3 stream wrapper

First you need to create an Amazon S3 client:

use Aws\S3\S3Client;

$client = S3Client::factory(array(
    'key'    => '****',
    'secret' => '****'
));

Next you need to register the Amazon S3 stream wrapper:

$client->registerStreamWrapper();

Send the appropriate headers

Now you need to send the appropriate headers from the web server to the client downloading the file. You can specify completely custom headers to send to the client, including any relevant headers of the Amazon S3 object.

Here’s how you could retrieve the headers of a particular Amazon S3 object:

// Send a HEAD request to the object to get headers
$command = $client->getCommand('HeadObject', array(
    'Bucket' => 'my-bucket',
    'Key'    => 'my-images/php.gif'
));

$headers = $command->getResponse()->getHeaders();

Now that you’ve retrieved the headers of the Amazon S3 object, you can send the headers to the client that is downloading the object using PHP’s header function.

// Only forward along specific headers
$proxyHeaders = array('Last-Modified', 'ETag', 'Content-Type', 'Content-Disposition');

foreach ($proxyHeaders as $header) {
    if ($headers[$header]) {
        header("{$header}: {$headers[$header]}");
    }
}
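If you work with the headers as a plain PHP array (for example, after copying them out of the response yourself), keep in mind that HTTP header names are case-insensitive while PHP array keys are not. The following is a minimal sketch, independent of the SDK, of a hypothetical `selectProxyHeaders()` helper. It assumes a plain associative array of header name to string value and returns the allowlisted "Name: value" lines, matching names case-insensitively.

```php
<?php

/**
 * Hypothetical helper: pick an allowlisted subset of headers, matching names
 * case-insensitively, and return "Name: value" lines ready for header().
 *
 * @param array $headers Header name => value (assumed to be plain strings)
 * @param array $allowed Header names to forward, in the casing to emit
 * @return string[]
 */
function selectProxyHeaders(array $headers, array $allowed)
{
    // Index incoming headers by lower-cased name for case-insensitive lookup
    $byLowerName = array();
    foreach ($headers as $name => $value) {
        $byLowerName[strtolower($name)] = $value;
    }

    $lines = array();
    foreach ($allowed as $name) {
        $key = strtolower($name);
        // Skip headers the object does not have, or that are empty
        if (isset($byLowerName[$key]) && $byLowerName[$key] !== '') {
            $lines[] = "{$name}: {$byLowerName[$key]}";
        }
    }

    return $lines;
}

// Usage: example headers with mixed casing, as they might arrive from Amazon S3
$s3Headers = array(
    'etag'             => '"abc123"',
    'Content-Type'     => 'image/gif',
    'x-amz-request-id' => 'EXAMPLE',
);
$allowed = array('Last-Modified', 'ETag', 'Content-Type', 'Content-Disposition');

foreach (selectProxyHeaders($s3Headers, $allowed) as $line) {
    header($line);
}
```

The allowlist also keeps internal headers such as `x-amz-request-id` from leaking to the browser.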

Disable output buffering

When you use functions like echo or readfile, you might actually be writing to an output buffer. Using output buffering while streaming large files will unnecessarily consume a large amount of memory and reduce the performance of the download. You should ensure that output buffering is disabled before streaming the contents of the file.

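// Stop output buffering: flush and close every nested buffer level
while (ob_get_level() > 0) {
    if (!ob_end_flush()) {
        break; // this buffer level cannot be removed
    }
}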

flush();
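The memory cost of buffering is easy to observe. The sketch below (plain PHP, no SDK required) starts a buffer with `ob_start()` and no chunk size, writes 1 MB of filler that stands in for part of a downloaded file, and uses `ob_get_length()` to show that the entire output is held in memory rather than sent. A buffer created with a chunk size, such as the one configured by the `output_buffering` php.ini setting, flushes itself whenever it fills, but it still holds data back from the browser.

```php
<?php

// Sketch: an unbounded output buffer keeps everything written to it in memory.
// 1 MB of filler stands in for part of a large file (an assumption).
ob_start();
echo str_repeat('x', 1024 * 1024);

// ob_get_length() reports how many bytes the buffer is currently holding
$buffered = ob_get_length();

// Discard the filler instead of sending it
ob_end_clean();

printf("Bytes held in the output buffer: %d\n", $buffered);
```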

Send the data

Now you’re ready to stream the file using the Amazon S3 stream wrapper and the readfile function. The stream wrapper uses a syntax of "s3://[bucket]/[key]" where "[bucket]" is the name of an Amazon S3 bucket and "[key]" is the key of an object (which can contain additional "/" characters to emulate folder hierarchies).

readfile('s3://my-bucket/my-images/php.gif');
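`readfile()` copies the stream to the output in fixed-size chunks, so memory use stays roughly constant regardless of the object's size. If you want to call `flush()` after every chunk, an explicit loop makes that step visible. The hypothetical `streamInChunks()` helper below is a sketch of such a loop. It accepts any readable stream, so with the stream wrapper registered the source could be `fopen('s3://my-bucket/my-images/php.gif', 'r')` and the destination `fopen('php://output', 'w')`; the usage example substitutes in-memory streams so it runs without Amazon S3.

```php
<?php

/**
 * Hypothetical helper: copy a readable stream to a writable stream in
 * fixed-size chunks, calling flush() after each chunk. Memory use is bounded
 * by $chunkSize rather than by the size of the source.
 *
 * @param resource $source    Readable stream (e.g., opened via the s3:// wrapper)
 * @param resource $dest      Writable stream (php://output when serving a browser)
 * @param int      $chunkSize Maximum bytes to read per iteration
 * @return int Total bytes written
 */
function streamInChunks($source, $dest, $chunkSize = 8192)
{
    $total = 0;
    while (!feof($source)) {
        $chunk = fread($source, $chunkSize);
        if ($chunk === false) {
            break; // read error
        }
        if ($chunk !== '') {
            $total += fwrite($dest, $chunk);
            flush(); // push this chunk toward the client
        }
    }

    return $total;
}

// Usage: a 20,000-byte in-memory stream stands in for an Amazon S3 object
$source = fopen('php://temp', 'r+');
fwrite($source, str_repeat('x', 20000));
rewind($source);

$dest = fopen('php://memory', 'w+');
$bytes = streamInChunks($source, $dest, 8192);

// Read back what was written so it can be checked
rewind($dest);
$copied = stream_get_contents($dest);

fclose($source);
fclose($dest);
```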

Caching

Our very simple approach to serving files from Amazon S3 does not take advantage of HTTP caching mechanisms: the script always sends the full object, even when the browser already has a current copy. By implementing cache revalidation in your script, you can allow users to reuse a cached version of an object.

A few slight modifications to the script will allow your application to benefit from HTTP caching. By passing the ETag and Last-Modified headers from Amazon S3 to the browser, we let the browser know how to cache and revalidate the response. When a web browser has previously downloaded a file, a subsequent request to download the file will typically include cache validation headers (e.g., "If-Modified-Since", "If-None-Match"). By checking for these cache validation headers in the HTTP request sent to the PHP server, we can forward them in the HEAD request sent to Amazon S3. If the object has not changed, Amazon S3 responds with 304 Not Modified; the script passes that status on to the browser and skips sending the body, so the browser uses its cached copy.
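Amazon S3 performs the actual comparison, but it can help to see the decision being made. Under the HTTP conditional-request rules in RFC 7232, a matching `If-None-Match` ETag yields 304 Not Modified, and `If-Modified-Since` is consulted only when `If-None-Match` is absent. The hypothetical `revalidationStatus()` function below sketches that logic for a GET or HEAD of a single object. It is for illustration only and is not needed in the script; check the Amazon S3 documentation for its exact handling of these headers.

```php
<?php

/**
 * Hypothetical sketch of RFC 7232 revalidation for GET/HEAD:
 * decide between 200 OK and 304 Not Modified for one object.
 *
 * @param string|null $ifNoneMatch     Request's If-None-Match value, or null
 * @param string|null $ifModifiedSince Request's If-Modified-Since value, or null
 * @param string      $etag            Object's current ETag, including quotes
 * @param int         $lastModified    Object's Last-Modified as a Unix timestamp
 * @return int 200 or 304
 */
function revalidationStatus($ifNoneMatch, $ifModifiedSince, $etag, $lastModified)
{
    if ($ifNoneMatch !== null) {
        // If-None-Match takes precedence; If-Modified-Since is then ignored
        if (trim($ifNoneMatch) === '*') {
            return 304;
        }
        $current = preg_replace('#^W/#', '', $etag);
        // Naive comma split: assumes the ETags themselves contain no commas
        foreach (explode(',', $ifNoneMatch) as $candidate) {
            // Weak comparison: ignore a leading W/ prefix
            if (preg_replace('#^W/#', '', trim($candidate)) === $current) {
                return 304;
            }
        }
        return 200;
    }

    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        // An unparsable date is ignored, as RFC 7232 requires
        if ($since !== false && $lastModified <= $since) {
            return 304;
        }
    }

    return 200;
}

$etag = '"abc123"';
$lastModified = strtotime('Tue, 15 Oct 2013 08:00:00 GMT');

$fresh   = revalidationStatus('"abc123"', null, $etag, $lastModified);
$changed = revalidationStatus('"old999"', null, $etag, $lastModified);
$byDate  = revalidationStatus(null, 'Wed, 16 Oct 2013 08:00:00 GMT', $etag, $lastModified);
```

Here `$fresh` and `$byDate` are 304 and `$changed` is 200.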

Here’s a complete example that will pass along cache-specific HTTP headers from the Amazon S3 object.

// Assuming the SDK was installed via Composer
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Create a client object
$client = S3Client::factory(array(
    'key'    => '****',
    'secret' => '****',
));

// Register the Amazon S3 stream wrapper
$client->registerStreamWrapper();

readObject($client, 'my-bucket', 'my-images/php.gif');

/**
 * Streams an object from Amazon S3 to the browser
 *
 * @param S3Client $client Client used to send requests
 * @param string   $bucket Bucket to access
 * @param string   $key    Object to stream
 */
function readObject(S3Client $client, $bucket, $key)
{
    // Begin building the options for the HeadObject request
    $options = array('Bucket' => $bucket, 'Key' => $key);

    // Check if the client sent the If-None-Match header
    if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) {
        $options['IfNoneMatch'] = $_SERVER['HTTP_IF_NONE_MATCH'];
    }

    // Check if the client sent the If-Modified-Since header
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
        $options['IfModifiedSince'] = $_SERVER['HTTP_IF_MODIFIED_SINCE'];
    }

    // Create the HeadObject command
    $command = $client->getCommand('HeadObject', $options);

    try {
        $response = $command->getResponse();
    } catch (Aws\S3\Exception\S3Exception $e) {
        // Treat any Amazon S3 error response (e.g., a missing object) as a 404
        http_response_code(404);
        exit;
    }

    // Set the appropriate status code for the response (e.g., 200, 304)
    $statusCode = $response->getStatusCode();
    http_response_code($statusCode);

    // Let's carry some headers from the Amazon S3 object over to the web server
    $headers = $response->getHeaders();
    $proxyHeaders = array(
        'Last-Modified',
        'ETag',
        'Content-Type',
        'Content-Disposition'
    );

    foreach ($proxyHeaders as $header) {
        if ($headers[$header]) {
            header("{$header}: {$headers[$header]}");
        }
    }

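    // Stop output buffering: flush and close every nested buffer level
    while (ob_get_level() > 0) {
        if (!ob_end_flush()) {
            break; // this buffer level cannot be removed
        }
    }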

    flush();

    // Only send the body if the file was not modified
    if ($statusCode == 200) {
        readfile("s3://{$bucket}/{$key}");
    }
}

Caveats

In most cases, this simple solution will work as expected. However, several software components sit between Amazon S3 and the user (PHP, the web server and its modules, and the browser), and each of them must stream data rather than buffer it for the download to stay fast and memory-efficient.

The PHP.net documentation for flush() provides some useful information to keep in mind when attempting to stream data from a web server to a browser:

Several servers, especially on Win32, will still buffer the output from your script until it terminates before transmitting the results to the browser. Server modules for Apache like mod_gzip may do buffering of their own that will cause flush() to not result in data being sent immediately to the client. Even the browser may buffer its input before displaying it. Netscape, for example, buffers text until it receives an end-of-line or the beginning of a tag, and it won’t render tables until the </table> tag of the outermost table is seen. Some versions of Microsoft Internet Explorer will only start to display the page after they have received 256 bytes of output, so you may need to send extra whitespace before flushing to get those browsers to display the page.