Uploading Archives to Amazon Glacier from PHP

by Jeremy Lindblom

You can easily upload your data archives to Amazon Glacier by using the Glacier client included in the AWS SDK for PHP. Similar to the Amazon S3 service, Amazon Glacier has an API for both single and multipart uploads. You can upload archives of up to 40,000 GB through the multipart operations. With the UploadArchive operation, you can upload archives of up to 4 GB in a single request; however, we recommend using the multipart operations for archives larger than 100 MB.

Before we look at how to use the specific operations, let’s create a client object to work with Amazon Glacier.

use Aws\Glacier\GlacierClient;

$client = GlacierClient::factory(array(
    'key'    => '[aws access key]',
    'secret' => '[aws secret key]',
    'region' => '[aws region]', // (e.g., us-west-2)
));

Uploading an archive in a single request

Now let’s upload some data to your Amazon Glacier vault. For the sake of this and other code samples in this blog post, I will assume that you have already created a vault and have stored the vault name in a variable called $vaultName. I’ll also assume that the archive data you are uploading is stored in a file and that the path to that file is stored in a variable called $filename. The following code demonstrates how to use the UploadArchive operation to upload an archive in a single request.

$result = $client->uploadArchive(array(
    'vaultName' => $vaultName,
    'body'      => fopen($filename, 'r'),
));
$archiveId = $result->get('archiveId');

In this case, the SDK does some additional work for you behind the scenes. In addition to the vault name and upload body, Amazon Glacier requires that you provide the account ID of the vault owner, a SHA-256 tree hash of the upload body, and a SHA-256 content hash of the entire payload. You can manually specify these parameters if needed, but the SDK will calculate them for you if you do not explicitly provide them.
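
For example, if you have already computed the hashes yourself, you can pass them in explicitly. The following is a rough sketch: the accountId value of "-" (meaning the account of the credentials in use) is standard for Amazon Glacier, but the checksum and ContentSHA256 parameter names are assumptions based on the multipart example later in this post, and $treeHash and $contentHash are hypothetical variables holding your precomputed values.

$result = $client->uploadArchive(array(
    'accountId'     => '-',           // "-" means the vault owner is the credentialed account
    'vaultName'     => $vaultName,
    'checksum'      => $treeHash,     // hypothetical precomputed SHA-256 tree hash
    'ContentSHA256' => $contentHash,  // hypothetical precomputed SHA-256 payload hash
    'body'          => fopen($filename, 'r'),
));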

For more details about the SHA-256 tree hash and SHA-256 content hash, see the Computing Checksums section in the Amazon Glacier Developer Guide. See the GlacierClient::uploadArchive API documentation for a list of all the parameters to the UploadArchive operation.
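
To give a sense of what the tree hash involves, here is a minimal, illustrative sketch of the algorithm described in that guide. The computeTreeHash() function below is a hypothetical helper shown for illustration only (the SDK performs this calculation for you): it hashes the payload in 1 MB chunks, then combines adjacent hashes pairwise until a single root hash remains.

// Illustrative sketch of the SHA-256 tree hash; assumes a non-empty file
function computeTreeHash($filename)
{
    // Hash the payload in 1 MB (1,048,576 byte) chunks
    $hashes = array();
    $handle = fopen($filename, 'r');
    while (!feof($handle)) {
        $chunk = fread($handle, 1048576);
        if ($chunk !== false && $chunk !== '') {
            $hashes[] = hash('sha256', $chunk, true);
        }
    }
    fclose($handle);

    // Combine adjacent hashes pairwise until a single root hash remains
    while (count($hashes) > 1) {
        $combined = array();
        foreach (array_chunk($hashes, 2) as $pair) {
            $combined[] = (count($pair) === 2)
                ? hash('sha256', $pair[0] . $pair[1], true)
                : $pair[0];
        }
        $hashes = $combined;
    }

    return bin2hex($hashes[0]);
}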

Uploading an archive in parts

Amazon Glacier also allows you to upload archives in parts, which you can do using the multipart operations: InitiateMultipartUpload, UploadMultipartPart, CompleteMultipartUpload, and AbortMultipartUpload. The multipart operations allow you to upload parts of your archive in any order and in parallel. Also, if one part of your archive fails to upload, you only need to reupload that one part, not the entire archive.

The AWS SDK for PHP provides two different techniques for doing multipart uploads with Amazon Glacier. First, you can use the multipart operations manually, which provides the most flexibility. Second, you can use the multipart upload abstraction, which allows you to configure and create a transfer object that encapsulates the multipart operations. Let’s look at the multipart abstraction first.

Using the multipart upload abstraction

The easiest way to perform a multipart upload is to use the classes provided in the Aws\Glacier\Model\MultipartUpload namespace. The classes provide an abstraction of the multipart uploading process. The main class you interact with is UploadBuilder. The following code uses the UploadBuilder to configure a multipart upload using a part size of 4 MB. The upload() method executes the upload and returns the result of the CompleteMultipartUpload operation at the end of the upload process.

use Aws\Glacier\Model\MultipartUpload\UploadBuilder;

$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource($filename)
    ->setVaultName($vaultName)
    ->setPartSize(4 * 1024 * 1024)
    ->build();

$result = $uploader->upload();

$archiveId = $result->get('archiveId');

Using the UploadBuilder class, you can also configure the parts to be uploaded in parallel by using the setConcurrency() method.

$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource($filename)
    ->setVaultName($vaultName)
    ->setPartSize(4 * 1024 * 1024)
    ->setConcurrency(3) // Upload 3 at a time in parallel
    ->build();

If a problem occurs during the upload process, an Aws\Common\Exception\MultipartUploadException is thrown, which has access to a TransferState object that represents the state of the upload.

try {
    $result = $uploader->upload();
    $archiveId = $result->get('archiveId');
} catch (\Aws\Common\Exception\MultipartUploadException $e) {
    // If the upload fails, get the state of the upload
    $state = $e->getState();
}

The TransferState object can be serialized so that the upload can be completed in a separate request if needed. To resume an upload using a TransferState object, you must use the resumeFrom() method of the UploadBuilder.

$resumedUploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource($filename)
    ->setVaultName($vaultName)
    ->resumeFrom($state)
    ->build();

$result = $resumedUploader->upload();
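
Because the TransferState object is serializable, one possible approach (sketched here using PHP’s native serialize() and unserialize() functions with an arbitrary file path) is to persist the state when the failure occurs and reload it before resuming:

// In the process where the upload failed (e.g., inside the catch block):
file_put_contents('/tmp/glacier-upload-state', serialize($state));

// Later, in a separate request or process, restore the state and resume:
$state = unserialize(file_get_contents('/tmp/glacier-upload-state'));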

Using the multipart operations

For the most flexibility, you can manage the entire upload process yourself using the individual multipart operations. The following code sample shows how to initialize an upload, upload each of the parts one by one, and then complete the upload. It also uses the UploadPartGenerator class to help calculate the information about each part. UploadPartGenerator is not required to work with the multipart operations, but it does make it much easier, especially for calculating the checksums for each of the parts and the archive as a whole.

use Aws\Glacier\Model\MultipartUpload\UploadPartGenerator;

// Use helpers in the SDK to get information about each of the parts
$archiveData = fopen($filename, 'r');
$partSize = 4 * 1024 * 1024; // (i.e., 4 MB)
$parts = UploadPartGenerator::factory($archiveData, $partSize);

// Initiate the upload and get the upload ID
$result = $client->initiateMultipartUpload(array(
    'vaultName' => $vaultName,
    'partSize'  => $partSize,
));
$uploadId = $result->get('uploadId');

// Upload each part individually using data from the part generator
foreach ($parts as $part) {
    fseek($archiveData, $part->getOffset());
    $client->uploadMultipartPart(array(
        'vaultName'     => $vaultName,
        'uploadId'      => $uploadId,
        'body'          => fread($archiveData, $part->getSize()),
        'range'         => $part->getFormattedRange(),
        'checksum'      => $part->getChecksum(),
        'ContentSHA256' => $part->getContentHash(),
    ));
}

// Complete the upload by using data aggregated by the part generator
$result = $client->completeMultipartUpload(array(
    'vaultName'   => $vaultName,
    'uploadId'    => $uploadId,
    'archiveSize' => $parts->getArchiveSize(),
    'checksum'    => $parts->getRootChecksum(),
));
$archiveId = $result->get('archiveId');

fclose($archiveData);

For more information about the various multipart operations, see the API documentation for GlacierClient. You should also take a look at the API docs for the classes in the MultipartUpload namespace to become more familiar with the multipart abstraction. We hope that this post helps you work better with Amazon Glacier and take advantage of the low-cost, long-term storage it provides.

Syncing Data with Amazon S3

by Michael Dowling

Warning: this blog post provides instructions for AWS SDK for PHP V2. If you are looking for AWS SDK for PHP V3 instructions, please see our SDK guide.

Have you ever needed to upload an entire directory of files to Amazon S3 or download an Amazon S3 bucket to a local directory? With a recent release of the AWS SDK for PHP, this is now not only possible, but really simple.

Uploading a directory to a bucket

First, let’s create a client object that we will use in each example.

use Aws\S3\S3Client;

$client = S3Client::factory(array(
    'key'    => 'your-aws-access-key-id',
    'secret' => 'your-aws-secret-access-key'
));

After creating a client, you can upload a local directory to an Amazon S3 bucket using the uploadDirectory() method of a client:

$client->uploadDirectory('/local/directory', 'my-bucket');

This small bit of code compares the contents of the local directory to the contents of the Amazon S3 bucket and only transfers files that have changed. While iterating over the keys in the bucket and comparing them against the names of local files, the changed files are uploaded in parallel using batches of requests. When the size of a file exceeds a customizable multipart_upload_size option, the uploader automatically uploads the file using a multipart upload.
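
For example, to change the threshold at which the uploader switches to multipart uploads, you can pass the multipart_upload_size option (shown here as a sketch with an assumed 16 MB threshold):

$client->uploadDirectory('/local/directory', 'my-bucket', null, array(
    'multipart_upload_size' => 16 * 1024 * 1024, // use multipart uploads for files over 16 MB
));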

Customizing the upload sync

Plenty of options and customizations exist to make the uploadDirectory() method flexible so that it can fit many different use cases and requirements.

The following example uploads a local directory where each object is stored in the bucket using a public-read ACL, 20 requests are sent in parallel, and debug information is printed to standard output as each request is transferred.

$dir = '/local/directory';
$bucket = 'my-bucket';
$keyPrefix = '';
$options = array(
    'params'      => array('ACL' => 'public-read'),
    'concurrency' => 20,
    'debug'       => true
);

$client->uploadDirectory($dir, $bucket, $keyPrefix, $options);

By specifying $keyPrefix, you can cause the uploaded objects to be placed under a virtual folder in the Amazon S3 bucket. For example, if the $bucket name is “my-bucket” and the $keyPrefix is “testing/”, then your files will be uploaded to “my-bucket” under the “testing/” virtual folder: https://my-bucket.s3.amazonaws.com/testing/filename.txt.
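
For example, using the “testing/” prefix together with the options from the previous sample:

$client->uploadDirectory($dir, $bucket, 'testing/', $options);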

You can find more documentation about uploading a directory to a bucket in the AWS SDK for PHP User Guide.

Downloading a bucket

Downloading an Amazon S3 bucket to a local directory is just as easy. We’ll again use a simple function available on an Aws\S3\S3Client object to easily download objects: downloadBucket().

The following example downloads all of the objects from my-bucket and stores them in /local/directory. Object keys that are under virtual subfolders are converted into a nested directory structure when the objects are downloaded.

$client->downloadBucket('/local/directory', 'my-bucket');

Customizing the download sync

Similar to the uploadDirectory() method, the downloadBucket() method has several options that can customize how files are downloaded.

The following example downloads a bucket to a local directory by downloading 20 objects in parallel and prints debug information to standard output as each transfer takes place.

$dir = '/local/directory';
$bucket = 'my-bucket';
$keyPrefix = '';

$client->downloadBucket($dir, $bucket, $keyPrefix, array(
    'concurrency' => 20,
    'debug'       => true
));

By specifying $keyPrefix, you can limit the downloaded objects to only keys that begin with the specified $keyPrefix. This can be useful for downloading objects under a virtual directory.
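
For example, to download only the objects under the “testing/” virtual folder used earlier:

$client->downloadBucket('/local/directory', 'my-bucket', 'testing/');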

The downloadBucket() method also accepts an optional associative array of $options that can be used to further control the transfer. One option of note is the allow_resumable option, which allows the transfer to resume any previously interrupted downloads. This can be useful for resuming the download of a very large object so that you only need to download any remaining bytes.
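
Enabling that option looks like the following sketch:

$client->downloadBucket($dir, $bucket, $keyPrefix, array(
    'allow_resumable' => true, // resume any previously interrupted downloads
));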

You can find more documentation on syncing buckets and directories and other great Amazon S3 abstraction layers in the AWS SDK for PHP User Guide.

Transferring Files To and From Amazon S3

by Jeremy Lindblom

A common question that I’ve seen on our PHP forums is whether there is an easy way to directly upload from or download to a local file using the Amazon S3 client in the AWS SDK for PHP.

The typical usage of the PutObject operation in the PHP SDK looks like the following:

use Aws\Common\Aws;

$aws = Aws::factory('/path/to/your/config.php');
$s3 = $aws->get('S3');

$s3->putObject(array(
    'Bucket' => 'your-bucket-name',
    'Key'    => 'your-object-key',
    'Body'   => 'your-data'
));

The Body parameter can be a string of data, a file resource, or a Guzzle EntityBody object. To use a file resource, you could make a simple change to the previous code sample.

$s3->putObject(array(
    'Bucket' => 'your-bucket-name',
    'Key'    => 'your-object-key',
    'Body'   => fopen('/path/to/your/file.ext', 'r')
));
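
To use a Guzzle EntityBody object instead, you can wrap the stream with Guzzle’s EntityBody::factory() helper (a sketch; the result is equivalent to passing the file resource directly):

use Guzzle\Http\EntityBody;

$s3->putObject(array(
    'Bucket' => 'your-bucket-name',
    'Key'    => 'your-object-key',
    'Body'   => EntityBody::factory(fopen('/path/to/your/file.ext', 'r'))
));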

The SDK also provides a shortcut for uploading directly from a file using the SourceFile parameter, instead of the Body parameter.

$s3->putObject(array(
    'Bucket'     => 'your-bucket-name',
    'Key'        => 'your-object-key',
    'SourceFile' => '/path/to/your/file.ext'
));

When downloading an object via the GetObject operation, you can use the SaveAs parameter as a shortcut to save the object directly to a file.

$s3->getObject(array(
    'Bucket' => 'your-bucket-name',
    'Key'    => 'your-object-key',
    'SaveAs' => '/path/to/store/your/downloaded/file.ext'
));

The SourceFile and SaveAs parameters allow you to use the SDK to upload files to and download files from Amazon S3 with very little code.

You can see more examples of how to use these parameters and perform other S3 operations in our user guide page for Amazon S3. Be sure to check out some of our other helpful S3 features, like our MultipartUpload helper and our S3 Stream Wrapper, which allows you to work with objects in S3 using PHP’s native file functions.
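
For a quick taste of the S3 Stream Wrapper, here is a minimal sketch: after registering the wrapper on the client, objects can be read and written through the s3:// protocol with PHP’s native file functions.

// Register the 's3://' stream wrapper on the client
$s3->registerStreamWrapper();

// Read and write objects using PHP's native file functions
$contents = file_get_contents('s3://your-bucket-name/your-object-key');
file_put_contents('s3://your-bucket-name/new-object-key', 'your-data');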