

AWS SDK for Go – Batch Operations with Amazon S3

The v1.9.44 release of the AWS SDK for Go adds support for batched operations in the s3manager package. This enables you to easily upload, download, and delete Amazon S3 objects in batches. The feature uses the iterator (also known as the scanner) pattern, which lets you extend the batching behavior with your own iterators. This blog post shows how to use and extend the new batched operations to fit a given use case.

Deleting objects using ListObjectsIterator

  sess := session.Must(session.NewSession(&aws.Config{}))
  svc := s3.New(sess)

  input := &s3.ListObjectsInput{
    Bucket:  aws.String("bucket"),
    MaxKeys: aws.Int64(100),
  }
  // Create a delete list objects iterator
  iter := s3manager.NewDeleteListIterator(svc, input)
  // Create the BatchDelete client
  batcher := s3manager.NewBatchDeleteWithClient(svc)

  if err := batcher.Delete(aws.BackgroundContext(), iter); err != nil {
    panic(err)
  }

This example lists the objects in the specified bucket, one hundred at a time, and deletes them. The delete list iterator created above dictates how the BatchDelete client behaves: when we call Delete on the client, it requires a BatchDeleteIterator, and the delete list iterator satisfies that interface while paging through the listed objects.
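The delete list iterator is only one implementation of that interface. If you already know exactly which keys you want to remove, the s3manager package also provides a DeleteObjectsIterator that wraps a fixed slice of objects. Here is a minimal sketch that reuses the batcher client from above; the bucket and key names are placeholders, and the DeleteObjectsIterator usage is not covered by this post, so treat it as an illustrative assumption:

  // Delete a known set of keys without listing the bucket first.
  objects := []s3manager.BatchDeleteObject{
    {Object: &s3.DeleteObjectInput{
      Bucket: aws.String("bucket"),
      Key:    aws.String("reports/part-1.csv"),
    }},
    {Object: &s3.DeleteObjectInput{
      Bucket: aws.String("bucket"),
      Key:    aws.String("reports/part-2.csv"),
    }},
  }

  if err := batcher.Delete(aws.BackgroundContext(), &s3manager.DeleteObjectsIterator{Objects: objects}); err != nil {
    panic(err)
  }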

Creating a custom iterator

The SDK enables you to pass custom iterators to the new batched operations. For example, if we want to upload a directory, none of the default iterators do this easily. The following example shows how to implement a custom iterator that uploads a directory to S3.

// DirectoryIterator iterates through files and directories to be uploaded
// to S3.
type DirectoryIterator struct {
  filePaths []string
  bucket    string
  next      struct {
    path string
    f    *os.File
  }
  err error
}

// NewDirectoryIterator creates and returns a new BatchUploadIterator
func NewDirectoryIterator(bucket, dir string) s3manager.BatchUploadIterator {
  paths := []string{}
  filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
    if err != nil {
      return err
    }
    // We care only about files, not directories
    if !info.IsDir() {
      paths = append(paths, path)
    }
    return nil
  })

  return &DirectoryIterator{
    filePaths: paths,
    bucket:    bucket,
  }
}

// Next opens the next file and stops iteration if it fails to open
// a file.
func (iter *DirectoryIterator) Next() bool {
  if len(iter.filePaths) == 0 {
    iter.next.f = nil
    return false
  }

  f, err := os.Open(iter.filePaths[0])
  iter.err = err

  iter.next.f = f
  iter.next.path = iter.filePaths[0]

  iter.filePaths = iter.filePaths[1:]
  return iter.Err() == nil
}

// Err returns the error, if any, that occurred while opening a file.
func (iter *DirectoryIterator) Err() error {
  return iter.err
}

// UploadObject returns a BatchUploadObject and sets the After field to
// close the file once the object has been uploaded.
func (iter *DirectoryIterator) UploadObject() s3manager.BatchUploadObject {
  f := iter.next.f
  return s3manager.BatchUploadObject{
    Object: &s3manager.UploadInput{
      Bucket: &iter.bucket,
      Key:    &iter.next.path,
      Body:   f,
    },
    // The After field was introduced in version 1.10.7
    After: func() error {
      return f.Close()
    },
  }
}

We have defined a new iterator named DirectoryIterator. It satisfies the BatchUploadIterator interface by defining the three required methods: Next, Err, and UploadObject. Next tells the batch operation whether to continue iterating. Err returns any error that was set; in this example, the only time we return an error is when we fail to open a file, and when that happens Next returns false. Finally, UploadObject returns the BatchUploadObject used to upload the file's contents to the service. Note that it pairs the upload input with an After closure; the closure ensures that we're not leaking file handles. Now let's define our main function using what we defined above.

func main() {
  region := os.Args[1]
  bucket := os.Args[2]
  path := os.Args[3]
  iter := NewDirectoryIterator(bucket, path)
  uploader := s3manager.NewUploader(session.Must(session.NewSession(&aws.Config{
    Region: &region,
  })))

  if err := uploader.UploadWithIterator(aws.BackgroundContext(), iter); err != nil {
    panic(err)
  }
  fmt.Printf("Successfully uploaded %q to %q\n", path, bucket)
}

You can verify that the directory has been uploaded by looking in S3.
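If you prefer to verify from code instead of the console, a short sketch along these lines could list the uploaded keys; it assumes an s3.New client built from the same session, and that region and bucket hold the same values used in main:

svc := s3.New(session.Must(session.NewSession(&aws.Config{
  Region: &region,
})))

resp, err := svc.ListObjects(&s3.ListObjectsInput{Bucket: &bucket})
if err != nil {
  panic(err)
}
for _, obj := range resp.Contents {
  fmt.Println(*obj.Key)
}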

Please chat with us on Gitter and file feature requests or issues on GitHub. We look forward to your feedback and recommendations!

Uploading Files to Amazon S3

by Trevor Rowe | in Ruby

I blogged previously about downloading objects from Amazon S3 using the version 2 AWS SDK for Ruby. It was requested that I write about uploading objects as well.

Managed File Uploads

The simplest and most common task is to upload a file from disk to a bucket in Amazon S3. This is very straightforward when using the resource interface for Amazon S3:

s3 = Aws::S3::Resource.new

s3.bucket('bucket-name').object('key').upload_file('/source/file/path')

You can pass additional options to the Resource constructor and to #upload_file. This expanded example demonstrates configuring the resource client, uploading a public object and then generating a URL that can be used to download the object from a browser.

s3 = Aws::S3::Resource.new(
  credentials: Aws::Credentials.new('akid', 'secret'),
  region: 'us-west-1'
)

obj = s3.bucket('bucket-name').object('key')
obj.upload_file('/source/file/path', acl:'public-read')
obj.public_url
#=> "https://bucket-name.s3-us-west-1.amazonaws.com/key"

This is the recommended method of using the SDK to upload files to a bucket. Using this approach has the following benefits:

  • Manages multipart uploads for objects larger than 15MB.
  • Correctly opens files in binary mode to avoid encoding issues.
  • Uses multiple threads for uploading parts of large objects in parallel.
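If the 15MB default does not suit your workload, #upload_file also accepts a :multipart_threshold option (in bytes). A small sketch, with an arbitrary 100MB threshold:

obj = s3.bucket('bucket-name').object('key')

# Only switch to multipart uploads for files larger than 100MB
obj.upload_file('/source/file/path', multipart_threshold: 100 * 1024 * 1024)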

Other Methods

In addition to Aws::S3::Object#upload_file, you can upload an object using #put or using the multipart upload APIs.

PUT Object

For smaller objects, you may choose to use #put instead. The #put method accepts an optional body, which can be a string or any IO object.

obj = s3.bucket('bucket-name').object('key')

# from a string
obj.put(body:'Hello World!')

# from an IO object
File.open('/source/file', 'rb') do |file|
  obj.put(body:file)
end

Multipart APIs

I recommend you use #upload_file whenever possible. If you need to manage large object copies, you will need to use the multipart interfaces directly. Be aware that there are restrictions on the minimum file and part sizes. Typically, these interfaces are reserved for advanced use cases.
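For reference, here is a bare-bones sketch of the low-level client calls that #upload_file wraps for you; the bucket, key, and file paths are placeholders, and a real upload would loop over many parts:

client = Aws::S3::Client.new

# 1. Start the multipart upload
upload_id = client.create_multipart_upload(bucket: 'bucket-name', key: 'key').upload_id

# 2. Upload parts; every part except the last must be at least 5MB
part = client.upload_part(
  bucket: 'bucket-name',
  key: 'key',
  upload_id: upload_id,
  part_number: 1,
  body: File.open('/source/file/part-1', 'rb')
)

# 3. Complete the upload, listing every part number and ETag
client.complete_multipart_upload(
  bucket: 'bucket-name',
  key: 'key',
  upload_id: upload_id,
  multipart_upload: {
    parts: [{ etag: part.etag, part_number: 1 }]
  }
)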

Feedback

I’d love to hear feedback. If you find the AWS SDK for Ruby lacks a utility for working with Amazon S3, I’d love to hear about it. Please feel free to open a GitHub issue or drop into our Gitter channel.

AWS Lambda Support in Visual Studio

Today we released version 1.9.0 of the AWS Toolkit for Visual Studio with support for AWS Lambda. AWS Lambda is a new compute service in preview that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information.

Lambda functions are written in Node.js. To help Visual Studio developers, we have integrated with the Node.js Tools for Visual Studio plugin, which you can download here. Once the Node.js plugin and the latest AWS Toolkit are installed, it is easy to develop and debug locally and then deploy to AWS Lambda when you are ready. Let’s walk through the process of developing and deploying a Lambda function.

Setting up the project

To get started, we need to create a new project. There is a new AWS Lambda project template in the Visual Studio New Project dialog.

The Lambda project wizard has three ways to get started. The first option is to create a simple project that just contains the bare necessities to get started developing and testing. The second option allows you to pull down the source of a function that was already deployed. The last option allows you to create a project from a sample. For this walkthrough, select the "Thumbnail Creator" sample and choose Finish.

Once this function is deployed, it will get called when images are uploaded to an S3 bucket. The function will then resize the image into a thumbnail and upload the thumbnail to another bucket. The destination bucket for the thumbnail will have the same name as the bucket containing the original image, plus a "-thumbnails" suffix.

The project will be set up containing three files and the dependent Node.js packages. This sample also has a dependency on the ImageMagick CLI, which you can download from http://www.imagemagick.org/. Lambda has ImageMagick pre-configured on the compute instances that will be running the Lambda function.

Let’s take a look at the files added to the project.

  • app.js – Defines the function that Lambda will invoke when it receives events. (A skeleton of a handler appears after this list.)
  • _sampleEvent.json – An example of what an event coming from S3 looks like.
  • _testdriver.js – Utility code for executing the Lambda function locally. It reads in the _sampleEvent.json file and passes it into the Lambda function defined in app.js.
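For readers who have not written a Lambda function before, here is a bare-bones sketch of what a handler in app.js looks like; the real Thumbnail Creator sample does the download, resize, and upload work inside the handler, so treat this only as an illustration of the shape of the code:

exports.handler = function (event, context) {
    // The S3 event carries one or more records describing the uploaded objects
    var bucket = event.Records[0].s3.bucket.name;
    var key = event.Records[0].s3.object.key;

    console.log('Received object ' + key + ' from bucket ' + bucket);

    // Signal Lambda that the invocation finished successfully
    context.done(null, 'done');
};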

Credentials

To access AWS resources from Lambda, functions use the AWS SDK for Node.js, which has a different path for finding credentials than the AWS SDK for .NET. The AWS SDK for Node.js looks for credentials in the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY or in the shared credentials file. For further information about configuring the AWS SDK for Node.js, refer to the AWS SDK for Node.js documentation.

Running locally

To run this sample, you will need to create the source and target S3 buckets. Pick a bucket name for the source bucket, and then create the bucket using AWS Explorer. Create a second bucket with the same name as the source bucket but with the "-thumbnails" suffix. For example, you could have a pair of buckets called foobar and foobar-thumbnails. Note: the _testdriver.js defaults the region to us-west-2, so be sure to update this to whatever region you create the buckets in. Once the buckets are created, upload an image to the source bucket so that you have an image to test with.

Open the _sampleEvent.json file and update the bucket name property to the source bucket and the object key property to the image that was uploaded.

Now, you can run and debug this like any other Visual Studio project. Go ahead and open up _testdriver.js and set a breakpoint and press F5 to launch the debugger.

Deploying the function to AWS Lambda

Once we have verified the function works correctly locally, it is time to deploy it. To do that, right-click on the project and select Upload to AWS Lambda….

This opens the Upload Lambda Function dialog.

You need to enter a Function Name to identify the function. You can leave the File Name and Handler fields at their defaults; they indicate which function Lambda should call when it receives an event. You then need to configure an IAM role that Lambda can use to invoke your function. For this walkthrough, create a new role by selecting Amazon S3 access and Amazon CloudWatch access. Giving access to CloudWatch is very useful because it lets Lambda write debugging information to Amazon CloudWatch Logs and gives you monitoring on the usage of the function. You can always refine these permissions after the function is uploaded. Once all that is set, go ahead and choose OK.

Once the upload is complete, the Lambda Function status view is displayed. The last step is to tell Amazon S3 to send events to your Lambda function. To do that, click the Add button for adding an event source.

Leave the Source Type set to Amazon S3 and select the Source bucket. S3 will need permission to send events to Lambda. This is done by assigning a role to the event source. By default, the dialog will create a role that gives S3 permission. Event sources to S3 are unique in that the configuration is actually done to the S3 bucket’s notification configuration. When you choose OK on this dialog, the event source will not show up here, but you can view it by right-clicking on the bucket and selecting properties.

 

Now that the function is deployed and S3 is configured to send events to our function, you can test it by uploading an image to the source bucket. Very shortly after uploading an image to the source bucket, your thumbnail will show up in the thumbnails bucket.

 

Calling from S3 Browser

Your function is set up to create thumbnails for any newly uploaded images. But what if you want to run the Lambda function on images that have already been uploaded? You can do that by opening the S3 bucket from AWS Explorer, navigating to the image you want the Lambda function to run against, and choosing Invoke Lambda Function.

Next, select the function you want to invoke and choose OK. The toolkit then creates the event object that S3 would have sent to Lambda and calls Invoke on the function.

This can be done for an individual file or by selecting multiple files or folders in the S3 Browser. This is helpful when you make a code change to your Lambda function and you want to reprocess all the objects in your bucket with the new code.

Conclusion

Creating thumbnails is just one example of what you can use AWS Lambda for; I'm sure you can imagine many ways to use Lambda's event-based compute power. Currently, you can create event sources for Amazon S3, Amazon Kinesis, and Amazon DynamoDB Streams (which is in preview). It is also possible to invoke Lambda functions for your own custom events using any of the AWS SDKs.

Try out the new Lambda features in the toolkit and let us know what you think. Given that AWS Lambda is in preview, we would love to get your feedback about these new features and what else we can add to make you successful using Lambda.

Introducing S3Link to DynamoDBContext

by Mason Schneider | in .NET

S3Link has been in the AWS SDK for Java for a while now, and we have decided to introduce it to the AWS SDK for .NET as well. This feature allows you to access your Amazon S3 resources easily through a link in your Amazon DynamoDB data. S3Link can be used with minimal configuration in the .NET DynamoDB Object Persistence Model. To use S3Link, simply add it as a property to your DynamoDB annotated class and create a bucket in S3. The following Book class has an S3Link property named CoverImage.

// Create a class for DynamoDBContext
[DynamoDBTable("Library")]
public class Book
{
	[DynamoDBHashKey]   
	public int Id { get; set; }

	public S3Link CoverImage { get; set; }

	public string Title { get; set; }
	public int ISBN { get; set; }

	[DynamoDBProperty("Authors")]    
	public List<string> BookAuthors { get; set; }
}

Now that we have an S3Link in our annotated class, we are ready to manage an S3 object. The following code does four things:

  1. Creates and saves a book to DynamoDB
  2. Uploads the cover of the book to S3
  3. Gets a pre-signed URL to the uploaded object
  4. Loads the book back in using the Context object and downloads the cover of the book to a local file

// Create a DynamoDBContext
var context = new DynamoDBContext();

// Create a book with an S3Link
Book myBook = new Book
{
	Id = 501,
	CoverImage = S3Link.Create(context, "myBucketName", "covers/AWSSDK.jpg", Amazon.RegionEndpoint.USWest2),
	Title = "AWS SDK for .NET Object Persistence Model Handling Arbitrary Data",
	ISBN = 999,
	BookAuthors = new List<string> { "Jim", "Steve", "Pavel", "Norm", "Milind" }
};

// Save book to DynamoDB
context.Save(myBook);

// Use S3Link to upload the content to S3
myBook.CoverImage.UploadFrom("path/to/covers/AWSSDK.jpg");

// Get a pre-signed URL for the image
string coverURL = myBook.CoverImage.GetPreSignedURL(DateTime.Now.AddHours(5));

// Load book from DynamoDB
myBook = context.Load<Book>(501);

// Download file linked from S3Link
myBook.CoverImage.DownloadTo("path/to/save/cover/otherbook.jpg");

And that’s the general use for S3Link. Simply provide it a bucket and a key, and then you can upload and download your data.

Downloading Objects from Amazon S3 using the AWS SDK for Ruby

by Trevor Rowe | in Ruby

The AWS SDK for Ruby provides a few methods for getting objects out of Amazon S3. This blog post focuses on using the v2 Ruby SDK (the aws-sdk-core gem) to download objects from Amazon S3.

Downloading Objects into Memory

For small objects, it can be useful to get an object and have it available in memory in your Ruby process. If you do not specify a :target for the download, the entire object is loaded into memory and returned as a StringIO object.

s3 = Aws::S3::Client.new
resp = s3.get_object(bucket:'bucket-name', key:'object-key')

resp.body
#=> #<StringIO ...> 

resp.body.read
#=> '...'

Call #read or #string on the StringIO to get the body as a String object.

Downloading to a File or IO Object

When downloading large objects from Amazon S3, you typically want to stream the object directly to a file on disk. This avoids loading the entire object into memory. You can specify the :target for any AWS operation as an IO object.

File.open('filename', 'wb') do |file|
  resp = s3.get_object({ bucket:'bucket-name', key:'object-key' }, target: file)
end

The #get_object method still returns a response object, but the #body member of the response will be the file object given as the :target instead of a StringIO object.

You can specify the target as a String or Pathname, and the Ruby SDK will create the file for you.

resp = s3.get_object({ bucket:'bucket-name', key:'object-key' }, target: '/path/to/file')

Using Blocks

You can also use a block for downloading objects. When you pass a block to #get_object, chunks of data are yielded as they are read off the socket.

File.open('filename', 'wb') do |file|
  s3.get_object(bucket: 'bucket-name', key:'object-key') do |chunk|
    file.write(chunk)
  end
end

Please note that when using blocks to download objects, the Ruby SDK will NOT retry failed requests after the first chunk of data has been yielded. Doing so could cause file corruption on the client end by starting over mid-stream. For this reason, I recommend using one of the preceding methods for specifying the target file path or IO object.

Retries

The Ruby SDK retries failed requests up to 3 times by default. You can override the default using :retry_limit. Setting this value to 0 disables all retries.

If the Ruby SDK encounters a network error after the download has started, it attempts to retry the request. It first checks to see if the IO target responds to #truncate. If it does not, the SDK disables retries.

If you prefer to disable this default behavior, you can either use the block mode or set :retry_limit to 0 for your S3 client.
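For example, turning retries off for a given client is a one-line change:

# No retries; failed requests raise immediately
s3 = Aws::S3::Client.new(retry_limit: 0)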

Range GETs

For very large objects, consider using the :range option and downloading the object in parts. Currently there are no helper methods for this in the Ruby SDK, but if you are interested in submitting something, we accept pull requests!
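Until then, here is a rough sketch of what a ranged download loop could look like; the 5MB chunk size is arbitrary, and the bucket and key names are placeholders:

chunk_size = 5 * 1024 * 1024
size = s3.head_object(bucket:'bucket-name', key:'object-key').content_length

File.open('filename', 'wb') do |file|
  offset = 0
  while offset < size
    last_byte = [offset + chunk_size, size].min - 1
    resp = s3.get_object(
      bucket: 'bucket-name',
      key: 'object-key',
      range: "bytes=#{offset}-#{last_byte}"
    )
    file.write(resp.body.read)
    offset = last_byte + 1
  end
end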

Happy downloading.

Using AWS CloudTrail in PHP – Part 2

by Jeremy Lindblom | in PHP

This is part 2 of Using AWS CloudTrail in PHP. Part 1 demonstrated the basics of how to work with the CloudTrail service, including how to create a trail and turn logging on and off. Today, I want to show you how to read your log files and iterate over individual log records using the AWS SDK for PHP.

AWS CloudTrail log files

CloudTrail creates JSON-formatted log files containing your AWS API call history and stores them in the Amazon S3 bucket you choose. There is no API provided by CloudTrail for reading your log files, because the log files are stored in Amazon S3. Therefore, you can use the Amazon S3 client provided by the SDK to download and read your logs.

Your log files are stored in a predictable path within your bucket based on the account ID, region, and timestamp of the API calls. Each log file contains JSON-formatted data about the API call events, including the service, operation, region, time, user agent, and request and response data. You can see a full specification of the log record data on the CloudTrail Event Reference page of the CloudTrail documentation.

Log reading tools in the SDK

Even though it is a straightforward process to get your log files from Amazon S3, the SDK provides an easier way to do it from your PHP code. As of version 2.4.12 of the SDK, you can use the LogFileIterator, LogFileReader, and LogRecordIterator classes in the Aws\CloudTrail namespace to read the log files generated by your trail.

  • LogFileIterator class – Allows you to iterate over the log files generated by a trail, and can be limited by a date range. Each item yielded by the iterator contains the bucket name and object key of the log file.
  • LogFileReader class – Allows you to read the log records of a log file identified by its bucket and key.
  • LogRecordIterator class – Allows you to iterate over log records from one or more log files, and uses the other two classes.

These classes add some extra conveniences over performing the Amazon S3 operations yourself, including:

  1. Automatically determining the paths to the log files based on your criteria.
  2. The ability to fetch log files or records from a specific date range.
  3. Automatically uncompressing the log files.
  4. Extracting the log records into useful data structures.

Instantiating the LogRecordIterator

You can instantiate the LogRecordIterator using one of the three provided factory methods. Which one you choose is determined by what data is available to your application.

  • LogRecordIterator::forTrail() – Use this if the name of the bucket containing your logs is not known.
  • LogRecordIterator::forBucket() – Use this if the bucket name is known.
  • LogRecordIterator::forFile() – Use this if retrieving records from a single file. The bucket name and object key are required.

If you already know what bucket contains your log files, then you can use the forBucket() method, which requires an instance of the Amazon S3 client, the bucket name, and an optional array of options.

use Aws\CloudTrail\LogRecordIterator;

$records = LogRecordIterator::forBucket($s3Client, 'YOUR_BUCKET_NAME', array(
    'start_date' => '-1 day',
    'log_region' => 'us-east-1',
));

Iterating over the LogRecordIterator instance allows you to get each log record one by one.

foreach ($records as $record) {
    // Print the operation, service name, and timestamp of the API call
    printf(
        "Called the %s operation on %s at %s.n",
        $record['eventName'],
        $record['eventSource'],
        $record['eventTime']
    );
}

NOTE: Each record is yielded as a Guzzle Collection object, which means it behaves like an array, but returns null for non-existent keys instead of triggering an error. It also has methods like getPath() and getAll() that can be useful when working with the log record data.
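For example, here is a quick sketch of pulling a couple of nested values out of each record with getPath(); the field names follow the CloudTrail Event Reference:

foreach ($records as $record) {
    // Reach into nested structures without manual isset() checks
    $sourceIp = $record->getPath('sourceIPAddress');
    $userName = $record->getPath('userIdentity/userName');
}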

A complete example

Let’s say that you want to look at all of your log records generated by the Amazon EC2 service during a specific week, and count how many times each Amazon EC2 operation was used. We’ll assume that the bucket name is not known, and that the trail was created via the AWS Management Console.

If you don’t know the name of the bucket, but you do know the name of the trail, then you can use the forTrail() factory method to instantiate the iterator. This method will use the CloudTrail client and the trail name to discover what bucket the trail uses for publishing log files. Trails created via the AWS Management Console are named "Default", so if you omit trail_name from the options array, "Default" will be used as the trail_name automatically.

$records = LogRecordIterator::forTrail($s3Client, $cloudTrailClient, array(
    'start_date' => '2013-12-08T00:00Z',
    'end_date'   => '2013-12-14T23:59Z',
));

The preceding code will give you an iterator that will yield all the log records for the week of December 8, 2013. To filter by the service, we can decorate the LogRecordIterator with an instance of PHP’s very own CallbackFilterIterator class.

$records = new CallbackFilterIterator($records, function ($record) {
    return (strpos($record['eventSource'], 'ec2') !== false);
});

NOTE: CallbackFilterIterator is available only in PHP 5.4+. However, Guzzle provides a similar class (Guzzle\Iterator\FilterIterator) for applications running on PHP 5.3.

At this point, it is trivial to count up the operations.

$opCounts = array();
foreach ($records as $record) {
    if (isset($opCounts[$record['eventName']])) {
        $opCounts[$record['eventName']]++;
    } else {
        $opCounts[$record['eventName']] = 1;
    }
}

print_r($opCounts);

There’s a Part 3, too

In the final part of Using AWS CloudTrail in PHP, I’ll show you how to set up CloudTrail to notify you of new log files via Amazon SNS. Then I’ll use the log reading tools from today’s post, combined with the SNS Message Validator class from the SDK, to show you how to read log files as soon as they are published.

Using AWS CloudTrail in PHP – Part 1

by Jeremy Lindblom | in PHP

AWS CloudTrail is a new service that was announced at AWS re:Invent 2013.

CloudTrail provides a history of AWS API calls for your account, delivered as log files to one of your Amazon S3 buckets. The AWS API call history includes API calls made via the AWS Management Console, AWS SDKs, command line interface, and higher-level AWS services like AWS CloudFormation. Using CloudTrail can help you with security analysis, resource change tracking, and compliance auditing.

Today, I want to show you how to create a trail and start logging API calls using the AWS SDK for PHP. The CloudTrail client is available as of version 2.4.10 of the SDK.

Creating a trail for logging

The easiest way to create a trail is through the AWS Management Console (see Creating and Updating Your Trail), but if you need to create a trail through your PHP code (e.g., automation), you can use the SDK.

Setting up the log file destination

CloudTrail creates JSON-formatted log files containing your AWS API call history and stores them in the Amazon S3 bucket you choose. Before you set up your trail, you must first set up an Amazon S3 bucket with an appropriate bucket policy.

First, create an Amazon S3 client object (e.g., $s3Client).
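For example, you can instantiate it with the same factory style used for the CloudTrail client later in this post (the credentials and region here are placeholders):

use Aws\S3\S3Client;

$s3Client = S3Client::factory(array(
    'key'    => 'YOUR_AWS_ACCESS_KEY_ID',
    'secret' => 'YOUR_AWS_SECRET_KEY',
    'region' => 'us-east-1',
));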

Creating the Amazon S3 bucket

Use the Amazon S3 client to create a bucket. (Remember, bucket names must be globally unique.)

$bucket = 'YOUR_BUCKET_NAME';

$s3Client->createBucket(array(
    'Bucket' => $bucket
));

$s3Client->waitUntilBucketExists(array(
    'Bucket' => $bucket
));

Creating the bucket policy

Once the bucket is available, you need to create a bucket policy. This policy should grant the CloudTrail service the access it needs to upload log files into your bucket. The CloudTrail documentation has an example of a bucket policy that we will use in the next code example. You will need to substitute a few of your own values into the example policy, including:

  • Bucket Name: The name of the Amazon S3 bucket where your log files should be delivered.
  • Account Number: This is your AWS account ID, which is the 12-digit number found on the Account Identifiers section of the AWS Security Credentials page.
  • Log File Prefix: An optional key prefix you specify when you create a trail that is prepended to the object keys of your log files.

The following code prepares the policy document and applies the policy to the bucket.

$prefix = 'YOUR_LOG_FILE_PREFIX';
$account = 'YOUR_AWS_ACCOUNT_ID';
$policy = <<<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
  {
    "Sid": "AWSCloudTrailAclCheck20131101",
    "Effect": "Allow",
    "Principal": {
      "AWS":[
        "arn:aws:iam::086441151436:root",
        "arn:aws:iam::113285607260:root"
      ]
    },
    "Action": "s3:GetBucketAcl",
    "Resource": "arn:aws:s3:::{$bucket}"
  },
  {
    "Sid": "AWSCloudTrailWrite20131101",
    "Effect": "Allow",
    "Principal": {
      "AWS": [
        "arn:aws:iam::086441151436:root",
        "arn:aws:iam::113285607260:root"
      ]
    },
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::{$bucket}/{$prefix}/AWSLogs/{$account}/*",
    "Condition": {
      "StringEquals": {
        "s3:x-amz-acl": "bucket-owner-full-control"
      }
    }
  }
  ]
}
POLICY;

$s3Client->putBucketPolicy(array(
    'Bucket' => $bucket,
    'Policy' => $policy,
));

Creating the trail

Now that the bucket has been set up, you can create a trail. Instantiate a CloudTrail client object, then use the createTrail() method of the client to create the trail.

use Aws\CloudTrail\CloudTrailClient;

$cloudTrailClient = CloudTrailClient::factory(array(
    'key'    => 'YOUR_AWS_ACCESS_KEY_ID',
    'secret' => 'YOUR_AWS_SECRET_KEY',
    'region' => 'us-east-1', // or us-west-2
));

$trailName = 'YOUR_TRAIL_NAME';
$cloudTrailClient->createTrail(array(
    'Name'         => $trailName,
    'S3BucketName' => $bucket,
));

NOTE: Currently, the CloudTrail service only allows for 1 trail at a time.

Start logging

After creating a trail, you can use the SDK to turn on logging via the startLogging() method.

$cloudTrailClient->startLogging(array(
    'Name' => $trailName
));

Your log files are published to your bucket approximately every 5 minutes and contain JSON-formatted data about your AWS API calls. Log files written to your bucket will persist forever by default. However, you can alter your bucket’s lifecycle rules to automatically delete files after a certain retention period or archive them to Amazon Glacier.
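As a sketch of what such a lifecycle rule could look like with the same S3 client (the 90-day retention is arbitrary, and the rule reuses the prefix and account variables from earlier):

$s3Client->putBucketLifecycle(array(
    'Bucket' => $bucket,
    'Rules'  => array(
        array(
            'ID'         => 'ExpireCloudTrailLogs',
            'Prefix'     => "{$prefix}/AWSLogs/{$account}/",
            'Status'     => 'Enabled',
            'Expiration' => array('Days' => 90),
        ),
    ),
));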

Turning it off

If you want to turn off logging, you can use the stopLogging() method.

$cloudTrailClient->stopLogging(array(
    'Name' => $trailName
));

Disabling logging does not delete your trail or log files. You can resume logging by calling the startLogging() method.

In some cases (e.g., during testing) you may want to remove your trail and log files completely. You can delete your trail and bucket using the SDK as well.

Deleting the trail

To delete a trail, use the deleteTrail() method.

$cloudTrailClient->deleteTrail(array(
    'Name' => $trailName
));

Deleting your log files and bucket

To delete the log files and your bucket, you can use the Amazon S3 client.

// Delete all the files in the bucket
$s3Client->clearBucket($bucket);

// Delete the bucket
$s3Client->deleteBucket(array(
    'Bucket' => $bucket
));

Look for Part 2

In the next part of Using AWS CloudTrail in PHP, I’ll show you how you can read your log files and iterate over individual log records using the SDK.

In the meantime, check out the AWS CloudTrail User Guide to learn more about the service.

Efficient Amazon S3 Object Concatenation Using the AWS SDK for Ruby

by Trevor Rowe | in Ruby

Today’s post is from one of our Solutions Architects: Jonathan Desrocher, who coincidentally is also a huge fan of the AWS SDK for Ruby.


There are certain situations where we would like to take a dataset that is spread across numerous Amazon Simple Storage Service (Amazon S3) objects and represent it as a new object that is the concatenation of those S3 objects. A real-life example might be combining individual hourly log files from different servers into a single environment-wide concatenation for easier indexing and archival. Another use case would be concatenating outputs from multiple Elastic MapReduce reducers into a single task summary.

While it is possible to download and re-upload the data to S3 through an EC2 instance, a more efficient approach would be to instruct S3 to make an internal copy using the new copy_part API operation that was introduced into the SDK for Ruby in version 1.10.0.

Why upload when you can copy?

Typically, new S3 objects are created by uploading data from a client using the AWS::S3::S3Object#write method or by copying the contents of an existing S3 object using the AWS::S3::S3Object#copy_to method of the Ruby SDK.

While the copy operation offers the advantage of offloading data transfer from the client to the S3 back-end, it is limited to producing new objects with exactly the same data as the original. Because S3 objects are immutable, this limits the usefulness of the copy operation to occasions where we want to preserve the data but change the object's properties (such as key name or storage class).

In our case, we want to offload the heavy lifting of the data transfer to S3’s copy functionality, but at the same time, we need to be able to shuffle different source objects’ contents into a single target derivative—and that brings us to the Multipart Upload functionality.

Copying into a Multipart Upload

Amazon S3 offers a Multipart Upload feature that enables customers to create a new object in parts and then combine those parts into a single, coherent object.

In its own right, Multipart Upload enables us to efficiently upload large amounts of data and/or deal with an unreliable network connection (which is often the case with mobile devices), as the individual upload parts can be retried individually (thus reducing the volume of data retransmissions). Just as importantly, the individual upload parts can be uploaded in parallel, which can greatly increase the aggregated throughput of the upload (note that the same benefits also apply when using byte range GETs).

Multipart Upload can be combined with the copy functionality through the Ruby SDK’s AWS::S3::MultipartUpload#copy_part method—which results in the internal copy of the specified source object into an upload part of the Multipart Upload.

Upon completion of the Multipart Upload job, the upload parts are combined so that the last byte of one upload part is immediately followed by the first byte of the subsequent part (which could itself be the target of a copy operation), resulting in a true in-order concatenation of the specified source objects.

Code Sample

Note that this example uses Amazon EC2 roles for authenticating to S3. For more information about this feature, see our “credential management” post series.


require 'rubygems'
require 'aws-sdk'

s3 = AWS::S3.new()
mybucket = s3.buckets['my-multipart']

# First, let's start the Multipart Upload
obj_aggregate = mybucket.objects['aggregate'].multipart_upload

# Then we will copy into the Multipart Upload all of the objects in a certain S3 directory.
mybucket.objects.with_prefix('parts/').each do |source_object|

  # Skip the directory object
  unless (source_object.key == 'parts/')
    # Note that this section is thread-safe and could greatly benefit from parallel execution.
    obj_aggregate.copy_part(source_object.bucket.name + '/' + source_object.key)
  end

end

obj_completed = obj_aggregate.complete()

# Generate a signed URL to enable a trusted browser to access the new object without authenticating.
puts obj_completed.url_for(:read)

Last Notes

  • The AWS::S3::MultipartUpload#copy_part method has an optional parameter called :part_number (see the sketch after this list). Omitting this parameter (as in the example above) is thread-safe. However, if multiple processes are participating in the same Multipart Upload (as in different Ruby interpreters on the same machine or different machines altogether), then the part number must be explicitly provided in order to avoid sequence collisions.
  • With the exception of the last part, there is a 5 MB minimum part size.
  • The completed Multipart Upload object is limited to a 5 TB maximum size.
  • It is possible to mix-and-match between upload parts that are copies of existing S3 objects and upload parts that are actually uploaded from the client.
  • For more information on S3 multipart upload and other cool S3 features, see the “STG303 Building scalable applications on S3” session from AWS re:Invent 2012.
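For example, here is a sketch of two workers claiming fixed part numbers against the same upload; the source keys are placeholders:

# Each process copies its own source object into a pre-agreed part number.
obj_aggregate.copy_part('my-multipart/parts/server-a.log', :part_number => 1)
obj_aggregate.copy_part('my-multipart/parts/server-b.log', :part_number => 2)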

Happy concatenating!

Streaming Amazon S3 Objects From a Web Server

by Michael Dowling | in PHP

Have you ever needed a memory-efficient way to stream an Amazon S3 object directly from your web server to a browser? Perhaps your website has its own authorization system and you want to limit access to a file to only users who have purchased it. Or maybe you need to perform a specific action each time a file is accessed (e.g., add an image to a user’s "recently viewed" list).

Using PHP’s readfile function and the Amazon S3 stream wrapper provides a simple way to efficiently stream data from Amazon S3 to your users while proxying the bytes sent over the wire through a web server.

Register the Amazon S3 stream wrapper

First you need to create an Amazon S3 client:

use Aws\S3\S3Client;

$client = S3Client::factory(array(
    'key'    => '****',
    'secret' => '****'
));

Next you need to register the Amazon S3 stream wrapper:

$client->registerStreamWrapper();

Send the appropriate headers

Now you need to send the appropriate headers from the web server to the client downloading the file. You can specify completely custom headers to send to the client, including any relevant headers of the Amazon S3 object.

Here’s how you could retrieve the headers of a particular Amazon S3 object:

// Send a HEAD request to the object to get headers
$command = $client->getCommand('HeadObject', array(
    'Bucket' => 'my-bucket',
    'Key'    => 'my-images/php.gif'
));

$headers = $command->getResponse()->getHeaders();

Now that you’ve retrieved the headers of the Amazon S3 object, you can send the headers to the client that is downloading the object using PHP’s header function.

// Only forward along specific headers
$proxyHeaders = array('Last-Modified', 'ETag', 'Content-Type', 'Content-Disposition');

foreach ($proxyHeaders as $header) {
    if ($headers[$header]) {
        header("{$header}: {$headers[$header]}");
    }
}

Disable output buffering

When you use functions like echo or readfile, you might actually be writing to an output buffer. Using output buffering while streaming large files will unnecessarily consume a large amount of memory and reduce the performance of the download. You should ensure that output buffering is disabled before streaming the contents of the file.

// Stop output buffering
if (ob_get_level()) {
    ob_end_flush();
}

flush();

Send the data

Now you’re ready to stream the file using the Amazon S3 stream wrapper and the readfile function. The stream wrapper uses a syntax of "s3://[bucket]/[key]" where "[bucket]" is the name of an Amazon S3 bucket and "[key]" is the key of an object (which can contain additional "/" characters to emulate folder hierarchies).

readfile('s3://my-bucket/my-images/php.gif');

Caching

Our very simple approach to serving files from Amazon S3 does not take advantage of HTTP caching mechanisms. By implementing cache revalidation into your script, you can allow users to use a cached version of an object.

A few slight modifications to the script will allow your application to benefit from HTTP caching. By passing the ETag and Last-Modified headers from Amazon S3 to the browser, we are allowing the browser to know how to cache and revalidate the response. When a web browser has previously downloaded a file, a subsequent request to download the file will typically include cache validation headers (e.g., "If-Modified-Since", "If-None-Match"). By checking for these cache validation headers in the HTTP request sent to the PHP server, we can forward these headers along in the HEAD request sent to Amazon S3.

Here’s a complete example that will pass along cache-specific HTTP headers from the Amazon S3 object.

// Assuming the SDK was installed via Composer
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Create a client object
$client = S3Client::factory(array(
    'key'    => '****',
    'secret' => '****',
));

// Register the Amazon S3 stream wrapper
$client->registerStreamWrapper();

readObject($client, 'my-bucket', 'my-images/php.gif');

/**
 * Streams an object from Amazon S3 to the browser
 *
 * @param S3Client $client Client used to send requests
 * @param string   $bucket Bucket to access
 * @param string   $key    Object to stream
 */
function readObject(S3Client $client, $bucket, $key)
{
    // Begin building the options for the HeadObject request
    $options = array('Bucket' => $bucket, 'Key' => $key);

    // Check if the client sent the If-None-Match header
    if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) {
        $options['IfNoneMatch'] = $_SERVER['HTTP_IF_NONE_MATCH'];
    }

    // Check if the client sent the If-Modified-Since header
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
        $options['IfModifiedSince'] = $_SERVER['HTTP_IF_MODIFIED_SINCE'];
    }

    // Create the HeadObject command
    $command = $client->getCommand('HeadObject', $options);

    try {
        $response = $command->getResponse();
    } catch (Aws\S3\Exception\S3Exception $e) {
        // Handle 404 responses
        http_response_code(404);
        exit;
    }

    // Set the appropriate status code for the response (e.g., 200, 304)
    $statusCode = $response->getStatusCode();
    http_response_code($statusCode);

    // Let's carry some headers from the Amazon S3 object over to the web server
    $headers = $response->getHeaders();
    $proxyHeaders = array(
        'Last-Modified',
        'ETag',
        'Content-Type',
        'Content-Disposition'
    );

    foreach ($proxyHeaders as $header) {
        if ($headers[$header]) {
            header("{$header}: {$headers[$header]}");
        }
    }

    // Stop output buffering
    if (ob_get_level()) {
        ob_end_flush();
    }

    flush();

    // Only send the body if the file was not modified
    if ($statusCode == 200) {
        readfile("s3://{$bucket}/{$key}");
    }
}

Caveats

In most cases, this simple solution will work as expected. However, various software components are interacting with one another, and each component must be able to properly stream data in order to achieve optimal performance.

The PHP.net documentation for flush() provides some useful information to keep in mind when attempting to stream data from a web server to a browser:

Several servers, especially on Win32, will still buffer the output from your script until it terminates before transmitting the results to the browser. Server modules for Apache like mod_gzip may do buffering of their own that will cause flush() to not result in data being sent immediately to the client. Even the browser may buffer its input before displaying it. Netscape, for example, buffers text until it receives an end-of-line or the beginning of a tag, and it won’t render tables until the </table> tag of the outermost table is seen. Some versions of Microsoft Internet Explorer will only start to display the page after they have received 256 bytes of output, so you may need to send extra whitespace before flushing to get those browsers to display the page.

Uploading to Amazon S3 with HTTP POST using the AWS SDK for .NET

by Norm Johanson | in .NET

Generally speaking, access to your Amazon S3 resources requires your AWS credentials, though there are situations where you would like to grant certain forms of limited access to other users. For example, to allow users temporary access to download a non-public object, you can generate a pre-signed URL.

Another common situation is where you want to give users the ability to upload multiple files over time to an S3 bucket, but you don’t want to make the bucket public. You might also want to set some limits on what type and/or size of files users can upload. For this case, S3 allows you to create an upload policy that describes what a third-party user is allowed to upload, sign that policy with your AWS credentials, then give the user the signed policy so that they can use it in combination with HTTP POST uploads to S3.

The AWS SDK for .NET comes with some utilities that make this easy.

Writing an Upload Policy

First, you need to create the upload policy, which is a JSON document that describes the limitations Amazon S3 will enforce on uploads. This policy is different from an Identity and Access Management policy.

Here is a sample upload policy that specifies:

  • The S3 bucket must be the-s3-bucket-in-question
  • Object keys must begin with donny/uploads/
  • The S3 canned ACL must be private
  • Only text files can be uploaded
  • The POST must have an x-amz-meta-yourelement specified, but it can contain anything.
  • Uploaded files cannot be longer than a megabyte.

{"expiration": "2013-04-01T00:00:00Z",
  "conditions": [ 
    {"bucket": "the-s3-bucket-in-question"}, 
    ["starts-with", "$key", "donny/uploads/"],
    {"acl": "private"},
    ["eq", "$Content-Type", "text/plain"],
    ["starts-with", "x-amz-meta-yourelement", ""],
    ["content-length-range", 0, 1048576]
  ]
}

It’s a good idea to place as many limitations as you can on these policies. For example, make the expiration as short as reasonable, restrict separate users to separate key prefixes if using the same bucket, and constrain file sizes and types. For more information about policy construction, see the Amazon Simple Storage Service Developer Guide.

 

Signing a Policy

Once you have a policy, you can sign it with your credentials using the SDK.

using Amazon.S3.Util;
using Amazon.Runtime;

var myCredentials = new BasicAWSCredentials(ACCESS_KEY_ID, SECRET_ACCESS_KEY);
var signedPolicy = S3PostUploadSignedPolicy.GetSignedPolicy(policyString, myCredentials);

Ideally, the credentials used to sign the request would belong to an IAM user created for this purpose, and not your root account credentials. This allows you to further constrain access with IAM policies, and it also gives you an avenue to revoke the signed policy (by rotating the credentials of the IAM user).

In order to successfully sign POST upload policies, the IAM user permissions must allow the actions s3:PutObject and s3:PutObjectAcl.
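A minimal sketch of such an IAM policy, scoped to the same bucket and key prefix as the sample upload policy above:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::the-s3-bucket-in-question/donny/uploads/*"
    }
  ]
}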

Uploading an Object Using the Signed Policy

You can add this signed policy object to an S3PostUploadRequest.

var postRequest = new S3PostUploadRequest 
{
    Key = "donny/uploads/throwing_rocks.txt",
    Bucket = "the-s3-bucket-in-question",
    CannedACL = S3CannedACL.Private,
    InputStream = File.OpenRead(@"c:\throwing_rocks.txt"),
    SignedPolicy = signedPolicy
};

postRequest.Metadata.Add("yourelement", myelement);

var response = AmazonS3Util.PostUpload(postRequest);

Keys added to the S3PostUploadRequest.Metadata dictionary will have the x-amz-meta- prefix added to them if it isn’t present. Also, you don’t always have to explicitly set the Content-Type if it can be inferred from the extension of the file or key.

Any errors returned by the service will result in an S3PostUploadException, which will contain an explanation of why the upload failed.
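For example, a small sketch of catching that exception around the upload call (the exact members exposed on the exception may differ):

try
{
    var response = AmazonS3Util.PostUpload(postRequest);
}
catch (S3PostUploadException e)
{
    // The exception message explains why S3 rejected the POST
    Console.WriteLine("Upload failed: " + e.Message);
}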

 

Exporting and Importing a Signed Policy

You can export the S3PostUploadSignedPolicy object to JSON or XML to be transferred to other users.

var policyJson = signedPolicy.ToJson();
var policyXml = signedPolicy.ToXml();

And the receiving user can re-create S3PostUploadSignedPolicy objects from the serialized data.

var signedPolicy = S3PostUploadSignedPolicy.GetSignedPolicyFromJson(policyJson);
var signedPolicy2 = S3PostUploadSignedPolicy.GetSignedPolicyFromXml(policyXml);

For more information about uploading objects to Amazon S3 with HTTP POST, including how to upload objects with a web browser, see the Amazon Simple Storage Service Developer Guide.