AWS Developer Blog

Syncing Data with Amazon S3

by Michael Dowling | in PHP

Warning: This blog post provides instructions for the AWS SDK for PHP V2. If you are looking for AWS SDK for PHP V3 instructions, please see our SDK guide.

Have you ever needed to upload an entire directory of files to Amazon S3 or download an Amazon S3 bucket to a local directory? With a recent release of the AWS SDK for PHP, this is now not only possible, but really simple.

Uploading a directory to a bucket

First, let’s create a client object that we will use in each example.

use Aws\S3\S3Client;

$client = S3Client::factory(array(
    'key'    => 'your-aws-access-key-id',
    'secret' => 'your-aws-secret-access-key'
));

After creating a client, you can upload a local directory to an Amazon S3 bucket using the uploadDirectory() method of a client:

$client->uploadDirectory('/local/directory', 'my-bucket');

This small bit of code compares the contents of the local directory to the contents of the Amazon S3 bucket and only transfers files that have changed. While iterating over the keys in the bucket and comparing them against the names of the local files, the changed files are uploaded in parallel using batches of requests. When the size of a file exceeds a customizable multipart_upload_size option, the uploader automatically uploads the file using a multipart upload.
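
For example, here is a minimal sketch of raising that threshold when calling uploadDirectory() (the 100 MB value is purely illustrative):

// Only switch to multipart uploads for files larger than 100 MB
$client->uploadDirectory('/local/directory', 'my-bucket', null, array(
    'multipart_upload_size' => 100 * 1024 * 1024,
));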

Customizing the upload sync

Plenty of options and customizations exist to make the uploadDirectory() method flexible so that it can fit many different use cases and requirements.

The following example uploads a local directory where each object is stored in the bucket using a public-read ACL, 20 requests are sent in parallel, and debug information is printed to standard output as each request is transferred.

$dir = '/local/directory';
$bucket = 'my-bucket';
$keyPrefix = '';
$options = array(
    'params'      => array('ACL' => 'public-read'),
    'concurrency' => 20,
    'debug'       => true
);

$client->uploadDirectory($dir, $bucket, $keyPrefix, $options);

By specifying $keyPrefix, you can cause the uploaded objects to be placed under a virtual folder in the Amazon S3 bucket. For example, if the $bucket name is “my-bucket” and the $keyPrefix is “testing/”, then your files will be uploaded to “my-bucket” under the “testing/” virtual folder: https://my-bucket.s3.amazonaws.com/testing/filename.txt.
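
For instance, to upload the directory into that "testing/" virtual folder:

$client->uploadDirectory('/local/directory', 'my-bucket', 'testing/');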

You can find more documentation about uploading a directory to a bucket in the AWS SDK for PHP User Guide.

Downloading a bucket

Downloading an Amazon S3 bucket to a local directory is just as easy. We’ll again use a simple method available on an Aws\S3\S3Client object to easily download objects: downloadBucket().

The following example downloads all of the objects from my-bucket and stores them in /local/directory. Object keys that are under virtual subfolders are converted into a nested directory structure when the objects are downloaded.

$client->downloadBucket('/local/directory', 'my-bucket');

Customizing the download sync

Similar to the uploadDirectory() method, the downloadBucket() method has several options that can customize how files are downloaded.

The following example downloads a bucket to a local directory by downloading 20 objects in parallel and prints debug information to standard output as each transfer takes place.

$dir = '/local/directory';
$bucket = 'my-bucket';
$keyPrefix = '';

$client->downloadBucket($dir, $bucket, $keyPrefix, array(
    'concurrency' => 20,
    'debug'       => true
));

By specifying $keyPrefix, you can limit the downloaded objects to only keys that begin with the specified $keyPrefix. This can be useful for downloading objects under a virtual directory.

The downloadBucket() method also accepts an optional associative array of $options that can be used to further control the transfer. One option of note is the allow_resumable option, which allows the transfer to resume any previously interrupted downloads. This can be useful for resuming the download of a very large object so that you only need to download any remaining bytes.
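
Here is a minimal sketch that combines a key prefix with the allow_resumable option (the "photos/" prefix is just an example):

// Resume any partially downloaded objects under the "photos/" virtual folder
$client->downloadBucket('/local/directory', 'my-bucket', 'photos/', array(
    'allow_resumable' => true,
));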

You can find more documentation on syncing buckets and directories and other great Amazon S3 abstraction layers in the AWS SDK for PHP User Guide.

Getting the Latest Windows AMIs

by Steve Roberts | in .NET

More and more developers are launching the base Windows AMIs provided by AWS and configuring them during startup. You can do this either by adding a PowerShell script to the user data field or by using an AWS CloudFormation template. We are constantly updating these base AMIs to include the latest patches. The SDK contains the ImageUtilities class, which you can find in the Amazon.EC2.Util namespace. This class is useful for finding the latest AMIs using named constants that are independent of service pack and RTM designations. For example, the following code finds the latest Windows Server 2012 with SQL Server Express AMI:

Image image = ImageUtilities.FindImage(ec2Client, ImageUtilities.WINDOWS_2012_SQL_SERVER_EXPRESS_2012);

Using the version-independent constants means that you do not need to rebuild your code when the Amazon EC2 team revises the published AMIs. The new EC2 sample that was recently added to Visual Studio under "Compute and Networking" demonstrates how to use the ImageUtilities class and execute a PowerShell script at startup. Using the AWS Tools for Windows PowerShell, you can also use the Get-EC2ImageByName cmdlet:

"WINDOWS_2012_SQL_SERVER_EXPRESS_2012" | Get-EC2ImageByName | New-EC2Instance ...

The cmdlet accepts either the logical, service pack/RTM-independent names or specific name patterns. The current names can be seen by invoking the cmdlet with no parameters. Just like using the SDK, if you script using logical names to address the AMIs, your script does not need to be updated when Amazon EC2 revises the current AMIs as new service packs are released!
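
To put the pieces together in code, here is a minimal sketch of finding the latest image and launching it with a PowerShell user data script. The instance settings and the script body are placeholders, and the With* setters reflect the fluent style used in version 1 of the SDK for .NET; adapt the details to your own environment:

// Find the latest Windows Server 2012 with SQL Server Express AMI
Image image = ImageUtilities.FindImage(ec2Client,
    ImageUtilities.WINDOWS_2012_SQL_SERVER_EXPRESS_2012);

// Placeholder configuration script; EC2Config runs the <powershell> block at startup
string script = "<powershell>Install-WindowsFeature Web-Server</powershell>";

// User data must be base64 encoded (System.Text.Encoding)
var runRequest = new RunInstancesRequest()
    .WithImageId(image.ImageId)
    .WithInstanceType("m1.large")
    .WithMinCount(1)
    .WithMaxCount(1)
    .WithUserData(Convert.ToBase64String(Encoding.UTF8.GetBytes(script)));

var runResponse = ec2Client.RunInstances(runRequest);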

Static Service Client Facades

by Jeremy Lindblom | in PHP

Version 2.4 of the AWS SDK for PHP adds the ability to enable and use static client facades. These "facades" provide an easy, static interface to service clients available in the service builder. For example, when working with a normal client instance, you might have code that looks like the following:

// Get the configured S3 client from the service builder
$s3 = $aws->get('s3');

// Execute the CreateBucket command using the S3 client
$s3->createBucket(array('Bucket' => 'your-new-bucket-name'));

With client facades enabled, you can also accomplish this with the following code:

// Execute the CreateBucket command using the S3 client
S3::createBucket(array('Bucket' => 'your-new-bucket-name'));

Enabling and using client facades

To enable static client facades to be used in your application, you must call the Aws\Common\Aws::enableFacades method when you set up the service builder.

// Include the Composer autoloader
require 'vendor/autoload.php';

// Instantiate the SDK service builder with my config and enable facades
$aws = Aws::factory('/path/to/my_config.php')->enableFacades();

This sets up the client facades and aliases them into the global namespace. After that, you can use them anywhere to write simpler, more expressive code for interacting with AWS services.

// List current buckets
echo "Current Buckets:n";
foreach (S3::getListBucketsIterator() as $bucket) {
    echo "{$bucket['Name']}n";
}

$args = array('Bucket' => 'your-new-bucket-name');
$file = '/path/to/the/file/to/upload.jpg';

// Create a new bucket and wait until it is available for uploads
S3::createBucket($args) and S3::waitUntilBucketExists($args);
echo "nCreated a new bucket: {$args['Bucket']}.n";

// Upload a file to the new bucket
$result = S3::putObject($args + array(
    'Key'  => basename($file),
    'Body' => fopen($file, 'r'),
));
echo "nCreated a new object: {$result['ObjectURL']}n";

You can also mount the facades into a namespace other than the global namespace. For example, if you want to make the client facades available in the "Services" namespace, you can do the following:

Aws::factory('/path/to/my_config.php')->enableFacades('Services');

$result = Services\DynamoDb::listTables();

Why use client facades?

The use of static client facades is completely optional. We included this feature in the SDK in order to appeal to PHP developers who prefer static notation or who are familiar with PHP frameworks like CodeIgniter, Laravel, or Kohana, where this style of method invocation is common.

Though using static client facades has little real benefit over using client instances, it can make your code more concise and prevent you from having to inject the service builder or client instance into the context where you need the client object. This can make your code easier to write and understand. Whether or not you should use the client facades is purely a matter of preference.

How client facades work in the AWS SDK for PHP is similar to how facades work in the Laravel 4 Framework. Even though you are calling static classes, all of the method calls are proxied to method calls on actual client instances (the ones stored in the service builder). This means that the usage of the clients via the client facades can still be mocked in your unit tests, which removes one of the general disadvantages of using static classes in object-oriented programming. For information about how to test code that uses client facades, please see the Testing Code that Uses Client Facades section of the AWS SDK for PHP User Guide.
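
For example, because the facades proxy to the client instances stored in the service builder, one way to fake responses in a test is to attach Guzzle's MockPlugin to the underlying client. This is only a sketch of the idea; see the User Guide section mentioned above for the full details:

use Guzzle\Plugin\Mock\MockPlugin;
use Guzzle\Http\Message\Response;

// Get the actual client instance that the S3 facade proxies to
$s3 = $aws->get('s3');

// Queue a canned response so no request is actually sent to Amazon S3
$mock = new MockPlugin();
$mock->addResponse(new Response(200));
$s3->addSubscriber($mock);

// Code under test that calls S3::headBucket(), S3::putObject(), and so on
// now receives the queued response instead of hitting the service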

Though we are happy to offer this new feature, we don’t expect you to change all of your code to use the static client facades. We are simply offering it as an alternative that may be more convenient or familiar to you. We still recommend using client instances as you have in the past and support the use of dependency injection. Be sure to let us know in the comments if you like this new feature and if you plan on using it.

A New Addition to the AWS SDK for Ruby

by Loren Segal | in Ruby

Last week we quietly welcomed a new addition to the AWS SDK for Ruby organization. We’re proud to publicly announce that Alex Wood has joined our team and is now a core contributor to the Ruby SDK, as well as some of our other ongoing Ruby-based projects. He’s already jumped in on GitHub where he has helped us close a bunch of open issues on the SDK. We expect him to get more involved in the development of the SDK as well as helping out on our forums, blogs, and other public pages as time goes on. So if you see a new face helping you out by the handle awood45 on GitHub, or @alexwwood on Twitter, make sure to give him a warm welcome! You might even want to pass on a congratulations or two, as he managed to start a new job and get married, all in the span of one week!

Data Encryption with Amazon S3

by Jason Fulghum | in Java

If your application uses Amazon S3 and deals with sensitive data, then you should be taking advantage of the easy ways of increasing the security of your data using the AWS SDK for Java.

There are two easy options for locking down your data using Amazon S3 and the AWS SDK for Java. Which one you choose depends on the nature of your data and how much you want to be involved with the encryption process and key management. Both options give you solutions for ensuring your data is securely stored in Amazon S3.

Server-Side Encryption

Server-side data encryption with Amazon S3 is the easier of the two options and requires very little work to enable. All you need to do is enable server-side encryption in your object metadata when you upload your data to Amazon S3. As soon as your data reaches S3, it is encrypted and stored. When you request your data again, Amazon S3 automatically decrypts it as it’s streamed back to you. Your data is always encrypted when it’s stored in Amazon S3, with encryption keys managed by Amazon. This makes it incredibly easy to start using encryption, since your application doesn’t have to do anything other than set the server-side encryption flag when you upload your data.

The example below shows how to create a request to upload data to Amazon S3, then call the ObjectMetadata#setServerSideEncryption() method and specify the encryption algorithm (currently ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION is the only supported encryption algorithm).

PutObjectRequest request = new PutObjectRequest(bucket, key, file);
            
// Request server-side encryption.
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.setServerSideEncryption(
                     ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);     
request.setMetadata(objectMetadata);

s3client.putObject(request);

If you want to convert existing data stored in Amazon S3 to use server-side encryption, you can use the AmazonS3#copyObject() method to edit the object’s metadata (essentially you’re copying the object to the same location, and supplying new object metadata).

CopyObjectRequest request = new CopyObjectRequest(bucket, key, bucket, key);
            
// Request server-side encryption.
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.setServerSideEncryption(
          ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION); 
            
request.setNewObjectMetadata(objectMetadata);
         
s3client.copyObject(request);

Client-Side Encryption

The second option for encrypting your sensitive data is to use the client-side encryption provided by the AWS SDK for Java. This option is a little more involved on your part, but can provide stronger security, since your data never leaves your machine in an unencrypted form and you manage the encryption keys.

To use client-side encryption for your Amazon S3 data, the first thing you need to do is switch to using the AmazonS3EncryptionClient instead of the basic AmazonS3Client. The encryption client’s interface is identical to the standard client’s interface, so switching to the new client in existing code is very straightforward. The encryption client handles automatically encrypting your data as it’s streamed to Amazon S3, and automatically decrypts your data as it streams back to your application from Amazon S3 when you download objects.

The major difference between server-side encryption and client-side encryption is who manages the encryption keys. With client-side encryption, you need to provide the AmazonS3EncryptionClient with EncryptionMaterials, which drive the encryption process.

The example below demonstrates how to create an instance of the encryption client and use it to encrypt and then decrypt data. It shows how to generate an RSA asymmetric key pair, but in a real application, you’d probably load your encryption keys from disk.

// Several good online sources explain how to easily create an RSA key pair
// from the command line using OpenSSL, for example:
// http://en.wikibooks.org/wiki/Transwiki:Generate_a_keypair_using_OpenSSL
KeyPairGenerator keyGenerator = KeyPairGenerator.getInstance("RSA");
keyGenerator.initialize(1024, new SecureRandom());
KeyPair myKeyPair = keyGenerator.generateKeyPair();

// Construct an instance of AmazonS3EncryptionClient
EncryptionMaterials encryptionMaterials = new EncryptionMaterials(myKeyPair);
AmazonS3EncryptionClient s3 = new AmazonS3EncryptionClient(credentials, encryptionMaterials);

// Then just use the encryption client like the standard S3 client
s3.putObject(bucket, key, myFile);

// When you use the getObject method, the data retrieved from Amazon S3
// is automatically decrypted on the fly as it streams down to you
S3Object downloadedObject = s3.getObject(bucketName, key);

For a much more in depth guide on how the client-side data encryption for Amazon S3 works, including how to extend it to integrate with existing private key management systems, see our article on Client-Side Data Encryption with the AWS SDK for Java and Amazon S3.

Are you already using either of these encryption features in the AWS SDK for Java? Let us know in the comments below!

Creating Access Policies in Code

by Norm Johanson | in .NET

AWS uses access policies to restrict access to resources. These policies are JSON documents made up of statements that contain actions, resources, and conditions. You could build these JSON documents by hand in code, but a better way is to use the Policy object from the AWS SDK for .NET, found in the Amazon.Auth.AccessControlPolicy namespace. This gives you type safety and is much more readable than assembling JSON text yourself. For example, imagine you are building a system where desktop clients upload user data directly to Amazon S3 instead of uploading to a web server that would then upload to S3. You don’t want to bundle the S3 credentials with the desktop clients, so the clients need to get their credentials from a web server. You want the clients to be able to do GET and PUT requests in S3 under their username in a specific bucket.

The following code creates the policy object. For this case, you need only one statement. It has a resource of bucket + username and the GET and PUT actions. As an added security measure, let’s add a condition that locks the GET and PUT requests to the IP address of the desktop client.

public Policy GeneratePolicy(string bucket, string username, string ipAddress)
{
    var statement = new Statement(Statement.StatementEffect.Allow);

    // Allow access to the sub folder represented by the username in the bucket
    statement.Resources.Add(ResourceFactory.NewS3ObjectResource(bucket, username + "/*"));

    // Allow Get and Put object requests.
    statement.Actions = new List<ActionIdentifier>()
        { S3ActionIdentifiers.GetObject, S3ActionIdentifiers.PutObject };

    // Lock the requests coming from the client machine.
    statement.Conditions.Add(ConditionFactory.NewIpAddressCondition(ipAddress));

    var policy = new Policy();
    policy.Statements.Add(statement);

    return policy;
}

Once you have the policy, you can create a federated user, like this:

public Credentials GetFederatedCredentials(Policy policy, string username)
{
    var request = new GetFederationTokenRequest()
    {
        Name = username,
        Policy = policy.ToJson() 
    };

    var stsClient = new AmazonSecurityTokenServiceClient();

    var response = stsClient.GetFederationToken(request);
    return response.GetFederationTokenResult.Credentials;
}

The credentials object contains a temporary access key, a secret key, and a session token that can be sent back to the desktop client. The desktop client can then construct an S3 client, like this:

string accessKeyId, secretAccessKey, sessionToken;
GetCredentialsFromWebServer(out accessKeyId, out secretAccessKey, out sessionToken);

var sessionCredentials = new SessionAWSCredentials(accessKeyId, secretAccessKey, sessionToken);
AmazonS3 s3Client = new AmazonS3Client(sessionCredentials, RegionEndpoint.USWest2);

public void GetCredentialsFromWebServer(out string accessKeyId, out string secretAccessKey, out string sessionToken)
{
    ... Make web request to get temporary restricted credentials. ...
}

Now the desktop client can upload and download data to S3 without the ability to access other users’ data.
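
For example, with the temporary credentials in place, an upload from the desktop client might look like the following sketch (the key name is a placeholder, and the client is assumed to know its bucket and username; the policy only allows keys under the user's prefix):

// Upload a file under the user's own prefix in the bucket
var putRequest = new PutObjectRequest()
    .WithBucketName(bucket)
    .WithKey(username + "/notes.txt")
    .WithContentBody("Hello from the desktop client");

s3Client.PutObject(putRequest);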

Using the AWS SDK for Ruby from Your REPL

by Loren Segal | in Ruby

We are all used to spinning up irb or Pry sessions to play with Ruby’s features interactively. Some people reading this might even be using the rails console on a daily basis, which can make digging through Ruby on Rails applications much easier. Well, we’re actually working on bringing that same functionality into the AWS SDK for Ruby!

The Backstory

We’ve been using an internal version of our interactive console for a long time to more easily develop and debug new features in the Ruby SDK, but it has always been a very homegrown and customized tool. We didn’t feel that it was in the right state to be published along with the SDK, but we have been looking at extracting it into a public executable in the aws-sdk gem that we are comfortable supporting.

And then a couple of weeks ago, a developer by the name of Mike Williams (@woollyams on Twitter) posted a Gist that showed how to launch a REPL for the Ruby SDK using Pry, which got us thinking about (and working on) extracting our REPL a little bit more.

Tweet about the Pry REPL (@woollyams)

Thanks, Mike, for indirectly helping to move this forward!

Introducing the REPL

Trevor took the above Gist and did some refactoring to make it work with other services, as well as play more nicely with some of the new convenience features in the Ruby SDK. The end result is now sitting in a branch on the aws/aws-sdk-ruby repository (aws-sdk-repl). You can try the REPL out for yourself by checking out the repository and running ./bin/aws-rb:

$ git clone git://github.com/aws/aws-sdk-ruby
$ cd aws-sdk-ruby
$ git checkout aws-sdk-repl
$ ./bin/aws-rb --help
Usage: aws-rb [options]
        --repl REPL                  specify the repl environment, pry or irb
    -l, --[no-]log                   log client requests, on by default
    -c, --[no-]color                 colorize request logging, on by default
    -d, --[no-]debug                 log HTTP wire traces, off by default
    -Idirectory                      specify $LOAD_PATH directory (may be used more than once)
    -rlibrary                        require the library
    -v, --verbose                    enable client logging and HTTP wire tracing
    -q, --quiet                      disable client logging and HTTP wire tracing
    -h, --help

The Features

Pry by default

The tool currently attempts to use Pry by default, if available, and falls back to a plain old irb session if it is not. If you want to force irb or Pry, pass --repl irb or --repl pry, respectively. You can also set this through environment variables (more information is discussed in the pull request).
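
For example, to force a plain irb session:

$ ./bin/aws-rb --repl irb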

Logging interactively

Also by default, we show simple logging for each request that the SDK sends. This can shed a lot of light onto how you are using the SDK. For example, when you list buckets from S3, you might see:

AWS> s3.buckets.map(&:name)
[AWS S3 200 0.933647 0 retries] list_buckets()  
=> ["mybucket1", "mybucket2", "mybucket3", ...]

You can also show HTTP wire traces by passing -d to the console to run in debug mode. These values can all also be set through environment variables.

Logging existing scripts

Finally, if you have a small script that you want to debug or profile, you can use the aws-rb shell to quickly log all requests and ensure that the right things are happening in that script:

$ cat test.rb
require 'aws-sdk'
AWS.s3.buckets.to_a
AWS.sqs.queues.to_a
$ ./bin/aws-rb -d -I . -r test.rb
<...WIRE TRACE DATA HERE...>
[AWS S3 200 1.183635 0 retries] list_buckets()  
<...WIRE TRACE DATA HERE...>
[AWS SQS 200 0.836059 0 retries] list_queues()  

Looks like we got the requests we were looking for!

Making It Live

We are currently crossing our t’s and dotting our i’s on this new feature, but if you have feedback on this new REPL that you would like to share, feel free to jump in and comment on pull request #270. We’d love to hear anything you have to say, from large feature omissions all the way down to suggestions for the executable name. Please join the conversation on GitHub.

Rate-Limited Scans in Amazon DynamoDB

by Jason Fulghum | in Java

Today we’re lucky to have another guest post by David Yanacek from the Amazon DynamoDB team. David is sharing his deep knowledge of Amazon DynamoDB to help explain how to manage performance and throughput usage on your DynamoDB tables.


When you scan your table in Amazon DynamoDB, you should follow the DynamoDB best practices for avoiding sudden bursts of read activity. You may also want to limit a background Scan job to use a limited amount of your table’s provisioned throughput, so that it doesn’t interfere with your more important operations. Fortunately, the Google Guava libraries for Java include a RateLimiter class, which makes it easy to limit the amount of provisioned throughput you use.

Let’s say that you have an application that scans a DynamoDB table once a day in order to produce reports, take a backup, compute aggregates, or do something else that involves scanning the whole table. It’s worth pointing out that Amazon DynamoDB is also integrated with Amazon Elastic MapReduce (EMR) and with Amazon Redshift. These integrations let you export your tables to other locations like Amazon S3, or perform complex analytics and queries that DynamoDB does not natively support. However, it’s also common to do this sort of scan activity in the application instead of using EMR or Redshift, so let’s go into the best practices for doing this scan without interfering with the rest of the application.

To illustrate, let’s say that you have a table that is 50 GB in size and is provisioned with 10,000 read capacity units per second. Assume that you will perform this scan at night when normal traffic to your table consumes only 5,000 read capacity units per second. This gives you plenty of extra provisioned throughput for scanning your table, but you still don’t want it to interfere with your normal workload. If you allow your scan to consume 2,000 read capacity units, it will take about an hour to complete the scan, according to the following calculation:

50 GB requires 6,553,600 read capacity units to scan:

    50 GB * 1024 MB/GB * 1024 KB/MB = 52,428,800 KB
    52,428,800 KB / 4 KB per read capacity unit = 13,107,200 read capacity units
    13,107,200 / 2 = 6,553,600 (Scan performs eventually consistent reads, which cost half as much)

At 2,000 read capacity units per second, you can consume 7,200,000 per hour, and 6,553,600 / 7,200,000 is equal to 0.91 hours, or about 55 minutes.

To make the most of your table’s provisioned throughput, you’ll want to use the parallel scan feature of the Scan operation so that your scan is distributed across your table’s partitions. But be careful that your scan doesn’t consume too much of your table’s provisioned throughput and cause the critical parts of your application to be throttled. To avoid throttling, you need to rate limit your client application, something Guava’s RateLimiter class makes easy:

// Initialize the rate limiter to allow 25 read capacity units / sec
RateLimiter rateLimiter = RateLimiter.create(25.0);

// Track how much throughput we consume on each page 
int permitsToConsume = 1;

// Initialize the pagination token 
Map<String, AttributeValue> exclusiveStartKey = null;

do {
    // Let the rate limiter wait until our desired throughput "recharges"
    rateLimiter.acquire(permitsToConsume);
    
    // Do the scan
    ScanRequest scan = new ScanRequest()
        .withTableName("ProductCatalog")
        .withLimit(100)
        .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
        .withExclusiveStartKey(exclusiveStartKey);
    ScanResult result = dynamodb.scan(scan);
    exclusiveStartKey = result.getLastEvaluatedKey();
    
    // Account for the rest of the throughput we consumed, 
    // now that we know how much that scan request cost 
    double consumedCapacity = result.getConsumedCapacity().getCapacityUnits();
    permitsToConsume = (int)(consumedCapacity - 1.0);
    if(permitsToConsume <= 0) {
        permitsToConsume = 1;
    }
    
    // Process results here
    processYourResults(result);
    
} while (exclusiveStartKey != null);

The preceding code example limits the consumed capacity to 25.0 read capacity units per second, as determined by the following algorithm:

  1. Initialize a RateLimiter object with a target rate of 25.0 capacity units per second.
  2. Initialize a pagination token to null. We use this token for looping through each “page” of the Scan results.
  3. Acquire read capacity units from the rate limiter. The first time through, we consume “1” because we don’t know how much throughput each “page” of the scan will consume. This pauses the application until we have “recharged” enough throughput.
  4. Perform the scan, passing in the ExclusiveStartKey, and also a Limit. If unbounded, the scan will consume 128 read capacity units, which could cause an uneven workload on the table. Also pass in “TOTAL” to ReturnConsumedCapacity so that DynamoDB will return the amount of throughput consumed by the request.
  5. Record the amount of consumed throughput, so that next time around the loop, we will ask for more or fewer permits from the rate limiter.
  6. Process the results of that “page” of the scan.

The preceding algorithm shows a good basic approach to scanning a table “gently” in the background without interfering with production traffic. However, it could be improved upon. Here are a few other best practices you could build into such a background scan job:

  • Parallel scan – To distribute the workload uniformly across the partitions of the table, pass the Segment and TotalSegments parameters into the Scan operation. You can use multiple threads, processes, or machines to scale out the scan work on the client side, as sketched in the example after this list.
  • Estimating page sizes – The code above uses a limit of “100” on every page of the scan. A more sophisticated approach could involve computing a Limit based on the throughput consumed by each page of the scan. Ideally, each page would consume a fairly small number of read capacity units so that you avoid sudden bursts of read activity.
  • Rounding to 4 KB boundaries – Every 4 KB of data scanned consumes 0.5 read capacity units (0.5 and not 1.0 because Scan uses eventually consistent reads). Therefore if you specify a Limit that results in scanning a size that isn’t divisible by 4 KB, you waste some throughput. Ideally the algorithm estimates how many items fit into a 4 KB chunk and adjusts the Limit accordingly.
  • Recording progress – If the server process were to crash, or if errors occurred beyond the automatic retries in the SDK, we want to resume where we left off next time around. Imagine a 2 hour scan job getting to be 99% done and crashing. You could build a “stateful cursor” in a DynamoDB table by saving the LastEvaluatedKey in an item after every page. Be careful, though, since that will only save how far the scan got. Your application will have to be able to deal with the possibility of processing a page multiple times.
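
Here is a rough sketch of how the parallel scan and the rate limiter can fit together. It reuses the dynamodb client and the processYourResults() method from the earlier example, shares a single RateLimiter across all of the workers so that the total consumed throughput stays bounded, and uses an illustrative segment count:

final int totalSegments = 4;
final RateLimiter rateLimiter = RateLimiter.create(25.0); // shared budget for all segments
ExecutorService executor = Executors.newFixedThreadPool(totalSegments);

for (int segment = 0; segment < totalSegments; segment++) {
    final int currentSegment = segment;
    executor.submit(new Runnable() {
        public void run() {
            Map<String, AttributeValue> exclusiveStartKey = null;
            int permitsToConsume = 1;
            do {
                // Let the rate limiter wait until our shared throughput "recharges"
                rateLimiter.acquire(permitsToConsume);

                // Scan only this worker's segment of the table
                ScanRequest scan = new ScanRequest()
                    .withTableName("ProductCatalog")
                    .withLimit(100)
                    .withSegment(currentSegment)
                    .withTotalSegments(totalSegments)
                    .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
                    .withExclusiveStartKey(exclusiveStartKey);
                ScanResult result = dynamodb.scan(scan);
                exclusiveStartKey = result.getLastEvaluatedKey();

                // Charge the rate limiter for what this page actually cost
                double consumedCapacity = result.getConsumedCapacity().getCapacityUnits();
                permitsToConsume = Math.max(1, (int) (consumedCapacity - 1.0));

                processYourResults(result);
            } while (exclusiveStartKey != null);
        }
    });
}
executor.shutdown();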

The Scan operation in DynamoDB is useful and necessary for performing various occasional background operations. However, applications that perform scans should do so by following the DynamoDB best practices. Hopefully, Google Guava’s RateLimiter makes doing so a bit easier. Also, you might want to check out our earlier blog post on using Google Guava’s Map builder API for writing shorter code when working with maps in the AWS SDK for Java.

Connecting to Amazon EC2 Instances from the AWS Toolkit for Visual Studio

by Norm Johanson | in .NET

I think the feature I use the most in the AWS Toolkit for Visual Studio is being able to quickly connect to my EC2 instances with Remote Desktop. I use the toolkit to store my private keys encrypted, so when I want to use Remote Desktop, all I have to do is confirm I want to use the key pair and click OK.

For example, let’s say I want to connect to my instance with Remote Desktop. I would navigate to the EC2 Instances view, right-click my instance, and click Open Remote Desktop.

In this case, I have not yet used the private key with the toolkit, so it prompts me to enter my private key. I’m going to leave the Save Private Key field checked to make it easier to connect in the future, and then I click OK.

Now the toolkit starts the Remote Desktop session, and in a few seconds I will be in my Windows instance, ready to go. The next time I attempt to connect to this instance, or to any instance that uses the same key pair, I’ll get a connect box.

Since the toolkit has the private key stored in it, I’m no longer prompted to enter my private key. I just need to confirm I want to use this specific key pair to log on, and the toolkit takes care of the rest.

Not only can I use Remote Desktop from the EC2 Instances view, but I can also connect directly from the AWS Elastic Beanstalk Environment view and the AWS CloudFormation Stack view by clicking the Connect to Instance button in the toolbar. If multiple EC2 instances are associated with the environment or stack, I will first be prompted for the specific EC2 instance I want to connect to.

I can also use the toolkit to connect to my Linux instances just as easily. In this case, the toolkit uses the Windows SSH client PuTTY, which I need to install before attempting to SSH. The toolkit gives me all the same advantages of storing my private key and takes care of converting the private key from its native PEM format to PuTTY’s PPK format.

AWS at Symfony Live Portland 2013

by Jeremy Lindblom | in PHP

A few weeks ago, I had the pleasure of attending the Symfony Live Portland 2013 conference. This year, Symfony Live was co-located with the very large DrupalCon, and though I did not attend any of the DrupalCon sessions, I did get to talk to many Drupal developers during lunches and the hack day. It was awesome to be among so many other PHP developers.

I had the honor of being selected as a speaker at Symfony Live, and the topic of my session was Getting Good with the AWS SDK for PHP (here are the slides and Joind.in event). In this talk, I gave a brief introduction to AWS and its services, showed how to use the AWS SDK for PHP, and demonstrated some code from a sample PHP application that uses Amazon S3 and Amazon DynamoDB to manage its data.

How does the SDK integrate with Symfony?

Since I was in the presence of Symfony developers, I made sure to point out some of the ways that the AWS SDK for PHP currently integrates with the Symfony framework and community.

The SDK uses the Symfony Event Dispatcher

The SDK uses the Symfony Event Dispatcher component quite heavily. Not only are many of the internal details of the SDK implemented with events (e.g., request signing), but users of the SDK can listen for events and inject their own logic into the request flow.

For example, the following code attaches an event listener to an SQS client that will capitalize messages sent to a queue via the SendMessage operation.

use Aws\Common\Aws;
use Guzzle\Common\Event; // Extends Symfony\Component\EventDispatcher\Event

$aws = Aws::factory('/path/to/your/config.php');
$sqs = $aws->get('sqs');

$dispatcher = $sqs->getEventDispatcher();
$dispatcher->addListener('command.before_send', function (Event $event) {
    $command = $event['command'];
    if ($command->getName() === 'SendMessage') {
        // Ensure the message is capitalized
        $command['MessageBody'] = ucfirst($command['MessageBody']);
    }
});

$sqs->sendMessage(array(
    'QueueUrl'    => $queueUrl,
    'MessageBody' => 'an awesome message.',
));

We publish an AWS Service Provider for Silex

For Silex users, we publish an AWS Service Provider for Silex that makes it easier to bootstrap the AWS SDK for PHP within a Silex application. I used this service provider in my presentation with the sample PHP application, so make sure to check out my slides.
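
As a quick sketch of what that looks like in a Silex application (the provider class name and the aws.config parameter key shown here are from memory of the provider's README, so double-check them against the project itself):

use Aws\Silex\AwsServiceProvider;
use Silex\Application;

$app = new Application();

// Register the provider and point it at an SDK configuration file
$app->register(new AwsServiceProvider(), array(
    'aws.config' => '/path/to/aws_config.php',
));

// The service builder is then available in the container
$s3 = $app['aws']->get('s3');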

You can use the Symfony Finder with Amazon S3

In my presentation, I also pointed out our recent addition of the S3 Stream Wrapper to our SDK and how you can use it in tandem with the Symfony Finder component to find files within your Amazon S3 buckets.

The following example shows how you can use the Symfony Finder to find S3 objects in the bucket "jcl-files", with a key prefix of "family-videos", that are smaller than 50 MB in size and no more than a year old.

use Aws\Common\Aws;
use Symfony\Component\Finder\Finder;

$aws = Aws::factory('/path/to/your/config.php');
$aws->get('s3')->registerStreamWrapper();

$finder = new Finder();
$finder->files()
    ->in('s3://jcl-files/family-videos')
    ->size('< 50M')
    ->date('since 1 year ago');

foreach ($finder as $file) {
    echo $file->getFilename() . PHP_EOL;
}

Others talked about AWS

One of my co-workers, Michael Dowling, also presented at the conference. His presentation was about his open source project, Guzzle, which is a powerful HTTP client library and is used as the foundation of the AWS SDK for PHP. In his talk, Michael also highlighted a few of the ways that the AWS SDK for PHP uses Guzzle. Guzzle is also being used in the core of Drupal 8, so his presentation drew in a crowd of both Drupal and Symfony developers.

Aside from our presentations, there were sessions focused on the Symfony framework as well as others on topics like Composer, caching, and cryptography. David Zuelke and Juozas Kaziukėnas both mentioned how they use AWS services in their talks, Surviving a Prime Time TV Commercial and Process any amount of data. Any time, respectively. It was nice to meet in person many PHP developers I’ve talked with online and to participate in Symfony Live traditions such as PHP Jeopardy and karaoke.

While at the conference, I talked to several developers about what would make a good AWS Symfony bundle or Drupal module, but I’m also curious to find out what you think. So… what would you like to see in an AWS Symfony Bundle? What would make a good AWS Drupal module? Let us know your thoughts in the comments.