Amazon S3 TransferManager – Batched File Uploads

by Jason Fulghum

In addition to all the cool features in TransferManager for managing asynchronous uploads and downloads, it also provides great support for batched uploads and downloads of multiple files.

The uploadDirectory and uploadFileList methods in TransferManager make it easy to upload a complete directory, or a list of specific files to Amazon S3, as one background, asynchronous task.
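For example, uploading an entire directory takes just one call. Here's a minimal sketch (the bucket name and local path are placeholders):

TransferManager tm = new TransferManager(myCredentials);

// Recursively upload everything under the local "data" directory to the
// "backups" key prefix in the bucket; the last argument includes subdirectories
MultipleFileUpload upload = tm.uploadDirectory(
        "my-bucket", "backups", new File("data"), true);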

In some cases though, you might want more control over how that data is uploaded, particularly around additional metadata you want to provide for the data you’re uploading. A second form of uploadFileList allows you to pass in an implementation of an ObjectMetadataProvider interface that will let you do just that. For each of the files being uploaded, this ObjectMetadataProvider will receive a callback via the provideObjectMetadata method, allowing it to fill in any additional metadata you’d like to store alongside your object data in Amazon S3.

The following code demonstrates how easy it is to use the ObjectMetadataProvider interface to pass along additional metadata to your uploaded files.

TransferManager tm = new TransferManager(myCredentials);

ObjectMetadataProvider metadataProvider = new ObjectMetadataProvider() {
    public void provideObjectMetadata(File file, ObjectMetadata metadata) {
        // If this file is a JPEG, then parse some additional info
        // from the EXIF metadata to store in the object metadata
        if (isJPEG(file)) {
            metadata.addUserMetadata("original-image-date", 
                                     parseExifImageDate(file));
        }
    }
};

MultipleFileUpload upload = tm.uploadFileList(
        myBucket, myKeyPrefix, rootDirectory, fileList, metadataProvider);
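Like all TransferManager operations, uploadFileList returns immediately while the upload continues in the background. If you need to block until the whole batch finishes, you can wait on the returned MultipleFileUpload:

try {
    // Blocks the current thread until every file in the batch has been uploaded
    upload.waitForCompletion();
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}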

Saving Money with Amazon EC2 Reserved Instances

by Jason Fulghum

Are you or your company using Amazon EC2 instances? Are you using Amazon EC2 Reserved Instances yet? Reserved Instances are often one of the easiest and most effective ways to save money on your Amazon EC2 bill. They can significantly reduce the price you pay for Amazon EC2 instance hours over a one- or three-year term compared to on-demand rates. Many customers are already saving a lot of money with Amazon EC2 Reserved Instances, and we'd love to see more customers using them!

Reserved Instances got even more attractive recently, with an API update that allows you to modify the details of your Reserved Instances. Until this release, the Availability Zone you specified at the time of purchase remained fixed for the duration of the term of the Reserved Instances. This release gives you the ability to migrate your Reserved Instances to a different Availability Zone within the same region, making Reserved Instances even more flexible.

You can modify your Reserved Instances through the AWS Management Console, or you can use one of the AWS SDKs to programmatically modify them:

AmazonEC2Client ec2 = new AmazonEC2Client(...);
ReservedInstancesConfiguration configuration = new ReservedInstancesConfiguration()
     .withPlatform("EC2-VPC")
     .withAvailabilityZone("us-east-1b")
     .withInstanceCount(1);              
ModifyReservedInstancesRequest request = new ModifyReservedInstancesRequest()
     .withReservedInstancesIds(myReservedInstancesId)
     .withTargetConfigurations(configuration);
ec2.modifyReservedInstances(request);
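Modifications don't take effect instantly, so you may want to poll until yours completes. The following sketch uses the DescribeReservedInstancesModifications API; treat the exact method names as illustrative:

// List modification requests and print their current status
DescribeReservedInstancesModificationsResult result =
     ec2.describeReservedInstancesModifications(
          new DescribeReservedInstancesModificationsRequest());
for (ReservedInstancesModification modification : result.getReservedInstancesModifications()) {
    System.out.println(modification.getReservedInstancesModificationId()
                       + ": " + modification.getStatus());
}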

Snippet: Creating Amazon DynamoDB Tables

by Jason Fulghum

In many applications, it’s important to make sure your code handles creating any resources that it needs in order to run. Otherwise, you’ll have to manually create those resources whenever you want to run your application with a new AWS account.

For example, if you have an application that needs to store data in an Amazon DynamoDB table, then you’ll probably want your application to check if that table exists at startup, create it if necessary, and only let your application logic start running once that table is ready to use.

The following code demonstrates how to create a simple Amazon DynamoDB table using the SDK:

AmazonDynamoDB dynamo = new AmazonDynamoDBClient(myCredentials);

CreateTableRequest request = new CreateTableRequest().withTableName("customers");

request.withKeySchema(new KeySchemaElement()
        .withAttributeName("customerId")
        .withKeyType(KeyType.HASH));

request.withAttributeDefinitions(new AttributeDefinition()
        .withAttributeName("customerId")
        .withAttributeType(ScalarAttributeType.S));

request.setProvisionedThroughput(new ProvisionedThroughput()
        .withReadCapacityUnits(5)
        .withWriteCapacityUnits(2));

dynamo.createTable(request);

This code creates a simple table called customers, specifies low values for provisioned throughput, and declares the hash key (think: primary key) to be an attribute named customerId with type String.
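To make the creation conditional, you can first ask the service whether the table already exists. A minimal sketch, relying on the ResourceNotFoundException that DynamoDB returns for missing tables:

boolean tableExists;
try {
    dynamo.describeTable(new DescribeTableRequest().withTableName("customers"));
    tableExists = true;
} catch (ResourceNotFoundException e) {
    // DynamoDB throws ResourceNotFoundException when the table doesn't exist yet
    tableExists = false;
}
// Only call createTable when tableExists is false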

Once you’ve created your table, you’ll want to make sure it’s ready for use before you let your application logic start executing; otherwise, you’ll get errors from Amazon DynamoDB when you try to use it.

The following function, taken from some of our SDK test code for DynamoDB, demonstrates how to poll the status of a table and detect when the table is ready for use.

protected static void waitForTableToBecomeAvailable(String tableName) throws InterruptedException {
    System.out.println("Waiting for " + tableName + " to become ACTIVE...");

    long startTime = System.currentTimeMillis();
    long endTime = startTime + (10 * 60 * 1000);
    while ( System.currentTimeMillis() < endTime ) {
        Thread.sleep(1000 * 20);
        try {
            DescribeTableRequest request = new DescribeTableRequest()
                 .withTableName(tableName);
            TableDescription table = dynamo.describeTable(request).getTable();
            if ( table == null ) continue;

            String tableStatus = table.getTableStatus();
            System.out.println("  - current state: " + tableStatus);
            if ( tableStatus.equals(TableStatus.ACTIVE.toString()) )
                return;
        } catch ( AmazonServiceException ase ) {
            if (!ase.getErrorCode().equalsIgnoreCase("ResourceNotFoundException"))
                throw ase;
        }
    }

    throw new RuntimeException("Table " + tableName + " never went active");
}

You can use this same logic to wait for your new table to become active. Then it’s ready for your data!
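For example, here's a quick sketch of writing a first item to the new table (the attribute values are placeholders):

Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
item.put("customerId", new AttributeValue("customer-1234"));
item.put("email", new AttributeValue("jane@example.com"));
dynamo.putItem(new PutItemRequest().withTableName("customers").withItem(item));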

How are you managing your AWS resources? Do your applications automatically create all the AWS resources they need? Are you using AWS CloudFormation to handle resource creation?

Amazon DynamoDB Session Manager for Apache Tomcat

Today we’re excited to talk about a brand new open source project on our GitHub page for managing Apache Tomcat sessions in Amazon DynamoDB!

DynamoDB’s fast and predictable performance characteristics make it a great match for handling session data. Plus, since it’s a fully-managed NoSQL database service, you avoid all the work of maintaining and operating a separate session store.

Using the DynamoDB Session Manager for Tomcat is easy. Just drop the library in the lib directory of your Tomcat installation and tell Tomcat you’re using a custom session manager in your context.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<Context>
    <WatchedResource>WEB-INF/web.xml</WatchedResource>
    <Manager className="com.amazonaws.services.dynamodb.sessionmanager.DynamoDBSessionManager"
             awsAccessKey="myAccessKey"
             awsSecretKey="mySecretKey"
             createIfNotExist="true" />
</Context>

The context.xml file above configures the session manager to store your sessions in DynamoDB, and uses the provided AWS security credentials to access DynamoDB. There are several other configuration options available, including many ways to provide your security credentials:

  • you can explicitly specify them (as shown above)
  • you can specify a properties file to load them from
  • you can rely on the DefaultAWSCredentialsProviderChain to load your credentials from environment variables, Java system properties, or IAM roles for Amazon EC2 instances

If you’re using the AWS Toolkit for Eclipse and deploying your application through AWS Elastic Beanstalk, then all you have to do is opt in to the DynamoDB Session Manager for Tomcat in the New AWS Java Web Project Wizard. Then when you deploy to AWS Elastic Beanstalk, all your sessions will be managed in DynamoDB.

For more details on using the session manager, check out the Session Manager section in the AWS SDK for Java Developer Guide. Or, if you really want to get into the details, check out the project on GitHub.

We’re excited to have the first version of the Amazon DynamoDB Session Manager for Apache Tomcat out there for customers to play with. What features do you want to see next? Let us know in the comments below!

Quick Tips: Managing Amazon S3 Data in Eclipse

No matter what type of application you’re developing, it’s a safe bet that it probably needs to save or load data from a central data store, such as Amazon S3. During development, you can take advantage of the Amazon S3 management tools provided by the AWS Toolkit for Eclipse, all without ever leaving your IDE.

To start, find your Amazon S3 buckets in the AWS Explorer view.

From the AWS Explorer view, you can create and delete buckets, or double-click on one of your buckets to open it in the Bucket Editor.

Once you’re in the Bucket Editor, you can delete objects in your bucket, edit the permissions for objects or the bucket itself, and generate pre-signed URLs that you can safely pass around to give other people access to the data stored in your account without ever having to give away your AWS security credentials.
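The toolkit handles pre-signed URL generation for you, but for comparison, here's a rough sketch of doing the same thing in code with the SDK (the bucket and key names are placeholders):

AmazonS3 s3 = new AmazonS3Client(myCredentials);

// Generate a pre-signed URL that expires one hour from now
Date expiration = new Date(System.currentTimeMillis() + 60 * 60 * 1000);
URL url = s3.generatePresignedUrl("my-bucket", "reports/summary.csv", expiration);
System.out.println("Share this link: " + url);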


One of the most useful features is the ability to drag and drop files into your Amazon S3 buckets directly from your OS. In the following screenshot, I've selected a file in the Mac Finder and dragged and dropped it into a virtual folder in the object listing in the Bucket Editor. To download one of your objects from Amazon S3, just drag it to a directory in a view such as Eclipse's Package Explorer.

The AWS Toolkit for Eclipse has many features that facilitate development and deployment of AWS applications. For more information, check out some of our other Eclipse blog posts.

The DynamoDBMapper, Local Secondary Indexes, and You!

Earlier this year, Amazon DynamoDB released support for local secondary indexes. At that time, the AWS SDK for Java added support for LSIs, for both the low-level (AmazonDynamoDBClient) and high-level (DynamoDBMapper) APIs in the com.amazonaws.services.dynamodbv2 package. Since then, I have seen a few questions on how to use the DynamoDBMapper with local secondary indexes. In this post, I will build on the Music Collection sample that is included in the Amazon DynamoDB documentation.

The example table uses a String hash key (Artist), a String range key (SongTitle), and a local secondary index on the AlbumTitle attribute (also a String). I created the table used in this example with the DynamoDB support that is part of the AWS Toolkit for Eclipse, but you could use the code included in the documentation or the AWS Management Console. I also used the Eclipse Toolkit to populate the table with some sample data. Next, I created a POJO to represent an item in the MusicCollection table. The code for MusicCollectionItem is shown below.

@DynamoDBTable(tableName="MusicCollection")
public class MusicCollectionItem {

    private String artist;
    private String songTitle;
    private String albumTitle;
    private String genre;
    private String year;

    @DynamoDBHashKey(attributeName="Artist")
    public String getArtist() { return artist; }
    public void setArtist(String artist) { this.artist = artist; }

    @DynamoDBRangeKey(attributeName = "SongTitle")
    public String getSongTitle() { return songTitle; }
    public void setSongTitle(String songTitle) { this.songTitle = songTitle; }

    @DynamoDBIndexRangeKey(attributeName="AlbumTitle", 
                           localSecondaryIndexName="AlbumTitleIndex")
    public String getAlbumTitle() { return albumTitle; }
    public void setAlbumTitle(String albumTitle) { this.albumTitle = albumTitle; }

    @DynamoDBAttribute(attributeName="Genre")
    public String getGenre() { return genre; }
    public void setGenre(String genre) { this.genre = genre; }

    @DynamoDBAttribute(attributeName="Year")
    public String getYear() { return year;}
    public void setYear(String year) { this.year = year; }
}

As you can see, MusicCollectionItem has the hash key and range key annotations, but also a new annotation, DynamoDBIndexRangeKey. You can find the documentation for that annotation in the AWS SDK for Java API reference. The DynamoDBIndexRangeKey annotation marks the property as an alternate range key to be used in a local secondary index. Since Amazon DynamoDB supports up to five local secondary indexes per table, I can also have up to five attributes annotated with DynamoDBIndexRangeKey. Also note in the code above that, since the documentation sample uses PascalCase, I needed to include attributeName="X" in each of the annotations. If you were starting from scratch, you could make this code simpler by using attribute names that match your instance variable names.

So now that you have both a table and a corresponding POJO using a local secondary index, how do you use it with the DynamoDBMapper? Using a local secondary index with the mapper is pretty straightforward. You create the mapper the same way as before:

AmazonDynamoDB dynamoDB = Region.getRegion(Regions.US_WEST_2)
           .createClient(AmazonDynamoDBClient.class, new ClasspathPropertiesFileCredentialsProvider(), null);
DynamoDBMapper mapper = new DynamoDBMapper(dynamoDB);

Next, you can query the range key in the same manner as you would a table without a local secondary index:

String artist = "The Okee Dokee Brothers";
MusicCollectionItem musicKey = new MusicCollectionItem();
musicKey.setArtist(artist);
DynamoDBQueryExpression<MusicCollectionItem> queryExpression = new DynamoDBQueryExpression<MusicCollectionItem>()
      .withHashKeyValues(musicKey);
List<MusicCollectionItem> myCollection = mapper.query(MusicCollectionItem.class, queryExpression);

This code looks up my kids' new favorite artist and returns all the song titles that are in my Amazon DynamoDB table. I could add a Condition that would limit the song titles (see the sketch below), but I wanted to get a list of all of them.
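If you did want to narrow the results, a range key condition does the trick. Here's a sketch using BEGINS_WITH, just one of the available comparison operators:

// Only return songs whose titles start with "C" (an illustrative condition)
Condition songTitleCondition = new Condition()
      .withComparisonOperator(ComparisonOperator.BEGINS_WITH)
      .withAttributeValueList(new AttributeValue().withS("C"));
DynamoDBQueryExpression<MusicCollectionItem> narrowedQuery =
      new DynamoDBQueryExpression<MusicCollectionItem>()
            .withHashKeyValues(musicKey)
            .withRangeKeyCondition("SongTitle", songTitleCondition);
List<MusicCollectionItem> someSongs = mapper.query(MusicCollectionItem.class, narrowedQuery);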

But what if I want to know which songs are on The Okee Dokee Brothers' latest album, Can You Canoe? Luckily, I have a local secondary index on the AlbumTitle attribute. Before local secondary indexes, I could only do a Scan operation, which would have scanned the entire table; with local secondary indexes, I can easily do a Query operation instead. The code for using the index is:

Condition rangeKeyCondition = new Condition();
rangeKeyCondition.withComparisonOperator(ComparisonOperator.EQ)
     .withAttributeValueList(new AttributeValue().withS("Can You Canoe?"));
queryExpression = new DynamoDBQueryExpression<MusicCollectionItem>()
     .withHashKeyValues(musicKey)
     .withRangeKeyCondition("AlbumTitle", rangeKeyCondition);
myCollection = mapper.query(MusicCollectionItem.class, queryExpression);

As you can see, doing a query on a local secondary index with the DynamoDBMapper is exactly the same as doing a range key query.

Now that I have shown how easy it is to use a local secondary index with the DynamoDBMapper, how will you use them? Let us know in the comments!

Closeable S3Objects

by Jason Fulghum

The com.amazonaws.services.s3.model.S3Object class now implements the Closeable interface (AWS SDK for Java 1.4.8 onwards). This allows you to use it as a resource in a try-with-resources statement. S3Object contains an S3ObjectInputStream that lets you stream down your data over the HTTP connection from Amazon S3. Since the HTTP connection is open and waiting, it’s important to read the stream quickly after calling getObject and to remember to close the stream so that the HTTP connection can be released properly. With the new Closeable interface, it’s even easier to ensure that you’re properly handling those HTTP connection resources.

The following snippet demonstrates how simple it is to use S3Object with a try-with-resources statement.

try (S3Object object = s3.getObject(bucket, key)) {
    System.out.println("key: " + object.getKey());
    System.out.println("data: " + dumpStream(object.getObjectContent());
} catch (Exception e) {
    System.out.println("Unable to download object from Amazon S3: " + e);
}
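For comparison, on Java 6 (without try-with-resources) you'd have to release the connection yourself in a finally block. A minimal sketch:

S3Object object = s3.getObject(bucket, key);
try {
    System.out.println("key: " + object.getKey());
    System.out.println("data: " + dumpStream(object.getObjectContent()));
} finally {
    // Always close the object so the HTTP connection is released
    try { object.close(); } catch (IOException e) { /* best effort */ }
}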

Injecting Failures and Latency using the AWS SDK for Java

by Wade Matveyenko

Today we have another guest post from a member of the Amazon DynamoDB team, Pejus Das.


The Amazon DynamoDB service provides fast and predictable performance with seamless scalability. It also has a list of common errors that can occur during request processing. You probably have a set of test suites that you run before you release changes to your application. If you're using DynamoDB in your application, your tests probably call it using an isolated test account, or one of the many mock DynamoDB facades out there. The first reference link at the end of this post lists some sample open source libraries, object-relational mappers, and mock implementations. Or maybe you have a combination of both solutions, with mocking for unit tests and a test account for integration tests. Either way, your test suite likely covers expected successful scenarios and expected failure scenarios.

But then there are the other classes of failures that are harder to test for. Amazon DynamoDB is a remote dependency that you call across the network (or possibly even over the internet). A whole class of things can go wrong with this kind of an interaction, and when things do go wrong, your application will behave a lot better if you’ve tested those failure scenarios in advance.

There are many approaches to injecting unexpected failures in your application. For example, you can simulate what happens to your application when DynamoDB returns one of its documented errors. You can also test the impact of high request latencies on your application. Such testing helps to build reliable and robust client applications that gracefully handle service errors and request delays. In this blog post, we describe another approach: how you can easily inject these kinds of failures into the client application using the AWS SDK for Java.

Request Handlers

The AWS SDK for Java allows you to register request handlers with the DynamoDB Java client. You can attach multiple handlers, and they are executed in the order you added them to the client. The RequestHandler interface gives you three hooks into the request execution cycle: beforeRequest, afterResponse, and afterError.

  • beforeRequest – called just before the HTTP request is executed against an AWS service like DynamoDB
  • afterResponse – called just after the response is received and processed by the client
  • afterError – called if an AmazonClientException occurs while executing the HTTP request

The RequestHandler hooks give an easy way to inject failures and latencies in the client for testing.
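You attach a handler directly to the client you've constructed. A minimal sketch, where FaultInjectionRequestHandler is a placeholder for your own RequestHandler implementation (like the examples below):

AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient(myCredentials);

// Handlers run in the order they are added to the client
dynamoDBClient.addRequestHandler(new FaultInjectionRequestHandler());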

Injecting Failures

The beforeRequest hook provides access to the Request object. You can inspect the Request and take some action based either on the Request or on some other condition. In the following example, we inspect a PutItemRequest and inject a ProvisionedThroughputExceededException for, on average, 50 percent of put requests.

@Override
public void beforeRequest(Request<?> request) {
    // Things to do just before a request is executed 
    if (request.getOriginalRequest() instanceof PutItemRequest) {
        // Throw a throughput exceeded exception for 50% of put requests 
        if (rnd.nextInt(2) == 0) {
           logger.info("Injecting ProvisionedThroughputExceededException");
           throw new ProvisionedThroughputExceededException("Injected Error");
        }
    }
    // Add latency to some Get requests 
    if (request.getOriginalRequest() instanceof GetItemRequest) {
        // Delay 50% of GetItem requests by 500 ms 
        if (rnd.nextInt(2) == 0) {
            // Delay on average 50% of the requests from client perspective 
            try {
                logger.info("Injecting 500 ms delay");
                Thread.sleep(500);
            } catch (InterruptedException ie) {
                logger.info(ie);
                throw new RuntimeException(ie);
            }
        }
    }
}

Injecting Latency

You could simply put a sleep in the beforeRequest hook to simulate latencies. If you want to inspect the Response object and inject latencies for specific traffic, you would use the afterResponse hook. You can analyze the response data from DynamoDB and act accordingly. In the following example, we check for a GetItemRequest, and when the returned item is named "Airplane", we modify the item and additionally add a 500 ms delay.

@Override
public void afterResponse(Request<?> request, Object resultObject, TimingInfo timingInfo) {
    // The following is a hit and miss for multi-threaded
    // clients as the cache size is only 50 entries
    String awsRequestId = dynamoDBClient.getCachedResponseMetadata(
                          request.getOriginalRequest()).getRequestId();
    logger.info("AWS RequestID: " + awsRequestId);
    // Here you could inspect and alter the response object to
    // see how your application behaves for specific data
    if (request.getOriginalRequest() instanceof GetItemRequest) {
        GetItemResult result = (GetItemResult) resultObject;
        Map<String, AttributeValue> item = result.getItem();
        if (item.get("name").getS().equals("Airplane")) {
            // Alter the item
            item.put("name", new AttributeValue("newAirplane"));
            item.put("new attr", new AttributeValue("new attr"));
            // Add some delay
            try {
                Thread.sleep(500);
            } catch (InterruptedException ie) { 
                logger.info(ie);
                throw new RuntimeException(ie);
            }
        }
    }
}

The preceding code examples are available on GitHub in the awslabs repository (see reference 4 below).

While this approach simulates increased latency and failures, it is only a simulation based on what you think will be happening during a failure. If you want to test your application for failures even more thoroughly, take a look at the Chaos Monkey and Simian Army applications written by Netflix. These inject actual failures into your system, revealing the interactions between more components in your system than just your application logic. We hope that adding fault injection testing to your application helps you be prepared for failure. Let us know in the comments!

Reference Links

  1. http://aws.typepad.com/aws/2012/04/amazon-dynamodb-libraries-mappers-and-mock-implementations-galore.html
  2. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ErrorHandling.html
  3. https://github.com/awslabs
  4. https://github.com/awslabs/aws-dynamodb-examples/tree/master/inject-errors-latencies
  5. http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
  6. http://techblog.netflix.com/2011/07/netflix-simian-army.html

AWS SDKs and Tools @ OSCON

by Jason Fulghum

A few of us from the SDKs and Tools teams will be down in Portland for OSCON next week.

We’ll be at the AWS booth talking to customers, answering questions, and as always, looking for talented engineers, managers, and designers interested in building the future of the AWS platform. If you aren’t able to drop by at OSCON, you can always browse our open positions and apply online.

If you’ll be at the conference, please come by and say hello! We love hearing from our customers, whether it’s feature requests for our tools and services or just ideas for new ways to use AWS.

We hope we’ll see some of you in Portland!



Release: AWS SDK for Java 1.5.0

by Jason Fulghum

We released version 1.5.0 of the AWS SDK for Java last night. This release contains several exciting enhancements including:

  • Upgrading to the latest major version of Apache HttpClient
  • Support for the Closeable interface on Amazon S3 objects
  • Easier construction of requests that use map datatypes
  • Batching improvements for Amazon DynamoDB
  • Support for Amazon Elastic Transcoder’s latest API version, enabling support for frame rate options and watermarks. 

We’ll be digging into some of these features more in upcoming blog posts.

For more details, see the full release notes.