

Rate-Limited Scans in Amazon DynamoDB

by Jason Fulghum

Today we’re lucky to have another guest post by David Yanacek from the Amazon DynamoDB team. David is sharing his deep knowledge of Amazon DynamoDB to help explain how to manage performance and throughput usage on your DynamoDB tables.


When you scan your table in Amazon DynamoDB, you should follow the DynamoDB best practices for avoiding sudden bursts of read activity. You may also want to limit a background Scan job to use only a fraction of your table’s provisioned throughput, so that it doesn’t interfere with your more important operations. Fortunately, the Google Guava libraries for Java include a RateLimiter class, which makes it easy to limit the amount of provisioned throughput you use.

Let’s say that you have an application that scans a DynamoDB table once a day in order to produce reports, take a backup, compute aggregates, or do something else that involves scanning the whole table. It’s worth pointing out that Amazon DynamoDB is also integrated with Amazon Elastic MapReduce and with Amazon Redshift. These integrations let you export your tables to other locations, such as Amazon S3, or perform complex analytics and queries that DynamoDB does not natively support. However, it’s also common to do this sort of scan activity in the application itself instead of using EMR or Redshift, so let’s go over the best practices for doing this scan without interfering with the rest of the application.

To illustrate, let’s say that you have a table that is 50 GB in size and is provisioned with 10,000 read capacity units per second. Assume that you will perform this scan at night when normal traffic to your table consumes only 5,000 read capacity units per second. This gives you plenty of extra provisioned throughput for scanning your table, but you still don’t want it to interfere with your normal workload. If you allow your scan to consume 2,000 read capacity units per second, it will take about an hour to complete, according to the following calculation:

Scanning 50 GB requires 6,553,600 read capacity units: 50 (GB) * 1024 (MB/GB) * 1024 (KB/MB) / 4 (each 4 KB of data consumes 1 read capacity unit) / 2 (Scan performs eventually consistent reads, which cost half as much). At 2,000 read capacity units per second, you consume 7,200,000 units per hour, so the scan takes 6,553,600 / 7,200,000 = 0.91 hours, or about 55 minutes.

To make the most of your table’s provisioned throughput, you’ll want to use the parallel scan feature of the Scan operation so that the work is distributed across your table’s partitions. But be careful that your scan doesn’t consume all of your table’s provisioned throughput and cause the critical parts of your application to be throttled. To avoid throttling, you need to rate limit your client application—something Guava’s RateLimiter class makes easy:

// Initialize the rate limiter to allow 25 read capacity units / sec
RateLimiter rateLimiter = RateLimiter.create(25.0);

// Track how much throughput we consume on each page 
int permitsToConsume = 1;

// Initialize the pagination token 
Map<String, AttributeValue> exclusiveStartKey = null;

do {
    // Let the rate limiter wait until our desired throughput "recharges"
    rateLimiter.acquire(permitsToConsume);
    
    // Do the scan
    ScanRequest scan = new ScanRequest()
        .withTableName("ProductCatalog")
        .withLimit(100)
        .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
        .withExclusiveStartKey(exclusiveStartKey);
    ScanResult result = dynamodb.scan(scan);
    exclusiveStartKey = result.getLastEvaluatedKey();
    
    // Account for the rest of the throughput we consumed, 
    // now that we know how much that scan request cost 
    double consumedCapacity = result.getConsumedCapacity().getCapacityUnits();
    permitsToConsume = (int)(consumedCapacity - 1.0);
    if(permitsToConsume <= 0) {
        permitsToConsume = 1;
    }
    
    // Process results here
    processYourResults(result);
    
} while (exclusiveStartKey != null);

The preceding code example limits the consumed capacity to 25.0 read capacity units per second, as determined by the following algorithm:

  1. Initialize a RateLimiter object with a target rate of 25.0 capacity units per second.
  2. Initialize a pagination token to null. We use this token for looping through each “page” of the Scan results.
  3. Acquire read capacity units from the rate limiter. The first time through, we consume “1” because we don’t know how much throughput each “page” of the scan will consume. This pauses the application until we have “recharged” enough throughput.
  4. Perform the scan, passing in the ExclusiveStartKey and also a Limit. Without a Limit, a single Scan page can return up to 1 MB of data and consume as many as 128 read capacity units, which could cause an uneven workload on the table. Also pass “TOTAL” for ReturnConsumedCapacity so that DynamoDB returns the amount of throughput consumed by the request.
  5. Record the amount of consumed throughput, so that next time around the loop, we will ask for more or fewer permits from the rate limiter.
  6. Process the results of that “page” of the scan.

The preceding algorithm shows a good basic approach to scanning a table “gently” in the background without interfering with production traffic. However, it could be improved upon. Here are a few other best practices you could build into such a background scan job:

  • Parallel scan – To distribute the workload uniformly across the partitions of the table, pass the Segment and TotalSegments parameters into the Scan operation. You can use multiple threads, processes, or machines to scale out the scan work on the client side. (A sketch of this follows the list.)
  • Estimating page sizes – The code above uses a limit of “100” on every page of the scan. A more sophisticated approach could involve computing a Limit based on the throughput consumed by each page of the scan. Ideally, each page would consume a fairly small number of read capacity units so that you avoid sudden bursts of read activity.
  • Rounding to 4 KB boundaries – Every 4 KB of data scanned consumes 0.5 read capacity units (0.5 and not 1.0 because Scan uses eventually consistent reads). Therefore if you specify a Limit that results in scanning a size that isn’t divisible by 4 KB, you waste some throughput. Ideally the algorithm estimates how many items fit into a 4 KB chunk and adjusts the Limit accordingly.
  • Recording progress – If the server process were to crash, or if errors occurred beyond the automatic retries in the SDK, we want to resume where we left off next time around. Imagine a 2-hour scan job crashing when it is 99% done. You could build a “stateful cursor” in a DynamoDB table by saving the LastEvaluatedKey in an item after every page. Be careful, though, since that only records how far the scan got; your application still has to deal with the possibility of processing a page multiple times.
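
Here is a minimal sketch of the parallel scan approach, combining the Segment and TotalSegments parameters with per-worker rate limiting. It assumes the dynamodb client and the processYourResults method from the earlier example; the thread-per-segment structure and the TOTAL_SEGMENTS value are illustrative choices, not prescriptions:

// Split the scan into segments, one worker thread per segment.
// Assumes "dynamodb" is a field (or final local) and processYourResults
// is defined as in the earlier example.
final int TOTAL_SEGMENTS = 4;

for (int segment = 0; segment < TOTAL_SEGMENTS; segment++) {
    final int seg = segment;
    new Thread(new Runnable() {
        public void run() {
            // Each worker gets its own RateLimiter, so together the four
            // segments consume roughly 4 * 25 = 100 read capacity units / sec.
            RateLimiter rateLimiter = RateLimiter.create(25.0);
            Map<String, AttributeValue> exclusiveStartKey = null;
            int permitsToConsume = 1;
            do {
                rateLimiter.acquire(permitsToConsume);

                ScanRequest scan = new ScanRequest()
                    .withTableName("ProductCatalog")
                    .withLimit(100)
                    .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
                    .withSegment(seg)                  // this worker's slice
                    .withTotalSegments(TOTAL_SEGMENTS)
                    .withExclusiveStartKey(exclusiveStartKey);
                ScanResult result = dynamodb.scan(scan);
                exclusiveStartKey = result.getLastEvaluatedKey();

                // Same accounting as before: ask for at least one permit
                permitsToConsume = Math.max(1,
                    (int) (result.getConsumedCapacity().getCapacityUnits() - 1.0));

                processYourResults(result);
            } while (exclusiveStartKey != null);
        }
    }).start();
}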

The Scan operation in DynamoDB is useful and necessary for performing various occasional background operations. However, applications that perform scans should do so by following the DynamoDB best practices. Hopefully, Google Guava’s RateLimiter makes doing so a bit easier. Also, you might want to check out our earlier blog post on using Google Guava’s Map builder API for writing shorter code when working with maps in the AWS SDK for Java.

Writing less code when using the AWS SDK for Java

by Jason Fulghum

Today we have a guest post by David Yanacek from the Amazon DynamoDB team.


The AWS SDK for Java provides a convenient set of methods for building request objects. This set of methods, known as a fluent interface, can save you from repeatedly retyping the request variable name, and can even make your code more readable. But what about maps? Services like Amazon DynamoDB use java.util.Map objects throughout their API, which do not lend themselves naturally to this builder pattern. Fortunately, the Google Guava open source library offers some classes that make it possible to build maps in a way that is compatible with the SDK’s fluent interface. In this post, we show how using Google Guava’s collection classes can make it easier to use services like Amazon DynamoDB with the low-level Java SDK. 

First, let’s look at some code that uses the bean interface—not the fluent interface—for making a PutItem call to DynamoDB. This example puts an item into the “ProductCatalog” table described in the Amazon DynamoDB Developer Guide, using a conditional write so that DynamoDB makes the change only if the item already exists and has a price of “26”. The example also asks DynamoDB to return the previous copy of the item.

// Construct the new item to put
Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();

AttributeValue id = new AttributeValue();
id.setN("104");
item.put("Id", id);

AttributeValue title = new AttributeValue("Book 104 Title");
item.put("Title", title);

AttributeValue isbn = new AttributeValue("111-1111111111");
item.put("ISBN", isbn);

AttributeValue price = new AttributeValue();
price.setN("25");
item.put("Price", price);

List<String> authorList = new ArrayList<String>();
authorList.add("Author1");
authorList.add("Author2");
AttributeValue authors = new AttributeValue();
authors.setSS(authorList);
item.put("Authors", authors);

// Construct a map of expected current values for the conditional write
Map<String, ExpectedAttributeValue> expected = new HashMap<String, ExpectedAttributeValue>();

ExpectedAttributeValue expectedPrice = new ExpectedAttributeValue();
AttributeValue currentPrice = new AttributeValue();
currentPrice.setN("26");
expectedPrice.setValue(currentPrice);
expected.put("Price", expectedPrice);

// Construct the request
PutItemRequest putItemRequest = new PutItemRequest();
putItemRequest.setTableName("ProductCatalog");
putItemRequest.setItem(item);
putItemRequest.setExpected(expected);
putItemRequest.setReturnValues(ReturnValue.ALL_OLD);

// Make the request
PutItemResult result = dynamodb.putItem(putItemRequest);

That’s a lot of code for doing something as simple as putting an item into a DynamoDB table. Let’s take that same example and switch it over to using the built-in fluent style interface:

// Construct the new item to put
Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
item.put("Id", new AttributeValue().withN("104"));
item.put("Title", new AttributeValue("Book 104 Title"));
item.put("ISBN", new AttributeValue("111-1111111111"));
item.put("Price", new AttributeValue().withN("25"));
item.put("Authors", new AttributeValue()
    .withSS(Arrays.asList("Author1", "Author2")));

// Construct a map of expected current values for the conditional write
Map<String, ExpectedAttributeValue> expected = new HashMap<String, ExpectedAttributeValue>();
expected.put("Price", new ExpectedAttributeValue()
    .withValue(new AttributeValue().withN("26")));

// Make the request 
PutItemResult result = dynamodb.putItem(new PutItemRequest()
    .withTableName("ProductCatalog")
    .withItem(item)
    .withExpected(expected)
    .withReturnValues(ReturnValue.ALL_OLD));

That’s a lot shorter. You may have noticed that this code also uses the Arrays.asList() method, which ships with the JDK, to construct the authors list. Wouldn’t it be nice if the JDK came with something like that for building maps? Fortunately, Google Guava offers several Map implementations and provides a simple Builder utility for each. Let’s use ImmutableMap.Builder to make the code even more compact:

// Make the request
PutItemResult result = dynamodb.putItem(new PutItemRequest()
    .withTableName("ProductCatalog")
    .withItem(new ImmutableMap.Builder<String, AttributeValue>()
        .put("Id", new AttributeValue().withN("104"))
        .put("Title", new AttributeValue("Book 104 Title"))
        .put("ISBN", new AttributeValue("111-1111111111"))
        .put("Price", new AttributeValue().withN("25"))
        .put("Authors", new AttributeValue()
            .withSS(Arrays.asList("Author1", "Author2")))
        .build())
    .withExpected(new ImmutableMap.Builder<String, ExpectedAttributeValue>()
        .put("Price", new ExpectedAttributeValue()
            .withValue(new AttributeValue().withN("26")))
        .build())
    .withReturnValues(ReturnValue.ALL_OLD));

And that’s it! We hope this approach saves you some typing and makes your code more readable. And if you want even less code, take a look at the DynamoDBMapper class, which allows you to interact with DynamoDB with your own objects directly. For more details, see the earlier blog posts Storing Java objects in Amazon DynamoDB tables and Using Custom Marshallers to Store Complex Objects in Amazon DynamoDB, or the topic Using the Object Persistence Model with Amazon DynamoDB in the Amazon DynamoDB Developer Guide.
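
To give a flavor of that approach, here is a minimal sketch; the Book class and its attributes are illustrative (we assume the same dynamodb client as above, and a "ProductCatalog" table whose hash key matches the Id attribute). With DynamoDBMapper, the whole put collapses into a single save() call:

@DynamoDBTable(tableName = "ProductCatalog")
public class Book {
    private Integer id;
    private String title;

    @DynamoDBHashKey
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    @DynamoDBAttribute
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

// Saving an item is now one line:
DynamoDBMapper mapper = new DynamoDBMapper(dynamodb);
Book book = new Book();
book.setId(104);
book.setTitle("Book 104 Title");
mapper.save(book);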

Eclipse Deployment: Part 3 – Configuring AWS Elastic Beanstalk

Now that you know the basics about creating AWS Java web applications and deploying them using the AWS Toolkit for Eclipse, let’s talk about some of the ways you can control how your environment runs.

AWS Elastic Beanstalk provides several easy ways to configure different features of your environment. The first mechanism we’ll look at for controlling how your environment runs is your environment’s configuration. These are properties set through the Elastic Beanstalk API that let you control different operational parameters of your environment, such as load balancer behavior and auto scaling strategies. The second mechanism we’ll look at is Elastic Beanstalk extension config files that are included as files in your deployed application. These configuration files allow you to customize additional software installed on your EC2 instances, as well as create and configure AWS resources that your application requires.

We’ll start off by covering some of the most common options, which are presented in the second page of the wizard when you create a new Elastic Beanstalk environment through Eclipse.

Shell Access

If you want to be able to remotely log into a shell on the EC2 instances running your application, then you’ll need to make sure you launch your environment with an Amazon EC2 key pair. The EC2 key pair can be created and managed through Eclipse or any of the other AWS tools, and allows you to securely log into any EC2 instances launched with that key pair. To connect to an instance from Eclipse, find your instance in the EC2 Instances view, right-click to bring up the context menu and select Open Shell. If Eclipse knows the private key for that instance’s key pair, then you’ll see a command prompt open up.

CNAMEs

The default URL for your application running on AWS Elastic Beanstalk probably isn’t something your customers will easily remember. You can add an abstraction layer by creating a CNAME record that points to your application’s URL. You can set up that CNAME record with Amazon Route 53 (Amazon’s DNS web service), or with any other DNS provider, which allows you to host your application under any domain you own. This CNAME not only gives your application a friendlier URL, but it also provides an important abstraction: you can deploy new versions of your application with zero downtime by launching a new environment with the new application version and flipping the CNAME record over to the new environment’s URL after you’ve confirmed it’s ready for production traffic. You can find more details on CNAMEs and this technique in the Elastic Beanstalk Developer Guide.

Notifications

AWS Elastic Beanstalk uses the Amazon Simple Notification Service (Amazon SNS) to notify you of important events affecting your application, such as environment status changes. To enable Amazon SNS notifications, simply enter your email address in the Email Address text box under Notifications on the Configuration tab inside the Toolkit for Eclipse.

SSL Certificate

If your application deals with sensitive customer information, then you’ll probably want to configure an SSL certificate for your load balancer so that all data between your customers and your environment’s load balancer is encrypted. To do this, you’ll need a certificate from an external certificate authority such as VeriSign or Entrust. Once you register the certificate with the AWS Identity and Access Management service, you can enter the certificate’s ID here to tell Elastic Beanstalk to configure your load balancer for SSL with your certificate.

Health Check URL

Your Elastic Beanstalk environment monitors the health of your application through the configured health check URL. By default, Elastic Beanstalk checks the health of your application by testing a TCP connection on port 80. This is a very basic health check, and you can easily override it with your own custom health check, such as a page that does some basic tests of your application’s health. Be careful to keep this health check page very simple, though, since the check runs often (the interval is configurable). If you want to do more in-depth health checking, you might have a separate thread in your application that performs the deeper checks, such as verifying DB connection health, and then have your health check page simply report that status. If one of the hosts in your environment starts failing health checks, it is automatically removed from your environment so that it doesn’t serve bad results to customers. The exact parameters for how these checks are run are configurable through the environment configuration editor that we’ll see shortly. A custom health check page might look like the sketch below.
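
The servlet below is a minimal sketch of that idea; the class name and the ApplicationHealth helper (a hypothetical class whose status a background thread keeps up to date) are illustrative, not part of Elastic Beanstalk:

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Keep this handler cheap: the load balancer calls it frequently.
public class HealthCheckServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // ApplicationHealth is a hypothetical helper refreshed by a
        // background thread (e.g., one that checks DB connectivity).
        if (ApplicationHealth.isHealthy()) {
            response.setStatus(HttpServletResponse.SC_OK);
            response.getWriter().write("OK");
        } else {
            response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        }
    }
}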

Incremental Deployment

The Incremental Deployment option (enabled by default) only affects how Eclipse uploads new application versions to Elastic Beanstalk, but it’s a neat option worth pointing out here. When you use incremental deployment, Eclipse pushes only the delta of your most recent changes to AWS Elastic Beanstalk, instead of pushing every file in your whole application. Under the covers, Eclipse and Elastic Beanstalk use the Git protocol to upload file deltas, and the end result is very fast application deployments for small changes after you’ve gone through a full push initially.

After you’ve started your environment, you can modify any of these configuration options, and many more, by double-clicking on your Elastic Beanstalk environment in Eclipse’s Servers view to open the Environment Configuration Editor. From here you can access dozens of settings to fine tune how your environment runs. Note that some of these options will require stopping and restarting your environment (such as changing the Amazon EC2 instance type your environment uses).

From the environment configuration editor you have access to dozens of additional options for controlling how your environment runs. The Configuration tab in the editor shows you the most common options, such as EC2 key pairs, auto scaling and load balancing parameters, and specific Java container options such as JVM settings and Java system properties.

The Advanced tab in the environment configuration editor has a complete list of every possible option for your environment, but for the vast majority of use cases, you shouldn’t need more than the Configuration tab.

Elastic Beanstalk Extension Config Files

We’ve seen how to manipulate operational settings that control how your environment runs by updating an environment’s configuration. These settings are all updated by tools working directly with the Elastic Beanstalk API to change these settings. The second way to customize your environment is through Elastic Beanstalk extension config files. These files live inside your project and get deployed with your application. They customize your environment in larger ways than the very specific settings we saw earlier.

These extension config files allow you to customize the additional software available on the EC2 instances running your application. For example, your application might want to use the Amazon CloudWatch monitoring scripts to upload custom CloudWatch metrics. You can use these extension config files to specify that the Amazon CloudWatch monitoring scripts be installed on any EC2 instance that comes up as part of your environment; your application code will then be able to invoke them.

You can also use these Elastic Beanstalk extension config files to create and configure AWS resources that your application will need. For example, if your application requires an Amazon SQS queue, you could declare it in your extension config file and even create an alarm on queue depth to notify you if your application gets behind on processing messages in the queue. The AWS Elastic Beanstalk Developer Guide goes into a lot more detail, and examples, demonstrating how to configure AWS resources with extension config files.

That completes our tour of the different ways you can customize your Elastic Beanstalk environments. One of the great strengths of Elastic Beanstalk is that you can simply drop in your application and not worry about customization, but if you do want to customize, you have a wealth of different ways to configure your environment to run the way you need it to for your application. What kinds of customization settings have you tried for your Elastic Beanstalk environments? Let us know in the comments below!

Eclipse Deployment: Part 2 – Deploying to AWS Elastic Beanstalk

In this three part series, we’ll show how easy it is to deploy a Java web application to AWS Elastic Beanstalk using the AWS Toolkit for Eclipse.

In part one of this series, we showed how to create an AWS Java Web Project and deploy it to a local Tomcat server. This is a great workflow for developing your project, but when you’re ready for production, you’ll want to get it running on AWS. In this second post of the series, we’ll show how we can use the same tools in Eclipse to deploy our project using AWS Elastic Beanstalk.

AWS Elastic Beanstalk provides a managed application container environment for your application to run in. That means all you have to worry about is your application code. Elastic Beanstalk handles the provisioning, load balancing, auto-scaling, and application health monitoring for you. Even though Elastic Beanstalk handles all these aspects for you, you still have control over all the settings, as we’ll see in the next part of this series, if you do want to customize how your environment runs.

The AWS Toolkit for Eclipse supports deploying Java web apps to Elastic Beanstalk Tomcat containers, but Elastic Beanstalk supports many other types of applications, including:

  • .NET
  • Ruby
  • Python
  • PHP
  • Node.js

Let’s go ahead and see how easy it is to deploy our application to AWS Elastic Beanstalk. We’ll use the same workflow as before when we deployed our application to our local Tomcat server for local development and testing, but this time, we’ll choose to create a new AWS Elastic Beanstalk Tomcat 7 server.

Right-click on your project and select Run As -> Run on Server, then make sure the Manually define a new server option is selected; otherwise, this wizard will only show you any existing servers you’ve configured. Select Elastic Beanstalk for Tomcat 7 from the Amazon Web Services category and move on to the next page in the wizard.

This page asks for some very basic information about the Elastic Beanstalk environment that we’re creating. Every Elastic Beanstalk environment is tied to a specific application, and of course has a name. You can choose to create a new application, or reuse an existing one. Whenever you deploy your project to this environment, you’ll be creating a new version of that application, and then deploying that new version to run in your environment.

On the next page of the wizard are some more options for configuring your new environment. We’ll go over these options and more in the next post in this series.

Go ahead and click the Finish button and Eclipse will start creating your new environment. The very first time you start your environment you’ll need to wait a few minutes while Elastic Beanstalk provisions servers for you, configures them behind a load balancer and auto-scaling group, and deploys your application. Future deployments should go much faster, but Elastic Beanstalk needs to set up several pieces of infrastructure for you the first time a new environment starts up. To see more details about what Elastic Beanstalk is doing to set up your environment, double-click on the server you just created in Eclipse’s Servers view, and open the Events tab in the server editor that opens. The event log shows you all the major events that Elastic Beanstalk is logging for your environment. If you ever have problems starting up your environment, the event log is the place to start looking for clues.

After a few minutes, you should see your application start up in Eclipse’s internal web browser, this time running from AWS instead of a local Tomcat server.

And that’s all it takes to get a Java web application deployed to AWS using AWS Elastic Beanstalk and the AWS Toolkit for Eclipse.

Now that you’ve got your environment running, try making a few small changes to your application and redeploying them, using the same tools as before. After the initial full deployment, Eclipse switches over to incremental deployments, so redeploys should be very fast.

Stay tuned for the next post in this series, where we’ll explain how you can customize your environment’s configuration to control different aspects of how it runs.

Release: AWS Toolkit for Eclipse 2.3

We’ve just released a new version of the AWS Toolkit for Eclipse that adds support for managing your AWS Identity and Access Management (IAM) resources directly from within Eclipse, and updates the Amazon DynamoDB Create Table Wizard in the toolkit to support creating tables with Local Secondary Indexes.

Check out the new functionality and let us know what you think in the comments below!

Eclipse Deployment: Part 1 – AWS Java Web Applications

In this three part series, we’ll show how easy it is to deploy a Java web application to AWS Elastic Beanstalk using the AWS Toolkit for Eclipse.

The first post in this series demonstrates how to create an AWS Java Web Project, and explains how that project interacts with the existing web development tools in Eclipse.

The AWS Toolkit for Eclipse builds on top of the standard Eclipse tooling for developing and deploying web applications, the Eclipse Web Tools Platform (WTP). This means you’ll be able to leverage all of the tools provided by WTP with your new AWS Java Web Project, as we’ll see later in this post.

After you’ve installed the AWS Toolkit for Eclipse, open the New AWS Java Web Project wizard.

The wizard lets you enter your project name, AWS account, and whether you want to start with a bare bones project, or a more advanced reference application. We recommend starting with the basic Java web application for your first time through. If you haven’t configured an AWS account yet, you’ll want to follow the link in the wizard to add an account. Your account information will be used to configure your project so that your application code can make requests to AWS. Once you’ve got an AWS account selected, go ahead and fill out a project name, and keep the default option to start with a basic Java web application.

After you’ve finished the wizard, you’ll have an AWS Java Web Project, ready for you to start building your application in, or to deploy right away.

One of the great things about building on top of the Eclipse Web Tools Platform is that your project can use all the great tools provided by WTP for developing and deploying Java web applications. For example, try out the Create Servlet wizard provided by WTP:

The Create Servlet wizard makes it very easy to create new servlets, and in addition to creating the class template for you, it will also update your project’s web.xml with a mapping for the new servlet.

You’ll be able to use many other tools from WTP like custom editors for JSP and XML files, and tools for building and exporting WAR files.

The coolest benefit of building on top of WTP, however, is that you can use the deployment support in WTP to deploy your AWS Java Web Projects in exactly the same way, whether you’re uploading to a local Tomcat server for quick testing, or to a production Elastic Beanstalk environment, like we’ll see in the next part of this series.

Let’s get our new project deployed to a local Tomcat server so we can see it running. Right-click on your project and select Run As -> Run On Server. You’ll need to configure a new Tomcat server using this wizard, then Eclipse will start the server and deploy your project. When you’re done, you should see something like this:

Stay tuned for the next part of this series, where we’ll show how to use the same tools to deploy our new application to AWS Elastic Beanstalk.

Using Custom Marshallers to Store Complex Objects in Amazon DynamoDB

by zachmu

Over the past few months, we’ve talked about using the AWS SDK for Java to store and retrieve Java objects in Amazon DynamoDB. Our first post was about the basic features of the DynamoDBMapper framework, and then we zeroed in on the behavior of auto-paginated scan. Today we’re going to spend some time talking about how to store complex types in DynamoDB. We’ll be working with the User class again, reproduced here:

@DynamoDBTable(tableName = "users")
public class User {
  
    private Integer id;
    private Set<String> friends;
    private String status;
  
    @DynamoDBHashKey
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
  
    @DynamoDBAttribute
    public Set<String> getFriends() { return friends; }
    public void setFriends(Set<String> friends) { this.friends = friends; }
  
    @DynamoDBAttribute
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
 

Out of the box, DynamoDBMapper works with String, Date, and any numeric type such as int, Integer, byte, Long, etc. But what do you do when your domain object contains a reference to a complex type that you want persisted into DynamoDB?

Let’s imagine that we want to store the phone number for each User in the system, and that we’re working with a PhoneNumber class to represent it. For the sake of brevity, we are assuming it’s an American phone number. Our simple PhoneNumber POJO looks like this:

public class PhoneNumber {
    private String areaCode;
    private String exchange;
    private String subscriberLineIdentifier;
    
    public String getAreaCode() { return areaCode; }    
    public void setAreaCode(String areaCode) { this.areaCode = areaCode; }
    
    public String getExchange() { return exchange; }   
    public void setExchange(String exchange) { this.exchange = exchange; }
    
    public String getSubscriberLineIdentifier() { return subscriberLineIdentifier; }    
    public void setSubscriberLineIdentifier(String subscriberLineIdentifier) { this.subscriberLineIdentifier = subscriberLineIdentifier; }      
}

If we try to store a reference to this class in our User class, DynamoDBMapper will complain because it doesn’t know how to represent the PhoneNumber class as one of DynamoDB’s basic data types.

Introducing the @DynamoDBMarshalling annotation

The DynamoDBMapper framework supports this use case by allowing you to specify how to convert your class into a String and vice versa. All you have to do is implement the DynamoDBMarshaller interface for your domain object. For a phone number, we can represent it using the standard (xxx) xxx-xxxx pattern with the following class:

public class PhoneNumberMarshaller implements DynamoDBMarshaller<PhoneNumber> {

    @Override
    public String marshall(PhoneNumber number) {
        return "(" + number.getAreaCode() + ") " + number.getExchange() + "-" + number.getSubscriberLineIdentifier();
    }

    @Override
    public PhoneNumber unmarshall(Class<PhoneNumber> clazz, String s) {
        String[] areaCodeAndNumber = s.split(" ");
        String areaCode = areaCodeAndNumber[0].substring(1,4);
        String[] exchangeAndSlid = areaCodeAndNumber[1].split("-");
        PhoneNumber number = new PhoneNumber();
        number.setAreaCode(areaCode);
        number.setExchange(exchangeAndSlid[0]);
        number.setSubscriberLineIdentifier(exchangeAndSlid[1]);
        return number;
    }    
}

Note that the DynamoDBMarshaller interface is parameterized on the domain object you’re working with, which keeps the interface strongly typed.

Now that we have a class that knows how to convert our PhoneNumber class into a String and back, we just need to tell the DynamoDBMapper framework about it. We do so with the @DynamoDBMarshalling annotation.

@DynamoDBTable(tableName = "users")
public class User {
    
    ...
    
    @DynamoDBMarshalling (marshallerClass = PhoneNumberMarshaller.class)
    public PhoneNumber getPhoneNumber() { return phoneNumber; }    
    public void setPhoneNumber(PhoneNumber phoneNumber) { this.phoneNumber = phoneNumber; }             
}

Built-in support for JSON representation

The above example uses a very compact String representation of a phone number to use as little space in your DynamoDB table as possible. But if you’re not overly concerned about storage costs or space usage, you can just use the built-in JSON marshaling capability to marshal your domain object. Defining a JSON marshaller class takes just a single line of code:

class PhoneNumberJSONMarshaller extends JsonMarshaller<PhoneNumber> { }

However, the trade-off of using this built-in marshaller is that it produces a String representation that’s more verbose than you could write yourself. A phone number marshaled with this class would end up looking like this (with spaces added for clarity):

{
  "areaCode" : "xxx",
  "exchange" : "xxx",
  "subscriberLineIdentifier" : "xxxx"
}

When writing a custom marshaller, you’ll also want to consider how easy it will be to write a scan filter that can find a particular value. Our compact phone number representation will be much easier to scan for than the JSON representation.
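
For example, here is a minimal sketch of a low-level scan filter that matches the compact format; the attribute name "PhoneNumber", the table name "users", and the sample number are assumptions for illustration, and the dynamodb client is assumed to be set up as in the earlier posts:

// Find users whose phone number is exactly "(206) 555-0100".
Map<String, Condition> scanFilter = new HashMap<String, Condition>();
scanFilter.put("PhoneNumber", new Condition()
    .withComparisonOperator(ComparisonOperator.EQ)
    .withAttributeValueList(new AttributeValue("(206) 555-0100")));

ScanResult result = dynamodb.scan(new ScanRequest()
    .withTableName("users")
    .withScanFilter(scanFilter));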

We’re always looking for ways to make our customers’ lives easier, so please let us know how you’re using DynamoDBMapper to store complex objects, and what marshaling patterns have worked well for you. Share your success stories or complaints in the comments!

Working with Different AWS Regions

by zachmu

Wherever you or your customers are in the world, there are AWS data centers nearby.

Each AWS region is a completely independent stack of services, totally isolated from other regions. You should always host your AWS application in the region nearest your customers. For example, if your customers are in Japan, running your website from Amazon EC2 instances in the Asia Pacific (Tokyo) region will ensure that your customers get the lowest possible latency when they connect to your site.

As of the 1.4 release of the AWS SDK for Java, the SDK knows how to look up the endpoint for a given service in a particular region. Previously, developers needed to look up these endpoints themselves and then hard-code them into their applications when creating a client, like so:

AmazonDynamoDB dynamo = new AmazonDynamoDBClient(credentials);
dynamo.setEndpoint("https://dynamodb.us-west-2.amazonaws.com");

With the 1.4 release, the SDK will look up a service’s regional endpoint automatically, so all you have to know is which region you want to use. This newer method looks like this:

AmazonDynamoDB dynamo = new AmazonDynamoDBClient(credentials);
dynamo.setRegion(Region.getRegion(Regions.US_WEST_2));

Regions can also create and configure clients for you, like a simple factory. This is especially helpful when you’re working with multiple regions in your application and need to keep them straight. Just use region objects to create every client for you, and it will be obvious which client points to which region.

AmazonDynamoDB dynamo = Region.getRegion(Regions.US_WEST_2)
                        .createClient(AmazonDynamoDBClient.class, credentials, clientConfig);

It’s important to note that the setRegion() method isn’t thread-safe. We recommend setting the region once, when a client object is first created, then leaving it alone for the duration of the client’s life cycle. Otherwise, the SDK’s automatic retry logic could yield unexpected behavior if setRegion() is called at the wrong time. Using the Region objects as client factories encourages this pattern. If you need to talk to more than one region for a particular service, we recommend creating one service client object per region, rather than trying to share.
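
As a minimal sketch of that one-client-per-region pattern (assuming credentials and clientConfig are defined as in the snippet above, and that the regions chosen here are just examples), you might keep the clients in a map keyed by region:

// One DynamoDB client per region; the map makes the target region
// explicit at every call site.
Map<Regions, AmazonDynamoDB> dynamoByRegion =
    new EnumMap<Regions, AmazonDynamoDB>(Regions.class);
for (Regions r : EnumSet.of(Regions.US_EAST_1, Regions.US_WEST_2)) {
    dynamoByRegion.put(r, Region.getRegion(r)
        .createClient(AmazonDynamoDBClient.class, credentials, clientConfig));
}

// Later, there is no doubt which region this call goes to:
AmazonDynamoDB usWest2Dynamo = dynamoByRegion.get(Regions.US_WEST_2);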

Finally, at times it may be useful to programmatically determine which regions a given service is available in. It’s possible to ask a Region object if a given service is supported there:

Region.getRegion(Regions.US_WEST_2).isServiceSupported(ServiceAbbreviations.Dynamodb);
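
To list every region where a service is available, you could loop over all known regions; this sketch assumes the RegionUtils.getRegions() helper from the same com.amazonaws.regions package is available in your SDK version:

for (Region region : RegionUtils.getRegions()) {
    if (region.isServiceSupported(ServiceAbbreviations.Dynamodb)) {
        System.out.println("DynamoDB is available in " + region.getName());
    }
}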

For more information about which services are available in each region, see http://aws.amazon.com/about-aws/globalinfrastructure/regional-product-services/.

For more information about the available regions and edge locations, see http://aws.amazon.com/about-aws/globalinfrastructure/.

The AWS Toolkit for Eclipse at EclipseCon 2013

Jason and I are at EclipseCon in Boston this week to discuss what we’ve learned developing the AWS Toolkit for Eclipse over the last three years. Our session is chock full of advice for how to develop great Eclipse plug-ins, and offers a behind-the-scenes look at how we build the Toolkit. Here’s what we plan to cover:

Learn best practices for Eclipse plug-in development that took us years to figure out!

The AWS Toolkit for Eclipse brings the AWS cloud to the Eclipse workbench, allowing developers to develop, debug, and deploy Java applications on the AWS platform. For three years, we’ve worked to integrate AWS services into your Eclipse development workflow. We started with a small seed of functionality for managing EC2 instances, and today support nine services and counting. We learned a lot on the way, and we’d like to share!

The Toolkit touches a wide array of Eclipse technologies and frameworks, from the Web Tools Platform to the Common Navigator Framework. By now we’ve explored so much of the Eclipse platform that we’ve started to become embarrassed by the parts of the Toolkit that we wrote first. If only someone had told us the right way to do things in the first place! Instead, we had to learn the hard way how to make our code robust, our user interfaces reliable and operating-system independent (not to mention pretty).

We’re here to teach from our experience, to share all the things we wish someone had told us before we learned it the hard way. These are the pointers that will save you hours of frustration and help you deliver a better product to your customers. They’re the tips we would send back in time to tell our younger selves. We’ll show you how we used them to make the Toolkit better and how to incorporate them into your own product.

Topics include getting the most out of SWT layouts, using data binding to give great visual feedback in wizards, managing releases and updates, design patterns for resource sharing, and much more.

If you are attending the conference, come by to say hello and get all your questions about the Toolkit answered! We are also handing out $100 AWS credits to help you get started using AWS services without a financial commitment, so come talk to us and we’ll hook you up.

Eclipse: New AWS Java Project Wizard

If you’re just getting started with the AWS SDK for Java, a great way to learn the SDK is through the AWS Toolkit for Eclipse. In addition to all the tools in the AWS Toolkit for Eclipse for managing your AWS resources, deploying your applications, etc., there are also wizards for creating new AWS projects, including sample code to help get you started.

With the New AWS Java Project wizard, you can create a new Eclipse Java project, already configured with:

  • the AWS SDK for Java – including dependencies and full documentation and source attachment
  • your AWS security credentials – managed through Eclipse’s preferences
  • optional sample code demonstrating how to work with a variety of different AWS services

First, make sure that you have the latest plug-ins for the AWS Toolkit for Eclipse installed, available through the Eclipse Marketplace or directly from our Eclipse update site at http://aws.amazon.com/eclipse.

Once you have the Eclipse tools installed, open the New AWS Java Project wizard, either through the context menu in Package Explorer, or through the File -> New menu.

The New AWS Java Project wizard lets you pick the name for your project, your AWS security credentials, and any sample code that you want to start from. If you don’t have your AWS security credentials configured in Eclipse yet, the link in the wizard takes you directly to the Eclipse preferences where you can manage your AWS accounts.

Once you’ve completed the wizard, your project is all set up with the AWS SDK for Java, and you’re ready to begin coding against the AWS APIs. If you’ve configured your AWS security credentials, and selected any AWS samples to add to your application, you can immediately run the samples and begin experimenting with the APIs.

The Toolkit includes other new project wizards, too. A few months ago, we showed how to use the New AWS Android Project wizard. We plan on demonstrating the New AWS Java Web Project wizard soon.

What functionality in the AWS Toolkit for Eclipse do you find to be the most useful? Let us know in the comments below.

Are you passionate about open source, Java, and cloud computing? Want to build tools that AWS customers use on a daily basis? Come join the AWS Java SDK and Tools team! We’re hiring!