Iterating through Amazon DynamoDB Results

by Jeremy Lindblom | in PHP

The AWS SDK for PHP has a feature called "iterators" that allows you to retrieve an entire result set without manually handling pagination tokens or markers. The iterators in the SDK implement PHP’s Iterator interface, which allows you to easily enumerate or iterate through resources from a result set with foreach.

The Amazon DynamoDB client has iterators available for all of the operations that return sets of resources, including Query, Scan, BatchGetItem, and ListTables. Let’s take a look at how we can use the iterators feature with the DynamoDB client to iterate through the items in a result.

Specifically, let’s look at an example of how to create and use a Scan iterator. First, let’s create a client object to use throughout the rest of the example code.

<?php

require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;

$client = DynamoDbClient::factory(array(
    'key'    => '[aws access key]',
    'secret' => '[aws secret key]',
    'region' => '[aws region]' // (e.g., us-west-2)
));

Next, we’ll create a normal Scan operation without an iterator. A DynamoDB Scan operation is used to do a full table scan on a DynamoDB table. We want to iterate through all the items in the table, so we will just provide the TableName as a parameter to the operation without a ScanFilter.

$result = $client->scan(array(
    'TableName' => 'TheNameOfYourTable',
));

foreach ($result['Items'] as $item) {
    // Do something with the $item
}

The $result variable will contain a Guzzle\Service\Resource\Model object, which is an array-like object structured according to the description in the API documentation for the scan method. However, DynamoDB returns at most 1 MB of data per Scan operation. If your table is larger than 1 MB and you want to retrieve the entire result set, you need to perform subsequent Scan operations that include the ExclusiveStartKey parameter. The following example shows how to do this:

$startKey = array();

do {
    $args = array('TableName' => 'TheNameOfYourTable') + $startKey;
    $result = $client->scan($args);

    foreach ($result['Items'] as $item) {
        // Do something with the $item
    }

    $startKey['ExclusiveStartKey'] = $result['LastEvaluatedKey'];
} while ($startKey['ExclusiveStartKey']);

Using an iterator to perform the Scan operation makes this much simpler.

$iterator = $client->getScanIterator(array(
    'TableName' => 'TheNameOfYourTable'
));

foreach ($iterator as $item) {
    // Do something with the $item
}

Using the iterator allows you to get the full result set, regardless of how many MB of data there are, and still use a simple syntax to iterate through the results. The actual object returned by getScanIterator(), or any get*Iterator() method, is an instance of the Aws\Common\Iterator\AwsResourceIterator class.
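To make the idea concrete, here is a minimal sketch of how an iterator can hide pagination behind foreach. It is not the SDK's implementation: the scanAllPages() function, the $fetchPage callable, and the in-memory $pages array are all hypothetical stand-ins for a real Scan call, and the sketch assumes PHP 5.5 or later for generator support.

```php
<?php

// Hypothetical helper, not part of the SDK: yields every item from a paged
// source by following LastEvaluatedKey until no key is returned.
function scanAllPages(callable $fetchPage)
{
    $startKey = null;

    do {
        // Ask the source for the page that begins after $startKey.
        $page = $fetchPage($startKey);

        foreach ($page['Items'] as $item) {
            yield $item;
        }

        // A missing LastEvaluatedKey means this was the final page.
        $startKey = isset($page['LastEvaluatedKey']) ? $page['LastEvaluatedKey'] : null;
    } while ($startKey !== null);
}

// Stand-in for a table that needs three Scan calls to read completely.
$pages = array(
    array('Items' => array('a', 'b'), 'LastEvaluatedKey' => 1),
    array('Items' => array('c', 'd'), 'LastEvaluatedKey' => 2),
    array('Items' => array('e')),
);

// Stand-in for $client->scan(): returns the page that follows $startKey.
$fetchPage = function ($startKey) use ($pages) {
    return $pages[$startKey === null ? 0 : $startKey];
};

$items = iterator_to_array(scanAllPages($fetchPage), false);
echo implode(', ', $items), "\n"; // prints "a, b, c, d, e"
```

The caller only sees a flat sequence of items; the page boundaries and the start key bookkeeping stay inside the generator.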

Warning: Doing a full table scan on a large table may consume a lot of provisioned throughput and, depending on the table’s size and throughput settings, can take time to complete. Please be cautious before running the examples from this post on your own tables.

Iterators also allow you to put a limit on the maximum number of items you want to iterate through.

$iterator = $client->getScanIterator(array(
    'TableName' => 'TheNameOfYourTable'
), array(
    'limit' => 20
));

$count = 0;
foreach ($iterator as $item) {
    $count++;
}
echo $count;
#> 20
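For comparison, PHP's standard library offers a similar capping behavior for any iterator through the SPL LimitIterator class. The sketch below uses an ArrayIterator of 50 integers as a stand-in for a result set; it illustrates the general technique and makes no claim about how the SDK implements its own limit option.

```php
<?php

// Stand-in data: 50 "items" in place of a real DynamoDB result set.
$allItems = new ArrayIterator(range(1, 50));

// LimitIterator(iterator, offset, limit): start at the first item, stop after 20.
$limited = new LimitIterator($allItems, 0, 20);

$count = 0;
foreach ($limited as $item) {
    $count++;
}

echo $count, "\n"; // prints 20
```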

Now that you know how iterators work, let’s work through another example. Let’s say you have a DynamoDB table named "Contacts" with the following simple schema:

  • Id (Number)
  • FirstName (String)
  • LastName (String)

You can display the full name of each contact with the following code:

$contacts = $client->getScanIterator(array(
    'TableName' => 'Contacts'
));

foreach ($contacts as $contact) {
    $firstName = $contact['FirstName']['S'];
    $lastName = $contact['LastName']['S'];
    echo "{$firstName} {$lastName}\n";
}

Item attribute values in your DynamoDB result are keyed by both the attribute name and attribute type. In many cases, especially when using a loosely typed language like PHP, the type of the item attribute may not be important, and a simple associative array might be more convenient. The SDK (as of version 2.4.1) includes the Aws\DynamoDb\Iterator\ItemIterator class, which you can use to decorate a Scan, Query, or BatchGetItem iterator object in order to enumerate the items without the type information.

use Aws\DynamoDb\Iterator\ItemIterator;

$contacts = new ItemIterator($client->getScanIterator(array(
    'TableName' => 'Contacts'
)));

foreach ($contacts as $contact) {
    echo "{$contact['FirstName']} {$contact['LastName']}\n";
}
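The transformation involved can be sketched in a few lines. The flattenItem() function below is a hypothetical helper written for illustration, not the ItemIterator source; it assumes every attribute arrives as a one-element array keyed by its type, as in the Contacts example.

```php
<?php

// Hypothetical helper, not part of the SDK: drops the type key from each
// attribute so array('S' => 'Jane') becomes just 'Jane'.
function flattenItem(array $item)
{
    $flat = array();

    foreach ($item as $name => $typedValue) {
        // reset() returns the first (and only) element of the typed value.
        $flat[$name] = reset($typedValue);
    }

    return $flat;
}

// An item shaped like a raw Scan result for the Contacts table.
$contact = array(
    'Id'        => array('N' => '1'),
    'FirstName' => array('S' => 'Jane'),
    'LastName'  => array('S' => 'Doe'),
);

$flat = flattenItem($contact);
echo "{$flat['FirstName']} {$flat['LastName']}\n"; // prints "Jane Doe"
```

Note that the number attribute stays a string ('1') in this sketch, because DynamoDB transmits numbers as strings and the helper does no type conversion.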

The ItemIterator also has two more features that can be useful for certain schemas.

  1. If you have attributes of the binary (B) or binary set (BS) type, the ItemIterator will automatically apply base64_decode() to the values for you.
  2. The item will actually be enumerated as a Guzzle\Common\Collection object. A Collection behaves like an array (i.e., it implements the ArrayAccess interface) and has some additional convenience methods. Additionally, it returns null instead of triggering notices for undefined indices. This is useful for working with items, since the NoSQL nature of DynamoDB does not restrict you to following a fixed schema with all of your items.
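The second point can be illustrated with a small ArrayAccess implementation. The LenientItem class below is a hypothetical, stripped-down stand-in for the Collection behavior described above, not the Guzzle class itself, and it assumes PHP 7.1 or later for the return type declarations.

```php
<?php

// Hypothetical stand-in for a Collection-like object: array syntax works,
// and reading a missing key yields null instead of a notice.
class LenientItem implements ArrayAccess
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->data[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        // Fall back to null for attributes this item does not have.
        return isset($this->data[$offset]) ? $this->data[$offset] : null;
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;
        } else {
            $this->data[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->data[$offset]);
    }
}

// This contact has no MiddleName attribute, which a schemaless table allows.
$item = new LenientItem(array('FirstName' => 'Jane', 'LastName' => 'Doe'));

var_dump($item['FirstName']);  // string(4) "Jane"
var_dump($item['MiddleName']); // NULL, with no undefined index notice
```

With a plain array, the second lookup would raise a notice or warning, so code that loops over items with differing attributes would need an isset() check on every access.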

We hope that using iterators makes working with the AWS SDK for PHP easier and reduces the amount of code you have to write. You can use the ItemIterator class to get even easier access to the data in your Amazon DynamoDB tables.