Autopagination feature in the AWS SDK for Java 2.0

This blog post is part of a series that outlines changes coming in the AWS SDK for Java 2.0. Read our Developer Preview announcement for more information about why we’re so excited for this new version of the SDK.

We’re pleased to announce support for automatic pagination in the AWS SDK for Java 2.0. Many of the AWS APIs are paginated; that is, results are split across multiple pages instead of being returned in a single response. This is useful when the results are too large to send in a single response. When the results are truncated, a token value is included in the response page, which can be sent with the subsequent request to get the next set of results. This process repeats until there are no more results.

In the AWS SDK for Java 1.x, users must handle the logic of identifying the next token and making the subsequent requests. Because services have different names for pagination tokens, customers have to refer to the documentation to identify the proper name for the next pagination token and how it should be passed in the next request. The autopagination feature in the SDK for Java 2.0 simplifies the customer experience by identifying the next token values and making service calls automatically on behalf of the customer. New APIs that use autopagination are available to the clients. For example, you can use the new operation, listTablesPaginator, in the Amazon DynamoDB client for autopagination. You can focus on working with the results instead of on the details of retrieving the next page of results. The manual equivalent of that operation, listTables, is still available. All clients contain both manual pagination APIs and autopagination APIs to support customers with different needs.

Sync pagination

Let’s look at some examples that use manual pagination and autopagination APIs in sync clients. We’ll use the Amazon DynamoDB ListTables operation for our code examples. The ListTables API returns at most 100 table names per page. To get the next set of table names, a token (LastEvaluatedTableName in ListTablesResponse) is included in each response page, which can be used in a subsequent request. Let’s say you have 15 tables in your Amazon DynamoDB account in the us-west-2 Region. To make it easy to get multiple pages, our examples limit the number of table names per response page to a maximum of three. All examples print the table names in your DynamoDB account for the us-west-2 AWS Region.

Using a manual pagination API

DynamoDBClient client = DynamoDBClient.create();
ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();
boolean done = false;
while (!done) {
    ListTablesResponse listTablesResponse = client.listTables(listTablesRequest);
    System.out.println(listTablesResponse.tableNames());

    if (listTablesResponse.lastEvaluatedTableName() == null) {
        done = true;
    }

    listTablesRequest = listTablesRequest.toBuilder()
                                         .exclusiveStartTableName(listTablesResponse.lastEvaluatedTableName())
                                         .build();
}

In the above example, you have to identify if the response from the service is the last page by doing the listTablesResponse.lastEvaluatedTableName() == null check. If it’s not the last page, continue making more calls until there are no more results. There will be a total of five service calls made by the client, because we limit the table names to a maximum of three per page.
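To try the token handshake without an AWS account, the same loop can be run against an in-memory stand-in. This is only a minimal sketch, not SDK code: FakeTableService, Page, and lastEvaluated are hypothetical names that imitate the shape of ListTables, and the stand-in is assumed to return no token once the last name has been served.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a paginated service; not part of the AWS SDK.
class FakeTableService {
    // One page of results plus the token for the next page (null on the last page).
    static final class Page {
        final List<String> tableNames;
        final String lastEvaluated;

        Page(List<String> tableNames, String lastEvaluated) {
            this.tableNames = tableNames;
            this.lastEvaluated = lastEvaluated;
        }
    }

    private final List<String> tables = new ArrayList<>();
    int calls = 0; // number of simulated service calls

    FakeTableService(int tableCount) {
        for (int i = 1; i <= tableCount; i++) {
            tables.add(String.format("table-%02d", i));
        }
    }

    // Returns up to 'limit' names that come after 'exclusiveStart' (null means start at the beginning).
    Page listTables(String exclusiveStart, int limit) {
        calls++;
        int from = exclusiveStart == null ? 0 : tables.indexOf(exclusiveStart) + 1;
        int to = Math.min(from + limit, tables.size());
        List<String> names = new ArrayList<>(tables.subList(from, to));
        String token = to < tables.size() ? tables.get(to - 1) : null;
        return new Page(names, token);
    }
}

class ManualPaginationSketch {
    // Collects every name by following the token until the service stops returning one.
    static List<String> listAll(FakeTableService service, int limit) {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            FakeTableService.Page page = service.listTables(token, limit);
            all.addAll(page.tableNames);
            token = page.lastEvaluated;
        } while (token != null);
        return all;
    }

    public static void main(String[] args) {
        FakeTableService service = new FakeTableService(15);
        System.out.println(listAll(service, 3));
        System.out.println("service calls: " + service.calls);
    }
}
```

With 15 names and a limit of three, the stand-in is called five times, matching the count described above.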

Using an autopagination API

DynamoDBClient client = DynamoDBClient.create();
ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();

ListTablesIterable responses = client.listTablesPaginator(listTablesRequest);

for (ListTablesResponse response : responses) {
    System.out.println(response.tableNames());
}

This example uses less code, and you can focus on your application’s main objective (printing table names). Notice the name of the API is listTablesPaginator. The result type (ListTablesIterable in the example above) for sync autopagination APIs is a custom Iterable that can be used to iterate through all the pages. Similar to the manual pagination example, the number of service calls made by the client is five in this example. The service calls are made automatically after you finish reading the table names in the current page.
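To make the "calls happen as you iterate" behavior concrete, here is one way a lazy page Iterable could be written. It is an illustrative sketch under our own assumptions, not the SDK's implementation: NamePage, LazyPages, and the index-based token are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical page type: some names plus the token for the next page (null on the last page).
final class NamePage {
    final List<String> names;
    final String nextToken;

    NamePage(List<String> names, String nextToken) {
        this.names = names;
        this.nextToken = nextToken;
    }
}

// Illustrative lazy Iterable of pages; not the SDK's implementation.
final class LazyPages implements Iterable<NamePage> {
    private final Function<String, NamePage> fetchPage; // token -> page (null token = first page)

    LazyPages(Function<String, NamePage> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public Iterator<NamePage> iterator() {
        return new Iterator<NamePage>() {
            private boolean started = false;
            private String token = null;

            @Override
            public boolean hasNext() {
                // Before the first call there is always a page to ask for.
                return !started || token != null;
            }

            @Override
            public NamePage next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                started = true;
                NamePage page = fetchPage.apply(token); // the simulated service call happens only here
                token = page.nextToken;
                return page;
            }
        };
    }
}

class LazyPagesDemo {
    static int calls = 0; // counts simulated service calls

    // Fake backend: serves 'all' in slices of 'limit', using the next start index as the token.
    static LazyPages over(List<String> all, int limit) {
        return new LazyPages(token -> {
            calls++;
            int from = token == null ? 0 : Integer.parseInt(token);
            int to = Math.min(from + limit, all.size());
            return new NamePage(new ArrayList<>(all.subList(from, to)),
                                to < all.size() ? String.valueOf(to) : null);
        });
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        for (NamePage page : over(all, 3)) {
            System.out.println(page.names + " (calls so far: " + calls + ")");
        }
    }
}
```

Creating the Iterable makes no call at all; each call is triggered by next(), so a caller that stops early never pays for the remaining pages.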

Using the stream method

The result class also has a convenient stream() method that returns all the pages as a Java 8 stream.

ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();
ListTablesIterable responses = client.listTablesPaginator(listTablesRequest);

// Print the table names using the responses stream
responses.stream().forEach(response -> System.out.println(response.tableNames()));

// Convert the stream of responses to a stream of table names, then print the table names
responses.stream()
         .flatMap(response -> response.tableNames().stream())
         .forEach(System.out::println);

A new stream is created every time the stream() method is called. Each stream instance can be used to print all the table names. This example prints the list of table names two times as we work with two streams separately.

Iterate on the underlying item collection directly

One common use case for paginated APIs is to work with the underlying collection of items, for example, printing the table names or collecting them into a list in the ListTables operation. In the last example, we achieved this by converting the responses stream to a stream of table names using flatMap. For extra convenience, we exposed a method in the result class that returns an Iterable of the underlying item collection. The name of the method is the same as the getter method for that item in the page response.

Using tableNames() in ListTablesIterable

ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();
ListTablesIterable responses = client.listTablesPaginator(listTablesRequest);
Iterable<String> tableNames = responses.tableNames();
tableNames.forEach(System.out::println);

Note: This convenience method might not be included in the result type of all autopagination APIs.
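When the convenience method is not available, the items can still be flattened from any Iterable of pages with the standard library. The sketch below assumes each page is simply a List of names; FlattenPagesSketch and the sample names are hypothetical and stand in for real paginated responses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class FlattenPagesSketch {
    // Flattens an Iterable of pages (each page is a list of names) into one list of names.
    static List<String> collectNames(Iterable<List<String>> pages) {
        return StreamSupport.stream(pages.spliterator(), false) // sequential stream over the pages
                            .flatMap(List::stream)              // page -> its names
                            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical pages standing in for paginated responses.
        Iterable<List<String>> pages = Arrays.asList(
                Arrays.asList("orders", "users", "carts"),
                Arrays.asList("audit", "sessions"));
        System.out.println(collectNames(pages));
    }
}
```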

Async pagination

Async autopagination APIs are based on the well-known reactive streams model. If you are not familiar with the reactive streams model, see the official Reactive Streams repo on GitHub for more information.

The API names in the async client are the same as those in the sync client. The return type is an implementation of the Publisher interface, which can be used to request pages as needed.

Using the autopagination API for the ListTables operation

DynamoDBAsyncClient asyncClient = DynamoDBAsyncClient.create();
ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();
ListTablesPublisher publisher = asyncClient.listTablesPaginator(listTablesRequest);

// Call a subscribe method to create a new subscription.
// A Subscription represents a one-to-one lifecycle of a subscriber subscribing to a publisher
publisher.subscribe(new Subscriber<ListTablesResponse>() {
    // Maintain a reference to the subscription object, which is required to request data from the publisher
    private Subscription subscription;

    @Override
    public void onSubscribe(Subscription s) {
        subscription = s;
        // Call the request method to demand data; Here we request a single page
        subscription.request(1);
    }

    @Override
    public void onNext(ListTablesResponse response) {
        response.tableNames().forEach(System.out::println);
        // Once you process the current page, call the request method to signal that you are ready for next page
        subscription.request(1);
    }

    @Override
    public void onError(Throwable t) {
        // Called when an error has occurred while processing the requests
    }

    @Override
    public void onComplete() {
        // This indicates all the results are delivered and there are no more pages left
    }
});

// As the above code is non-blocking, make sure your application doesn't end immediately
// For this example, I am using Thread.sleep to wait for all pages to get delivered
Thread.sleep(3_000);

The return type for the listTablesPaginator API in DynamoDBAsyncClient is ListTablesPublisher (an implementation of the reactive streams Publisher interface). You can subscribe to the publisher by passing a subscriber to the subscribe() method. The code above uses a simple sequential subscriber that requests one item at a time. Remember the publisher won’t send data until the Subscription.request method is called. In the code example, we call the request method in the onSubscribe method to start requesting data. Then a service call will be made to retrieve the first page, which will be delivered to the subscriber via the onNext method. We print the table names in the current page and then call the request method again, asking for the next page. After all the pages are delivered, the onComplete method will be called to indicate the end of the stream. If there are any errors during the processing of the request, the onError method will be called.
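The same request-one-page-at-a-time flow can be tried without the SDK by using the JDK's own java.util.concurrent.Flow interfaces (Java 9 and later), which mirror the reactive streams interfaces. The sketch below is an assumption-laden stand-in: SubmissionPublisher plays the role of the paginator, the class names are hypothetical, and a CountDownLatch replaces Thread.sleep so the caller waits exactly until onComplete or onError.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Sequential subscriber: asks for one page, handles it, then asks for the next.
class OnePageAtATimeSubscriber implements Flow.Subscriber<List<String>> {
    final List<String> seen = Collections.synchronizedList(new ArrayList<>());
    final CountDownLatch done = new CountDownLatch(1);
    private Flow.Subscription subscription;

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        subscription.request(1); // nothing is delivered until demand is signaled
    }

    @Override
    public void onNext(List<String> page) {
        seen.addAll(page);
        subscription.request(1); // ready for the next page
    }

    @Override
    public void onError(Throwable t) {
        done.countDown(); // release the waiting thread on failure too
    }

    @Override
    public void onComplete() {
        done.countDown(); // all pages delivered
    }
}

class LatchInsteadOfSleep {
    static List<String> run(List<List<String>> pages) {
        OnePageAtATimeSubscriber subscriber = new OnePageAtATimeSubscriber();
        // SubmissionPublisher is the JDK's built-in Flow.Publisher; it stands in for the paginator.
        try (SubmissionPublisher<List<String>> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            pages.forEach(publisher::submit);
        } // close() signals onComplete after the buffered pages are delivered
        try {
            if (!subscriber.done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for pages");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for pages", e);
        }
        return new ArrayList<>(subscriber.seen);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(
                Arrays.asList("orders", "users", "carts"),
                Arrays.asList("audit", "sessions"))));
    }
}
```

Waiting on a latch (or a future) avoids guessing how long delivery takes, which a fixed sleep cannot do.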

You might not want to create a new subscriber for simple use cases like printing the table names. Instead, you can use the forEach helper method as shown below.

ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();
ListTablesPublisher publisher = asyncClient.listTablesPaginator(listTablesRequest);
CompletableFuture<Void> future = publisher.forEach(response -> response.tableNames()
                                                                       .forEach(System.out::println));
future.get();
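For readers curious how a forEach-style helper can sit on top of a subscriber, the following sketch shows one possible construction using the JDK's java.util.concurrent.Flow interfaces (Java 9 and later). It is not the SDK's code: ForEachHelperSketch and its forEach method are hypothetical, and SubmissionPublisher stands in for the paginated publisher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Consumer;

class ForEachHelperSketch {
    // Subscribes to 'publisher', hands each item to 'action', and completes the returned
    // future when the stream ends (normally or with an error).
    static <T> CompletableFuture<Void> forEach(Flow.Publisher<T> publisher, Consumer<? super T> action) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        publisher.subscribe(new Flow.Subscriber<T>() {
            private Flow.Subscription subscription;

            @Override
            public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1); // start the flow with a single item of demand
            }

            @Override
            public void onNext(T item) {
                try {
                    action.accept(item);
                    subscription.request(1); // ask for the next item only after this one is handled
                } catch (RuntimeException e) {
                    subscription.cancel(); // stop the stream if the callback fails
                    future.completeExceptionally(e);
                }
            }

            @Override
            public void onError(Throwable t) {
                future.completeExceptionally(t);
            }

            @Override
            public void onComplete() {
                future.complete(null);
            }
        });
        return future;
    }

    // Feeds two hypothetical pages through forEach and returns the names that were handled.
    static List<String> demo() {
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        SubmissionPublisher<List<String>> pages = new SubmissionPublisher<>();
        CompletableFuture<Void> done = forEach(pages, handled::addAll);
        pages.submit(Arrays.asList("orders", "users", "carts"));
        pages.submit(Arrays.asList("audit", "sessions"));
        pages.close();
        done.join(); // block until every page has been handled
        return new ArrayList<>(handled);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```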

Iterate on the underlying item collection directly

Similar to the sync result type, the async result class has a method to interact with the underlying item collection. The method has the same name as in the sync case, but it returns a publisher that can be used to request items across all pages.

Using tableNames() in ListTablesPublisher

ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();
ListTablesPublisher listTablesPublisher = asyncClient.listTablesPaginator(listTablesRequest);
SdkPublisher<String> publisher = listTablesPublisher.tableNames();

// Use forEach
CompletableFuture<Void> future = publisher.forEach(System.out::println);
future.get();

// Use subscriber
publisher.subscribe(new Subscriber<String>() {
    private Subscription subscription;

    @Override
    public void onSubscribe(Subscription s) {
        subscription = s;
        subscription.request(1);
    }

    @Override
    public void onNext(String tableName) {
        System.out.println(tableName);
        subscription.request(1);
    }

    @Override
    public void onError(Throwable t) { }

    @Override
    public void onComplete() { }
});

Note: This convenience method might not be included in the result type of all autopagination APIs.

Using reactive streams helper libraries

Implementing a custom subscriber is not a trivial task. Because the async autopagination APIs use reactive streams interfaces, they are interoperable with other reactive streams implementations. This enables you to use the SDK responses with third-party implementations such as RxJava and Akka Streams. These libraries can help you handle simple use cases without the need for a custom subscriber. Let’s look at an example of using RxJava to collect the Amazon DynamoDB table names into a list. For the example to work, you need to add a dependency on RxJava. If you are using Maven, you can add the following dependency in the POM file.

<!-- Add this to your POM file -->
<dependency>
    <groupId>io.reactivex.rxjava2</groupId>
    <artifactId>rxjava</artifactId>
    <version>2.1.9</version>
</dependency>

Collect table names into a list and print them

DynamoDBAsyncClient asyncClient = DynamoDBAsyncClient.create();
ListTablesPublisher publisher = asyncClient.listTablesPaginator(ListTablesRequest.builder()
                                                                                 .build());

// The Flowable class has many helper methods that work with any reactive streams compatible publisher implementation
List<String> tables = Flowable.fromPublisher(publisher)
                              .flatMapIterable(ListTablesResponse::tableNames)
                              .toList()
                              .blockingGet();
System.out.println(tables);

Resuming after a failure

There can be multiple service calls to retrieve all the results in paginated APIs, and errors might occur while making the service calls to retrieve pages. If the errors are due to transient issues, you might want to resume the iteration from where the error occurred, instead of restarting the whole operation. To support this, we have exposed a resume method in the result type of both sync and async autopagination APIs.

Using the resume method in a sync API

ListTablesRequest listTablesRequest = ListTablesRequest.builder().limit(3).build();
ListTablesIterable responses = client.listTablesPaginator(listTablesRequest);

ListTablesResponse lastSuccessfulPage = null;
try {
    for (ListTablesResponse response : responses) {
        response.tableNames().forEach(System.out::println);
        lastSuccessfulPage = response;
    }
} catch (Exception exception) {
    if (lastSuccessfulPage != null) {
        // We have captured the last page sent by the service and can use it to resume the operation
        ListTablesIterable resumedResponses = responses.resume(lastSuccessfulPage);
        // Use the resumed result object to print the remaining table names
        resumedResponses.tableNames().forEach(System.out::println);
    }
}

The resume method will return a new instance of that class that can be used to start the iteration from the failure point. Notice the return type of the resume method is ListTablesIterable, which is the same as the return type of the listTablesPaginator API. The resume method is available only when iterating over the response pages. Similar to sync, the async result object will also have a resume method.
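The resume idea can be exercised end to end against a deliberately flaky in-memory backend. Everything in this sketch is hypothetical (ResumablePage, ResumablePages, the index-based token, and the simulated failure); it only illustrates how a paginator can restart from the token carried by the last successful page and is not the SDK's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical page type: names plus the token for the next page (null on the last page).
final class ResumablePage {
    final List<String> names;
    final String nextToken;

    ResumablePage(List<String> names, String nextToken) {
        this.names = names;
        this.nextToken = nextToken;
    }
}

// Illustrative resumable paginator; not the SDK's implementation.
final class ResumablePages implements Iterable<ResumablePage> {
    private final Function<String, ResumablePage> fetchPage;
    private final String firstToken; // null means "start from the first page"
    private final boolean exhausted; // true when resuming after the final page

    ResumablePages(Function<String, ResumablePage> fetchPage) {
        this(fetchPage, null, false);
    }

    private ResumablePages(Function<String, ResumablePage> fetchPage, String firstToken, boolean exhausted) {
        this.fetchPage = fetchPage;
        this.firstToken = firstToken;
        this.exhausted = exhausted;
    }

    // Returns a new paginator that starts with the page following 'lastSuccessfulPage'.
    ResumablePages resume(ResumablePage lastSuccessfulPage) {
        String token = lastSuccessfulPage.nextToken;
        return new ResumablePages(fetchPage, token, token == null);
    }

    @Override
    public Iterator<ResumablePage> iterator() {
        return new Iterator<ResumablePage>() {
            private String token = firstToken;
            private boolean more = !exhausted;

            @Override
            public boolean hasNext() {
                return more;
            }

            @Override
            public ResumablePage next() {
                if (!more) {
                    throw new NoSuchElementException();
                }
                ResumablePage page = fetchPage.apply(token); // may throw; state is unchanged if it does
                token = page.nextToken;
                more = token != null;
                return page;
            }
        };
    }
}

class ResumeDemo {
    // Fake backend over 'count' names that throws once, on the call numbered 'failOnCall'.
    static Function<String, ResumablePage> flakyBackend(int count, int limit, int failOnCall) {
        int[] calls = {0};
        return token -> {
            if (++calls[0] == failOnCall) {
                throw new IllegalStateException("simulated transient failure");
            }
            int from = token == null ? 0 : Integer.parseInt(token);
            int to = Math.min(from + limit, count);
            List<String> names = new ArrayList<>();
            for (int i = from; i < to; i++) {
                names.add("table-" + (i + 1));
            }
            return new ResumablePage(names, to < count ? String.valueOf(to) : null);
        };
    }

    // Iterates all pages, resuming once from the last successful page if a call fails.
    static List<String> listAllWithOneResume(ResumablePages pages) {
        List<String> all = new ArrayList<>();
        ResumablePage last = null;
        try {
            for (ResumablePage page : pages) {
                all.addAll(page.names);
                last = page;
            }
        } catch (IllegalStateException e) {
            // If the very first call failed there is nothing to resume from, so start over.
            ResumablePages rest = last == null ? pages : pages.resume(last);
            for (ResumablePage page : rest) {
                all.addAll(page.names);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(listAllWithOneResume(new ResumablePages(flakyBackend(7, 3, 2))));
    }
}
```

Because the resumed paginator starts from the token of the last successful page, names that were already processed are not fetched or printed again.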

Giving feedback and contributing

You can provide feedback to us in several ways.

Public feedback

GitHub issues. Customers who are comfortable giving public feedback can open a GitHub issue in the V2 repo. This is the preferred mechanism to give feedback so that other customers can engage in the conversation, +1 issues, and so on. Issues you open will be evaluated and included in our roadmap for the GA launch.

Gitter channel. For informal discussion or general feedback, you may join the Gitter chat for the V2 repo. The Gitter channel is also a great place to get help with the Developer Preview, but feel free to also open an issue.

Private feedback

Those who prefer not to give public feedback can instead email the aws-java-sdk-v2-feedback@amazon.com mailing list. This list is monitored by the AWS SDK for Java team and will not be shared with anyone outside of AWS. An SDK team member may respond to ask for clarification or to acknowledge that the feedback was received and is being evaluated.

Contributing

You can open pull requests for fixes or additions to the AWS SDK for Java 2.0 Developer Preview. All pull requests must be submitted under the Apache 2.0 license and will be reviewed by an SDK team member prior to merging. Accompanying unit tests are appreciated.

AWS Developer Tools Blog
