AWS Database Blog

Building a GraphQL interface to Amazon QLDB with AWS AppSync: Part 2

This post is the second installment of a two-post series discussing how to integrate Amazon Quantum Ledger Database (QLDB) and AWS AppSync. This combination provides a versatile, GraphQL-powered API on top of the Amazon QLDB-managed ledger database. For information about connecting Amazon QLDB and AWS AppSync by building an AWS Lambda function and running a query, see Building a GraphQL interface to Amazon QLDB with AWS AppSync: Part 1.

This post continues developing the integration to support more complex queries and mutation of data in QLDB from AWS AppSync. The walkthrough also adds support for querying historical data that Amazon QLDB and AWS AppSync capture.

Both posts in this series use a DMV dataset of driver vehicle information for its queries. You can also review the complete application in the GitHub repo.

The following diagram shows the high-level architecture for this project. You can craft PartiQL queries to fulfill various operations that you run on Amazon QLDB integration function in Lambda by using AWS AppSync resolvers.

Most of this post focuses on creating new resolvers to interact with Amazon QLDB in new ways, but you also extend the integration function to support multiple queries in a single transaction.

Adding support for multi-step transactions

When working with Amazon QLDB, you often need to look up a value, for example, a document ID, before executing a subsequent query. The version of the integration function discussed in the first post in this series supported a single query per transaction. In this post, you modify the function to support running multiple queries in a single transaction. Many of the queries you need to support the DMV use case require inspection of results in one query before passing to the next, so you add support for JMESPath to retrieve values from the proceeding query. JMESPath is a query language for JSON that allows the extraction of data from elements in a JSON document.

Consider the use case in which you need to look up a specific owner’s vehicles. You first need to query the Person table to retrieve the document ID for the owner (a government identifier, such as a driver’s license number). For instructions, see Using the BY Clause to Query Document ID.

You also want to find the driver’s vehicles in the VehicleRegistration table before you retrieve vehicle details in the Vehicle table. You perform these steps with a single JOIN query. For more information, see Joins.

The first query retrieves the document ID for a given person using the BY clause. See the following code:

SELECT id FROM Person AS t BY id WHERE t.GovId = ?

The result of that query includes the ID needed for the subsequent query of the Vehicle table. This query also allows you to review how PartiQL queries can interact with nested data. In this case, the query matches against a nested field in the Owners field. See the following code:

SELECT Vehicle FROM Vehicle INNER JOIN VehicleRegistration AS r ON Vehicle.VIN = r.VIN WHERE r.Owners.PrimaryOwner.PersonId = ?

The preceding query also makes use of a JOIN across the VehicleRegistration and Vehicle tables. If you are familiar with SQL, this is an inner join query so you can traverse from the vehicle ownership information (stored in VehicleRegistration) to the vehicle data (in the Vehicle table).

To support this type of transaction in the API, adjust the integration function code with the following code:

private String executeTransaction(List<Query> queries) {
  try (QldbSession qldbSession = createQldbSession()) {
    List<String> result = new ArrayList<String>();

    qldbSession.execute((ExecutorNoReturn) txn -> {
      for (Query q : queries) {
        LOGGER.info("Executing query: {}", q.query);
        String lastResult = result.size() > 0 ? result.get(result.size() - 1) : "";
        result.add(executeQuery(txn, q, lastResult));
      }
    }, (retryAttempt) -> LOGGER.info("Retrying due to OCC conflict..."));

    return result.get(result.size() - 1);
  } catch (QldbClientException e) {
    LOGGER.error("Unable to create QLDB session: {}", e.getMessage());
  }

  return "{}";
}

private String executeQuery(TransactionExecutor txn, Query query, String lastResult) {
  final List<IonValue> params = new ArrayList<IonValue>();
  query.getArgs().forEach((a) -> {
    LOGGER.debug("Adding arg {} to query", a);
    try {
      String arg = a.startsWith("$.") && !lastResult.isEmpty() ?
                       queryWithJmesPath(lastResult, a.substring(2)) : a;
      params.add(MAPPER.writeValueAsIonValue(arg));
    } catch (IOException e) {
      LOGGER.error("Could not write value as Ion: {}", a);
    }
  });

  // Execute the query and transform response to JSON string...
  List<String> json = new ArrayList<String>();
  txn.execute(query.getQuery(), params).iterator().forEachRemaining(r -> {
    String j = convertToJson(r.toPrettyString());
    json.add(j);
  });

  return json.toString();
}

You extend these two functions so a single call can support multiple queries to the function. This allows for more complex, multi-step queries and mutations.

In the following section, you explore how to use this new functionality in the DMV use case. But you first need to add support for searching the result using JMESPath.

Arguments that require a JMESPath expression to retrieve data from the proceeding result are prepended by “$” (a convenient indicator, nothing more; it’s stripped from the expression). To run the search, introduce a new function with the following code:

private String queryWithJmesPath(String json, String jmesExpression)
            throws IOException {
  LOGGER.debug("Query with JMESPath: {} on {}", jmesExpression, json);

  JmesPath<JsonNode> jmespath = new JacksonRuntime();
  Expression<JsonNode> expression = jmespath.compile(jmesExpression);

  ObjectMapper om = new ObjectMapper();
  JsonNode input = om.readTree(json);

  return om.writeValueAsString(expression.search(input)).replaceAll("^\"|\"$", "");
}

The preceding code allows the passing of more complex queries, such as finding vehicles by owner. You can also implement multi-step queries with AWS AppSync pipeline resolvers; adding to the integration function was a design decision.

Next, you add support for this query and a few mutations to the GraphQL API and implement appropriate resolvers.

Extending the DMV GraphQL API

In AWS AppSync, the GraphQL schema defines the shape of the data and operations available for a given API. You started constructing the schema for the DMV API in the previous post and continue to extend it here by adding a second query to find vehicles by owner. You also add a new type, mutations, which provides operations that change data in Amazon QLDB. See the following code:

type Query {
  ...
  vehiclesByOwner(govId:ID!):[Vehicle]
}

type Mutation {
  updateVehicleOwner(vin:ID!, govId:ID!):Boolean
  addSecondaryOwner(vin:ID!, govId:ID!):Boolean
}

schema {
  query: Query
  mutation: Mutation
}

To implement the `vehiclesByOwner` query, attach a new resolver that uses the integration function data source with the following request mapping template:

{
  "version": "2017-02-28",
  "operation": "Invoke",
  "payload": {
    "action": "Query",
    "payload": [
      {
        "query": "SELECT id FROM Person AS t BY id WHERE t.GovId = ?",
        "args": [ "$ctx.args.govId" ]
      },
      {
        "query": "SELECT Vehicle FROM Vehicle INNER JOIN VehicleRegistration AS r ON Vehicle.VIN = r.VIN WHERE r.Owners.PrimaryOwner.PersonId = ?",
        "args": [ "$.[0].id" ]
      }
    ]
  }
}

This resolver is similar to the `getVehicle` resolver in the previous post, but adds a second query to the payload. These are the same queries as discussed earlier that allow lookup of vehicles owned by a person. The first query takes as an argument a value (govId) passed via the GraphQL operation; the second uses JMESPath to retrieve a value from the proceeding query. For more information about the response mapping template, see the GitHub repo.

Run the vehiclesByOwner query using a GraphQL payload such as the following code:

query VehiclesByOwner {
  vehiclesByOwner(govId:"LOGANB486CG") {
    VIN
    Make
    Model
  }
}

You receive the following response:

{
  "data": {
    "vehiclesByOwner": [
      {
        "VIN": "KM8SRDHF6EU074761",
        "Make": "Tesla",
        "Model": "Model S"
      }
    ]
  }
}

The values selected in each PartiQL query are hardcoded in the request mapping. Although you don’t implement it here, you can extend the integration function to select only the values included in the GraphQL selection set (VIN, Make, and Model in the preceding query). Returning only the desired values could improve performance because only the needed data is returned. AWS AppSync provides a list of the selected values in the $context.info section of the GraphQL request available to the mapping template. For more information, see AWS AppSync and the GraphQL Info Object.

Mutations

Implementing a mutation (an operation that changes data) is very similar to the preceding query. Instead of using a SELECT statement, use a CREATE or UPDATE statement. The following code implements the resolver for the updateVehicleOwner mutation:

  "version": "2017-02-28",
  "operation": "Invoke",
  "payload": {
    "action": "Query",
    "payload": [
      {
        "query": "SELECT id FROM Person AS t BY id WHERE t.GovId = ?",
        "args": [ "$ctx.args.govId" ]
      },
      {
        "query": "UPDATE VehicleRegistration AS v SET v.Owners.PrimaryOwner.PersonId = ? WHERE v.VIN = ?",
        "args": [ "$.[0].id", "$ctx.args.vin" ]
      }
    ]
  }
}

This mutation is very similar to the earlier query. Again, you find the document ID for the person associated with the passed govId value. In the second operation, you use an UPDATE statement to change a nested value in the VehicleRegistration table. The API returns the result of the mutation (true if successful; otherwise false); however, you could also add a third operation to retrieve updated data and return that to the caller.

Querying historical data

Amazon QLDB uses an immutable transaction log, called a journal, to maintain a history of changes to data. This history is complete and verifiable, meaning you can traverse back over changes in the journal to confirm that the data hasn’t been tampered with. Doing so involves verifying the cryptographically-chained signature of each transaction.

In the DMV use case, it’s useful to track vehicle ownership over time. You can easily query for this type of change without needing an audit table by using the Amazon QLDB journal. To add a vehicle ownership query, start by adding it to the AWS AppSync schema (getOwnershipHistory). See the following code:

type Query {
  ...
  getOwnershipByHistory(vin:ID!, monthsAgo:Int):[History]
}

The getOwnershipHistory query takes two arguments: the VIN for the vehicle and the number of months to search back. The query also returns a new data type that you need to add to the schema: History. See the following code:

type History {
  id: ID!
  version: Int!
  txTime: AWSDateTime! # transaction timestamp
  txId: ID!            # transaction ID
  data: HistoryData
  hash: String!
}

union HistoryData = Person | Owner | Vehicle | VehicleRegistration

The History type includes a field called data that uses the union type HistoryData. Union types in GraphQL allow for different types that don’t share any common fields to be returned in a single type. For the DMV API, you specify that HistoryData could include any of the other types in the API.

The getOwnershipHistory query is similar to the other resolvers described earlier and uses the same integration function as a data source. For this use case, calculate the timestamp for the start of the search as well (defaulting to 3 months). See the following code:

#set( $months = $util.defaultIfNull($ctx.args.monthsAgo, 3) )
#set( $currEpochTime = $util.time.nowEpochMilliSeconds() )
#set( $fromEpochTime = $currEpochTime - ($months * 30 * 24 * 60 * 60 * 1000) )
#set( $fromTime = $util.time.epochMilliSecondsToISO8601($fromEpochTime) )
{
  "version": "2017-02-28",
  "operation": "Invoke",
  "payload": {
    "action": "Query",
    "payload": [
      {
        "query": "SELECT id FROM VehicleRegistration AS t BY id WHERE t.VIN = ?",
        "args": [ "$ctx.args.vin" ]
      },
      {
        "query": "SELECT * FROM history(VehicleRegistration, `$fromTime`) AS h WHERE h.metadata.id = ?",
        "args": [ "$.[0].id" ]
      }
    ]
  }
}

This query uses the history function, a PartiQL extension, to retrieve revisions from the system view of the table. The history function can accept an optional start and end time in ISO 8601 format, denoted as Ion timestamps (be sure to surround each with backticks). For more information, see Querying Revision History.

The response mapping template is somewhat more complex because the history response from Amazon QLDB includes data that the API doesn’t return. The template iterates through each item in the response and builds an object that matches the shape of History. You also need to set the __typename field for HistoryData to support the use of the union type (the value is always “VehicleRegistration” for this query). See the following code:

#set( $result = $util.parseJson($ctx.result.result) )
#set( $history = [] )

#foreach($item in $result)
  #set( $data = $item.data )
  $util.qr($data.put("__typename", "VehicleRegistration"))
  #set( $h = {
    "id": "$item.metadata.id",
    "version": $item.metadata.version,
    "txTime": "$item.metadata.txTime",
    "txId": "$item.metadata.txId",
    "hash": "$item.hash",
    "data": $data
  } )
  $util.qr($history.add($h))
#end

$util.toJson($history)

To query for the ownership history of a particular vehicle, enter the following the GraphQL query:

query GetVehicleHistory {
  getOwnershipHistory(vin:"3HGGK5G53FM761765") {
    version
    txTime
    data {
      ... on VehicleRegistration {
        Owners {
          PrimaryOwner {
            FirstName
            LastName
          }
        }
      }
    }
  }
}

Before running this query, use the updateVehicleOwner mutation to change the primary owner of this vehicle. This operation is equivalent to one person selling the vehicle to another. The following code is the query result:

{
  "data": {
    "getOwnershipHistory": [
      {
        "version": 0,
        "txTime": "2020-01-30T16:11:00.549Z",
        "data": {
          "Owners": {
            "PrimaryOwner": {
              "FirstName": "Alexis",
              "LastName": "Pena"
            }
          }
        }
      },
      {
        "version": 1,
        "txTime": "2020-02-05T22:17:36.262Z",
        "data": {
          "Owners": {
            "PrimaryOwner": {
              "FirstName": "Brent",
              "LastName": "Logan"
            }
          }
        }
      }
    ]
  }
}

In the result, you can quickly find not just who owned the vehicle, but also when the owner Alexis sold the vehicle to Brent. Amazon QLDB captures this data transparently in the journal and makes it easy to query. Amazon QLDB also supports data verification to make sure that it hasn’t been tampered with, though that process is outside of the scope of this post.

Conclusion

Integrating Amazon QLDB with AWS AppSync enables powerful features as part of GraphQL APIs. A flexible integration function allows a single Lambda data source to support a multitude of PartiQL queries, including the history of data in Amazon QLDB. For more information and the complete source code, see the GitHub repo.

AWS welcomes feedback. Please share how you’re using Amazon QLDB and integrating with other AWS services by reaching out on the Amazon QLDB forum.

 


About the Author

 

Josh Kahn is a Principal Solutions Architect at Amazon Web Services. He works with the AWS customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.