Introducing configurable batching size for AWS AppSync Lambda resolvers

AWS AppSync is a serverless GraphQL service that makes it easy to create single endpoint GraphQL and realtime APIs. AppSync lets you combine disparate data sources and deliver the results to applications in an expected format, as specified by your API’s schema definition. Customers use resolvers attached to AWS Lambda functions to retrieve data from various data sources, and they can use Direct Lambda resolvers to build their business logic entirely in Lambda functions without writing any Velocity Template Language (VTL). Lambda resolvers support invoking Lambda functions in batches, which efficiently groups requests to Lambda and lets developers optimize how they retrieve data from their backend. Batch invocation of Lambda functions has helped developers address the N+1 problem in GraphQL.

Today, we are introducing configurable batching sizes for AppSync Lambda resolvers and Direct Lambda resolvers. Now developers can easily enable batching on their Direct Lambda resolvers, and configure the maximum batching size (up to 2000 instead of the previous fixed default of 5) for their resolver. The same functionality is available for AppSync pipeline functions that use a Lambda function data source.

Handling the N+1 problem

The N+1 problem can occur in a GraphQL system when one top-level query produces N items whose type contains a field that must also be resolved. For example, in the query below, listTickets returns N items, and each item has a requester field that must be resolved in turn. This yields N more queries. This is an N+1 configuration that can quickly become problematic as N increases or if this type of configuration is repeated within a nested selection set.

query getTickets {
  listTickets { # 1 top-level query
    items { # N items
      id
      requester { # N additional queries
        login
      }
    }
  }
}

Let’s keep using this ticketing system to illustrate how batching can help optimize your AppSync GraphQL API. We will build a new system that lets users open tickets against products. When a user (the requester) opens a ticket, the user’s ID is saved with the ticket information. Later, a user (the assignee) can be assigned to work on the ticket, and their ID is attached to the ticket. While the ticketing system is backed by an Amazon DynamoDB table, the system retrieves user information from an existing SQL database. When our web application retrieves a list of tickets to display, it must retrieve the work information of the requester and assignee. We build our new ticketing system using the Amplify CLI GraphQL transform and an Amplify model for the Ticket type. We define a User type, but we do not configure it as a model.

type Ticket @model {
  id: ID!
  title: String!
  product: String!
    @index(name: "ByProductAndStatus", sortKeyFields: ["status"], queryField: "listTicketsByProdAndStatus")
  description: String
  requesterId: String!
  requester: User!
  assigneeId: String
  assignee: User
  status: TICKET_STATE!
}

type User {
  login: ID!
  name: String!
  manager: User
  location: String!
  tel: String!
  department: String
}

enum TICKET_STATE {
  opened
  cancelled
  assigned
  closed
  resolved
}

When deployed with AWS Amplify, Amplify creates the DynamoDB table, queries, and VTL resolvers to interact with the ticket model data. To retrieve information for the requester, we must separately attach a resolver to the field. Once deployed, we can retrieve a list of opened tickets against a product using the following query:

query MyQuery {
  listTicketsByProdAndStatus(product: "shoes", status: {beginsWith: "opened"}) {
    items {
      id
      requesterId
      requester {
        name
        login
        tel
        location
        department
      }
      status
    }
  }
}

The query returns ticket items from DynamoDB. AppSync then calls the requester field resolver to retrieve that part of the request. For each item, a separate request is made to resolve the requester field in our SQL database. A single query that returns N items yields N additional requests. This is the traditional N+1 problem.
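To make the cost concrete, the following minimal sketch counts backend round trips when a field is resolved item by item and when the same items are resolved as one batch. It is only an illustration: the in-memory `users` object stands in for the SQL database, and `fetchUsers`, `resolveOneByOne`, and `resolveBatched` are hypothetical helpers, not part of AppSync.

```javascript
// Hypothetical in-memory stand-in for the SQL user database.
const users = {
  brice: { login: 'brice' },
  eric: { login: 'eric' },
  wanjiru: { login: 'wanjiru' },
}
let backendCalls = 0

// Simulates one round trip to the backend for a list of logins.
function fetchUsers(logins) {
  backendCalls += 1
  return logins.map((login) => users[login])
}

const tickets = [
  { id: 't1', requesterId: 'brice' },
  { id: 't2', requesterId: 'eric' },
  { id: 't3', requesterId: 'brice' },
  { id: 't4', requesterId: 'wanjiru' },
]

// Without batching: one backend call per ticket (N calls).
function resolveOneByOne(items) {
  return items.map((ticket) => fetchUsers([ticket.requesterId])[0])
}

// With batching: one backend call for the unique set of IDs,
// then the results are mapped back to the tickets in their original order.
function resolveBatched(items) {
  const uniqueIds = [...new Set(items.map((ticket) => ticket.requesterId))]
  const byLogin = Object.fromEntries(
    fetchUsers(uniqueIds).map((user) => [user.login, user])
  )
  return items.map((ticket) => byLogin[ticket.requesterId])
}

backendCalls = 0
const unbatched = resolveOneByOne(tickets)
const unbatchedCalls = backendCalls

backendCalls = 0
const batched = resolveBatched(tickets)
const batchedCalls = backendCalls

console.log({ unbatchedCalls, batchedCalls })
```

Both approaches return the same requesters in the same order; only the number of round trips differs.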

To solve this issue, we use a Direct Lambda resolver with batching enabled and a custom maximum batching size. In the AWS AppSync console, on the schema page, we attach a resolver to the requester field of the Ticket type. On the resolver page, we select an existing Lambda data source (here: fetch_users_lambda_ds). We enable batching by toggling the setting in the Configure batching section. Then, we specify a maximum batching size of 100.

In the console, under Create new Resolver: for Data source name, enter fetch_users_lambda_ds. Under Configure batching, enable batching and set the maximum batching size to 100. Then select the Save Resolver button at the top right.
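The same settings can also be expressed programmatically. The sketch below builds an input object for the AppSync CreateResolver API operation, assuming the resolver is a unit resolver and that `maxBatchSize` is the field that carries the maximum batching size. The `buildRequesterResolverParams` helper and the `<your-api-id>` placeholder are hypothetical; check the field names against the API reference for your SDK version before relying on them.

```javascript
// Builds the input for the AppSync CreateResolver API operation.
// apiId is a placeholder; the other values mirror the console walkthrough.
function buildRequesterResolverParams(apiId) {
  return {
    apiId,
    typeName: 'Ticket',
    fieldName: 'requester',
    dataSourceName: 'fetch_users_lambda_ds',
    kind: 'UNIT', // assumption: a unit resolver, not a pipeline resolver
    // No mapping templates are set because this is a Direct Lambda resolver.
    maxBatchSize: 100,
  }
}

const params = buildRequesterResolverParams('<your-api-id>')
console.log(JSON.stringify(params, null, 2))

// With the AWS SDK for JavaScript v2, the params could be submitted like this
// (left as a comment so that the sketch runs without AWS credentials):
//   const AWS = require('aws-sdk')
//   await new AWS.AppSync().createResolver(params).promise()
```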

Now, when the requester field is resolved, the Lambda function is invoked with batches of up to 100 requests. The function receives an array of context objects, one for each request. For example, in our request, a context object looks like the following:

  {
    arguments: {},
    identity: null,
    source: {
      createdAt: '2021-12-08T05:14:10.770Z',
      product: 'shoes',
      requesterId: 'brice',
      __typename: 'Ticket',
      description: '<description>',
      id: 'bc1ee4df-0702-4444-ae7b-8a12c43f6b56',
      title: 'Nesciunt voluptatem qui quidem quia.',
      assigneeId: 'eric',
      updatedAt: '2021-12-08T05:14:10.770Z',
      status: 'opened'
    },
    request: { headers: [Object], domainName: null },
    prev: null,
    info: {
      selectionSetList: [Array],
      selectionSetGraphQL: '{\n  name\n  login\n  tel\n  location\n  department\n}',
      fieldName: 'requester',
      parentTypeName: 'Ticket',
      variables: {}
    },
    stash: {}
  }

The function can execute any business logic, but it must return an array of results in the same order as the array of context objects. The following Lambda function code fetches data from an Amazon RDS MySQL database via Amazon RDS Proxy. With a batch of requests to fulfill, the Lambda function can optimize its queries. Here, instead of running an individual query for each context.source.requesterId, the function builds the unique set of IDs and executes one query. For example:

SELECT * FROM Users WHERE login IN ('wanjiru','brice','youssef')

The Lambda function code for our resolver is as follows:

var aws_sdk = require('aws-sdk')
var mysql = require('mysql2')

var { RDS_PROXY_URL, DATABASE, USERNAME, REGION } = process.env
var signer = new aws_sdk.RDS.Signer({
  region: REGION,
  port: 3306,
  username: USERNAME,
  hostname: RDS_PROXY_URL,
})
var initConn = () => {
  const connectionConfig = {
    host: RDS_PROXY_URL,
    database: DATABASE,
    user: USERNAME,
    ssl: 'Amazon RDS',
    authPlugins: { mysql_clear_password: () => () => signer.getAuthToken({}) },
  }
  return mysql.createConnection(connectionConfig)
}
var connection = initConn()
exports.handler = async (event) => {
  // with batching enabled, `event` is an array of context objects
  const ids = event.map((context) => {
    // fetch the fieldName we are resolving from the `info` object,
    // then grab the corresponding ID from the source
    const key = context.info.fieldName + 'Id'
    return context.source[key]
  })

  // query each unique ID once; the `?` placeholder lets the driver escape
  // the values instead of concatenating them into the SQL string
  const uniqueIds = [...new Set(ids)]
  const sql = 'SELECT * FROM Users WHERE login IN (?)'

  try {
    return await new Promise((resolve, reject) => {
      connection.query(sql, [uniqueIds], (err, results) => {
        if (err) {
          return reject(err)
        }
        // reduce the list of results to a map using the login as a key
        const resultMap = results.reduce((prev, curr) => {
          prev[curr.login] = curr
          return prev
        }, {})
        // for each id requested, find the related result in the map
        const res = ids.map((id) => ({ data: resultMap[id] }))
        // return the response
        resolve(res)
      })
    })
  } catch (error) {
    console.log(error)
    // rethrow so that the failure is reported instead of returning `undefined`
    throw error
  }
}

To return the results in the same order as the array of contexts, the function maps the original input to a list of results:

const res = ids.map((id) => ({ data: resultMap[id] }))

You can return individual errors for each request by including an errorMessage and errorType in your result object.
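As an illustration of per-request errors, the following sketch maps each requested ID to a result entry and reports an error for IDs that the query did not return. It assumes each entry has the shape `{ data, errorMessage, errorType }`; the `toBatchResults` helper, the `UserNotFound` error type, and the sample rows are hypothetical.

```javascript
// Maps each requested ID to a result entry, preserving the input order.
// rows: user records returned by the database query.
function toBatchResults(ids, rows) {
  const byLogin = new Map(rows.map((row) => [row.login, row]))
  return ids.map((id) => {
    if (byLogin.has(id)) {
      return { data: byLogin.get(id) }
    }
    // no matching user: report an error for this entry only
    return {
      data: null,
      errorMessage: `User '${id}' was not found`,
      errorType: 'UserNotFound', // hypothetical error type
    }
  })
}

const results = toBatchResults(
  ['brice', 'ghost', 'brice'],
  [{ login: 'brice', name: 'Brice' }]
)
console.log(results)
```

Because the entries are produced by mapping over the requested IDs, duplicates and missing users keep their positions, so the result array still lines up with the array of context objects.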

Conclusion

Now, you can configure the batch size for Lambda resolvers in AppSync, thereby addressing the N+1 problem, and optimizing how often your Lambda resolvers are invoked. In turn, this can help improve performance and manage costs when accessing your data with AppSync using Lambda functions. Note that, while you can batch up to 2000 items together, payload size limits still apply. You are responsible for making sure that your requests and responses do not exceed the service payload size limits. See Lambda quotas, and AppSync endpoints and quotas for more details. Visit the AWS AppSync documentation for more information regarding how to configure your Lambda functions to use with batching.

About the author

Brice Pellé

Brice Pellé is a Principal Solution Architect working on AWS AppSync.