Introducing configurable batching size for AWS AppSync Lambda resolvers
AWS AppSync is a serverless GraphQL service that makes it easy to create single-endpoint GraphQL and real-time APIs. AppSync lets you combine disparate data sources and deliver the results to applications in the format specified by your API’s schema definition. Customers use AWS Lambda functions as resolvers to retrieve data from various data sources, and use Direct Lambda resolvers to build their business logic entirely in Lambda functions without writing any Velocity Template Language (VTL). Lambda resolvers support invoking Lambda functions in batches, which groups multiple requests into a single invocation and lets developers optimize how they retrieve data from their backend. Batch invocation of Lambda functions has helped developers address the N+1 problem in GraphQL.
Today, we are introducing configurable batching sizes for AppSync Lambda resolvers and Direct Lambda resolvers. Developers can now enable batching on their Direct Lambda resolvers and configure the maximum batching size for each resolver, up to 2000 instead of the previous fixed default of 5. The same functionality is available for AppSync pipeline functions that use a Lambda function data source.
Handling the N+1 problem
The N+1 problem can occur in a GraphQL system when one top-level query produces N items whose type contains a field that must also be resolved. For example, a listTickets query returns N ticket items, and each item has a requester field that must be resolved in turn. This yields N more requests. Such an N+1 pattern quickly becomes problematic as N increases, or when it is repeated within a nested selection set.
Let’s keep using this ticketing system to illustrate how batching can help optimize your AppSync GraphQL API. We will build a new system that lets users open tickets against products. When a user (the requester) opens a ticket, the user’s ID is saved with the ticket information. Later, a user (the assignee) can be assigned to work on the ticket, and their ID is attached to the ticket. While the ticketing system is backed by an Amazon DynamoDB table, user information is stored in an existing SQL database. When our web application retrieves a list of tickets to display, it must also retrieve the details of each ticket’s requester and assignee. We build the new ticketing system using the Amplify CLI GraphQL transform and an Amplify model for the Ticket type. We also define a User type, but we do not configure it as a model.
When we deploy with AWS Amplify, Amplify creates the DynamoDB table, the queries, and the VTL resolvers that interact with the ticket model data. Because User is not a model, we must separately attach a resolver to the requester field to retrieve the requester’s information. Once deployed, we can retrieve the list of tickets opened against a product with the generated listTickets query.
The query returns ticket items from DynamoDB. AppSync then calls the requester field resolver to resolve that part of the request, making a separate request to our SQL database for each item. A single query that returns N items therefore yields N additional requests. This is the traditional N+1 problem.
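To make the cost concrete, the following JavaScript sketch counts backend requests for a single list query, with and without batching. It is a back-of-the-envelope model, not AppSync's actual scheduling: it assumes one request for the top-level list and that every batch is filled up to the configured maximum. The function names are hypothetical.

```javascript
// Hypothetical model: count backend requests for a list query that returns
// `n` items, each with one nested field (e.g. `requester`) to resolve.

// Without batching: 1 request for the list + 1 request per item (N+1).
function requestsWithoutBatching(n) {
  return 1 + n;
}

// With batching: 1 request for the list + one Lambda invocation per batch,
// assuming (optimistically) that every batch holds `maxBatchSize` items.
function requestsWithBatching(n, maxBatchSize) {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new RangeError('maxBatchSize must be a positive integer');
  }
  return 1 + Math.ceil(n / maxBatchSize);
}

for (const n of [10, 250, 1000]) {
  console.log(
    `n=${n}: unbatched=${requestsWithoutBatching(n)}, ` +
      `batched(100)=${requestsWithBatching(n, 100)}`
  );
}
```

If AppSync dispatches batches smaller than the maximum, the batched count is higher than this model predicts, but it still grows with N divided by the batch size rather than with N.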
To solve this issue, we use a Direct Lambda resolver with batching enabled and a custom maximum batching size. In the AWS AppSync console, on the Schema page, we attach a resolver to the requester field of the Ticket type. On the resolver page, we select an existing Lambda data source (here: fetch_users_lambda_ds), enable batching with the toggle in the Configure batching section, and then set the maximum batching size to 100.
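The same settings can also be expressed through the AppSync CreateResolver API. The sketch below only builds the request input; it assumes the object would be passed to CreateResolverCommand from the AWS SDK for JavaScript v3 (@aws-sdk/client-appsync), which is not called here. The helper name buildBatchedResolverInput and the YOUR_API_ID placeholder are hypothetical, and treating "no mapping templates plus maxBatchSize" as a batched Direct Lambda resolver is our reading of the API, so check the AppSync API reference before relying on it.

```javascript
// Sketch: build CreateResolver input mirroring the console settings above.
// Not sent anywhere; pass it to CreateResolverCommand in a real script.

const MAX_ALLOWED_BATCH_SIZE = 2000; // upper bound quoted in this post

function buildBatchedResolverInput({ apiId, typeName, fieldName, dataSourceName, maxBatchSize }) {
  // Batching is being enabled, so require a size between 1 and the maximum.
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1 || maxBatchSize > MAX_ALLOWED_BATCH_SIZE) {
    throw new RangeError(`maxBatchSize must be an integer between 1 and ${MAX_ALLOWED_BATCH_SIZE}`);
  }
  return {
    apiId,
    typeName,
    fieldName,
    dataSourceName,
    kind: 'UNIT',
    // No requestMappingTemplate/responseMappingTemplate: a Direct Lambda resolver.
    maxBatchSize,
  };
}

const input = buildBatchedResolverInput({
  apiId: 'YOUR_API_ID', // hypothetical placeholder
  typeName: 'Ticket',
  fieldName: 'requester',
  dataSourceName: 'fetch_users_lambda_ds',
  maxBatchSize: 100,
});
console.log(JSON.stringify(input, null, 2));
```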
Now, when the requester field is resolved, AppSync invokes the Lambda function with batches of up to 100 requests. The function receives an array of context objects, one for each request. For example, a context object from our query looks like the following:
{
  arguments: {},
  identity: null,
  source: {
    createdAt: '2021-12-08T05:14:10.770Z',
    product: 'shoes',
    requesterId: 'brice',
    __typename: 'Ticket',
    description: '<description>',
    id: 'bc1ee4df-0702-4444-ae7b-8a12c43f6b56',
    title: 'Nesciunt voluptatem qui quidem quia.',
    assigneeId: 'eric',
    updatedAt: '2021-12-08T05:14:10.770Z',
    status: 'opened'
  },
  request: { headers: [Object], domainName: null },
  prev: null,
  info: {
    selectionSetList: [Array],
    selectionSetGraphQL: '{\n name\n login\n tel\n location\n department\n}',
    fieldName: 'requester',
    parentTypeName: 'Ticket',
    variables: {}
  },
  stash: {}
}
The function can execute any business logic, but it must return an array of results in the same order as the array of context objects. The following Lambda function fetches data from an Amazon RDS for MySQL database through Amazon RDS Proxy. Because it receives a batch of requests to fulfill, the function can optimize its queries: instead of running an individual query for each context.source.requesterId, it builds the set of unique IDs and executes a single query.
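A minimal sketch of that step, assuming each context carries info.fieldName and that the parent Ticket stores the matching ID as a field named after it plus "Id" (requesterId, assigneeId). The helper names idsFromBatch and buildUserQuery are hypothetical, and the Users table and login column follow the function shown next. Using one ? placeholder per unique ID keeps the ID values out of the SQL text.

```javascript
// Sketch: collapse a batch of context objects into one parameterized query.

// One ID per context, in batch order (duplicates kept so results can be mapped back).
function idsFromBatch(contexts) {
  return contexts.map((context) => context.source[`${context.info.fieldName}Id`]);
}

// Assumes a non-empty batch: `IN ()` is not valid SQL.
function buildUserQuery(ids) {
  const uniqueIds = [...new Set(ids)];
  // One '?' placeholder per unique ID; the driver fills in escaped values.
  const placeholders = uniqueIds.map(() => '?').join(', ');
  return {
    sql: `SELECT * FROM Users WHERE login IN (${placeholders})`,
    values: uniqueIds,
  };
}

const batch = [
  { source: { requesterId: 'brice' }, info: { fieldName: 'requester' } },
  { source: { requesterId: 'eric' }, info: { fieldName: 'requester' } },
  { source: { requesterId: 'brice' }, info: { fieldName: 'requester' } },
];
const query = buildUserQuery(idsFromBatch(batch));
console.log(query.sql);    // SELECT * FROM Users WHERE login IN (?, ?)
console.log(query.values); // [ 'brice', 'eric' ]
```

Because the key is derived from info.fieldName, the same function could also back a resolver on the assignee field.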
The Lambda function code for our resolver is as follows:
const aws_sdk = require('aws-sdk')
const mysql = require('mysql2')

const { RDS_PROXY_URL, DATABASE, USERNAME, REGION } = process.env

// IAM authentication: the signer generates a short-lived token used as the password
const signer = new aws_sdk.RDS.Signer({
  region: REGION,
  port: 3306,
  username: USERNAME,
  hostname: RDS_PROXY_URL,
})

const initConn = () => {
  const connectionConfig = {
    host: RDS_PROXY_URL,
    database: DATABASE,
    user: USERNAME,
    ssl: 'Amazon RDS',
    authPlugins: { mysql_clear_password: () => () => signer.getAuthToken({}) },
  }
  return mysql.createConnection(connectionConfig)
}

// Created outside the handler so warm invocations reuse the connection
const connection = initConn()

exports.handler = async (event) => {
  // `event` is the batch: an array of context objects, one per field to resolve
  const ids = event.map((context) => {
    // fetch the fieldName we are resolving (e.g. 'requester') from the `info` object,
    // then grab the corresponding ID (e.g. 'requesterId') from the source
    const key = context.info.fieldName + 'Id'
    return context.source[key]
  })
  const uniqueIds = [...new Set(ids)]
  // Let the driver escape the values: `IN (?)` expands an array into a list
  const sql = 'SELECT * FROM Users WHERE login IN (?)'
  try {
    const results = await new Promise((resolve, reject) => {
      connection.query(sql, [uniqueIds], (err, rows) => (err ? reject(err) : resolve(rows)))
    })
    // reduce the list of results to a map using the login as a key
    const resultMap = results.reduce((prev, curr) => {
      prev[curr.login] = curr
      return prev
    }, {})
    // for each id requested, in batch order, find the related result (null if missing)
    return ids.map((id) => ({ data: resultMap[id] ?? null }))
  } catch (error) {
    console.error(error)
    // rethrow so AppSync reports the failure instead of receiving an undefined result
    throw error
  }
}
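To return the results in the same order as the array of contexts, the function maps the original list of IDs (one per context, duplicates included) to a list of results, looking up each ID in the map built from the query results.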
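The ordering rule can be isolated in a small pure function. This is a sketch rather than the post's exact code: mapResultsInOrder is a hypothetical name, it uses a Map instead of an object, and it resolves IDs with no matching row to null. It assumes rows are keyed by the login column, as in the Users table above.

```javascript
// Sketch: one result per context, in the same order as the input batch,
// even when the database returns rows deduplicated or in another order.
function mapResultsInOrder(ids, rows, keyField = 'login') {
  // Index the rows by key for constant-time lookups.
  const byKey = new Map(rows.map((row) => [row[keyField], row]));
  // Position i of the output answers position i of the batch.
  return ids.map((id) => ({ data: byKey.get(id) ?? null }));
}

const ids = ['brice', 'eric', 'brice'];
const rows = [
  { login: 'eric', name: 'Eric' },
  { login: 'brice', name: 'Brice' },
];
console.log(mapResultsInOrder(ids, rows).map((r) => r.data.name));
// [ 'Brice', 'Eric', 'Brice' ]
```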
You can return individual errors for each request by including an errorMessage and errorType in your result object.
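For example, a missing user can be reported as an error for that item only, while the other items in the batch still resolve. In this sketch, toBatchResults is a hypothetical helper and 'UserNotFound' is a hypothetical errorType chosen for illustration; the rows are assumed to be keyed by login.

```javascript
// Sketch: per-item results where a missing user becomes an error for that
// item only. 'UserNotFound' is a hypothetical errorType for this example.
function toBatchResults(ids, rows) {
  const byLogin = new Map(rows.map((row) => [row.login, row]));
  return ids.map((id) => {
    const user = byLogin.get(id);
    if (user === undefined) {
      return {
        data: null,
        errorMessage: `No user found with login '${id}'`,
        errorType: 'UserNotFound',
      };
    }
    return { data: user };
  });
}

console.log(toBatchResults(['brice', 'nobody'], [{ login: 'brice', name: 'Brice' }]));
```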
Conclusion
You can now configure the batch size for Lambda resolvers in AppSync to address the N+1 problem and optimize how often your Lambda resolvers are invoked. In turn, this can help improve performance and manage costs when accessing your data with AppSync using Lambda functions. Note that while you can batch up to 2000 items together, payload size limits still apply: you are responsible for making sure that your requests and responses do not exceed the service payload size limits. See Lambda quotas and AppSync endpoints and quotas for more details, and visit the AWS AppSync documentation for more information on configuring your Lambda functions for batching.
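One way to respect those limits is to derive the batch size from an estimated per-item payload. The sketch below assumes a 6 MB default, reflecting Lambda's synchronous invocation payload quota at the time of writing (check the current Lambda and AppSync quotas), and a hypothetical 50% headroom to absorb items larger than the sample. The function name safeMaxBatchSize is hypothetical. Request payloads count too, so the same estimate can be applied to a sample context object.

```javascript
// Sketch: choose a maxBatchSize that keeps an estimated payload under a limit.
// The 6 MB default is an assumption based on Lambda's synchronous payload quota.
const DEFAULT_PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024;
const MAX_BATCH_SIZE = 2000; // AppSync upper bound quoted in this post

function safeMaxBatchSize(sample, { limitBytes = DEFAULT_PAYLOAD_LIMIT_BYTES, headroom = 0.5 } = {}) {
  // Estimate bytes per item from one representative serialized item.
  const bytesPerItem = Buffer.byteLength(JSON.stringify(sample), 'utf8');
  // Keep only a fraction of the limit as budget, leaving room for outliers.
  const budget = Math.floor(limitBytes * headroom);
  // Clamp to [1, MAX_BATCH_SIZE].
  return Math.max(1, Math.min(MAX_BATCH_SIZE, Math.floor(budget / bytesPerItem)));
}

const sampleResult = { data: { login: 'brice', name: 'Brice', department: 'Engineering' } };
console.log(safeMaxBatchSize(sampleResult));
```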