Getting more visibility into GraphQL performance with AWS AppSync logs

Written by Shankar Raju, SDE at AWS & Nader Dabit, Sr. Developer Advocate at AWS.

September 14, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.

Today, we are happy to announce that AWS AppSync now enables you to better understand the performance of your GraphQL requests and usage characteristics of your schema fields. You can easily identify resolvers with large latencies that may be the root cause of a performance issue. You can also identify the most and least frequently used fields in your schema and assess the impact of removing GraphQL fields. Offering support for these capabilities has been one of the top feature requests by our customers.

AWS AppSync is a managed GraphQL service that simplifies application development by letting you create a flexible API to securely access, manipulate, and combine data from one or more data sources. AWS AppSync now emits log events in a fully structured JSON format. This enables seamless integration with log analytics services such as Amazon CloudWatch Logs Insights and Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) (Amazon ES), and other log analytics solutions.

We have also added new fields to log events to increase your visibility into the performance and health of your GraphQL operations:

To search and analyze one or more log types, GraphQL requests, and multiple GraphQL API actions, we added new log fields (logType, requestId, and graphQLAPIId) to every log event that AWS AppSync emits.
To quickly identify errors and performance bottlenecks, we added new log fields to the existing request-level logs. These log fields contain information about the HTTP response status code (HTTPStatusResponseCode) and latency of a GraphQL request (latency).
To uniquely identify and run queries against any field in your GraphQL schema, we added new log fields to the existing field-level logs. These log fields contain information about the parent (parentType) and name (fieldName) of a GraphQL field.
To gain visibility into the time taken to resolve a GraphQL field, we also included the resolver ARN (resolverARN) in the tracing information of GraphQL fields in the field-level logs.

In this post, we show how you can get more visibility into the performance and health of your GraphQL operations using CloudWatch Logs Insights and Amazon ES. As a prerequisite, you must first enable field-level logging for your GraphQL API so that AWS AppSync can emit logs to CloudWatch Logs.

Analyzing your logs with CloudWatch Logs Insights

You can analyze your AWS AppSync logs with CloudWatch Logs Insights to identify performance bottlenecks and the root cause of operational issues. For example, you can find the resolvers with the maximum latency, the most (or least) frequently invoked resolvers, and the resolvers with the most errors.

There is no setup required to get started with CloudWatch Logs Insights. This is because AWS AppSync automatically emits logs into CloudWatch Logs when you enable field-level logging on your GraphQL API.

The following are examples of queries that you can run to get actionable insights into the performance and health of your GraphQL operations. For your convenience, we added these examples as sample queries in the CloudWatch Logs Insights console.

In the CloudWatch console, choose Logs, Insights, select the AWS AppSync log group for your GraphQL API, and then choose Sample queries, AWS AppSync queries.

Find top 10 GraphQL requests with maximum latency

fields requestId, latency
| filter logType = "RequestSummary"
| limit 10
| sort latency desc

Find top 10 resolvers with maximum latency

fields resolverArn, duration
| filter logType = "Tracing"
| limit 10
| sort duration desc

Find the most frequently invoked resolvers

fields ispresent(resolverArn) as isRes
| stats count() as invocationCount by resolverArn
| filter isRes and logType "Tracing"
| limit 10
| sort invocationCount desc

Find resolvers with most errors in mapping templates

fields ispresent(resolverArn) as isRes
| stats count() as errorCount by resolverArn, logType
| filter isRes and (logType = "RequestMapping" or logType = "ResponseMapping") and fieldInError
| limit 10
| sort errorCount desc

The results of CloudWatch Logs Insights queries can be exported to CloudWatch dashboards. We added a CloudWatch dashboard template for AWS AppSync logs in the AWS Samples GitHub repository. You can import this template into CloudWatch dashboards to have continuous visibility into your GraphQL operations.

Analyzing your logs with Amazon ES

You can search, analyze, and visualize your AWS AppSync logs with Amazon ES to identify performance bottlenecks and root cause of operational issues. Not only can you identify resolvers with the maximum latency and errors, but you can also use Kibana to create dashboards with powerful visualizations. Kibana is an open source, data visualization and exploration tool available in Amazon ES.

To get started with Amazon ES:

Create an Amazon ES cluster, if you don’t have one already.
In the CloudWatch Logs console, select the log group for your GraphQL API.
Choose Actions, Stream to Amazon OpenSearch Service and select the Amazon ES cluster to which to stream your logs. You can also use a log filter pattern to stream a specific set of logs. The following example is the log filter pattern for streaming log events containing information about the request summary, tracing, and GraphQL execution summary for AWS AppSync logs.

{ ($.logType = "Tracing") || ($.logType = "RequestSummary") || ($.logType = "ExecutionSummary") }

You can create Kibana dashboards to help you identify performance bottlenecks and enable you to continuously monitor your GraphQL operations. For example, to debug a performance issue, start by visualizing the P90 latencies of your GraphQL requests and then drill into individual resolver latencies.

To build a Kibana dashboard containing these visualizations, use the following steps:

1. Launch Kibana and choose Dashboard, Create new dashboard.

2. Choose Add. For Visualization type, choose Line.

3. For the filter pattern to search Amazon OpenSearch Service indexes, use cwl*. Amazon OpenSearch Service indexes logs streamed from CloudWatch Logs (including AWS AppSync logs) with a prefix of “cwl-”. To differentiate AWS AppSync logs from other CloudWatch logs sent to Amazon ES, we recommend adding an additional filter expression of graphQLAPIID.keyword=<AWS AppSync GraphQL API ID> to your search.

4. To get GraphQL request data from AWS AppSync logs, choose Add Filter and use the filter expression logType.keyword=RequestSummary.

5. Choose Metrics, Y-Axis. For Aggregation, choose Percentile; for Field, choose latency, and for Percents, enter a value of 90. This enables you to view GraphQL request latencies on the Y axis.

6. Choose Buckets, X-Axis. For Aggregation, choose Date Histogram; for Field, choose @timestamp; and for Interval, choose Minute. This enables you to view GraphQL request latencies aggregated in 1-minute intervals. You can change the aggregation internal to view latencies aggregated at a coarser or finer grained time interval to match your data density.

7. Save your widget and add it to the Kibana dashboard, as shown below:

8. To build a widget that visualizes the P90 latency of each resolver, repeat steps 1, 2, 3, and 4 earlier. For step 4, use a filter expression of logType.keyword=Tracing to get resolver latencies from AWS AppSync Logs.

9. Repeat step 5 using duration as the Field value and then repeat step 6.

10. Choose Add sub-buckets, Split Series. For Sub Aggregation, use Terms and for Field, choose resolverArn.keyword. This enables you to visualize the latencies of individual resolvers.

11. Save your widget and add it to the Kibana dashboard, as shown below:

Here’s a Kibana dashboard containing widgets for the P90 request latencies and individual resolver latencies:

Availability

The new logging capabilities are available in the following AWS Regions and you can start analyzing your logs today:

US East (N. Virginia)
US East (Ohio)
US West (Oregon)
Europe (Ireland)
Europe (Frankfurt)
Europe (London)
Asia Pacific (Tokyo)
Asia Pacific (Mumbai)
Asia Pacific (Seoul)
Asia Pacific (Sydney)
Asia Pacific (Singapore)

Log events emitted on May 8, 2019, or later use the new logging format. To analyze GraphQL requests before May 8, 2019, you can migrate older logs to the new format using a script available in the GitHub sample.

Front-End Web & Mobile

Getting more visibility into GraphQL performance with AWS AppSync logs

Analyzing your logs with CloudWatch Logs Insights

Analyzing your logs with Amazon ES

Availability

Resources

Follow