Networking & Content Delivery

How to identify website performance bottlenecks by measuring time to first byte latency and using Server-Timing header

While website performance issues are a common occurrence, pinpointing their root causes can be a challenging task. In this post, you will learn how to simplify the performance troubleshooting process by unlocking the potential of the Server-Timing header. This header allows backend components to communicate timing metrics and other insights relevant to performance monitoring in response to user requests.

Website visits involve complex server-side processes, including content optimization, such as image transformation, and dynamic data retrieval from databases. Identifying the precise server and process responsible for a slow request often requires correlating and analyzing various logs, which can be time-consuming. Simplifying this process would expedite issue resolution, which you can achieve by establishing a direct link between user experience quality signals and server-side performance indicators encapsulated within a single log line. This eliminates the need for extensive data querying and correlation and enables you to quickly identify and trace performance issues to specific server components. An example of this approach is Common Media Client Data (CMCD), a recent innovation from the video streaming industry that seamlessly integrates both client and server observability data in the same request log line. Websites can adopt similar principles by implementing the Server-Timing header, effectively merging server-side metrics with those available on the client side, thereby providing a comprehensive view of a given request-response cycle’s performance.

The solution we propose consists of two parts: first, identifying performance issues by measuring end user latency, and second, immediately accessing server insights for such cases. We will first address the former before delving into the implementation of Server-Timing.

Detecting performance issues

Website performance largely depends on latency. Latency refers to the time delay between a user’s action (such as clicking a link or submitting a form) and the response from the server. For websites, latency is typically measured in the form of time to first byte (TTFB), also known as first byte latency (FBL). It determines how soon a website’s content can begin to render on the user’s screen, directly impacting Core Web Vitals signals such as first contentful paint (FCP) and largest contentful paint (LCP). To ensure a seamless user experience, it’s recommended to maintain a TTFB of 800 milliseconds or less. This benchmark serves as a useful threshold for identifying slow requests. Leveraging services like Amazon CloudFront can aid in enhancing TTFB for both static and dynamic content.

When assessing TTFB from the client-side perspective, it covers the duration from the user’s request initiation to the reception of the first byte of response from the server. This calculation includes network transmission time and all processing time on the server side, which may involve content delivery network (CDN) operations, origin web server functions, database queries, and other request processing tasks depending on the website architecture. When measured on the server-side, TTFB reflects the interval between the server receiving the request and dispatching the first byte of the response to the network layer. Here, network transmission time is not included, and TTFB essentially denotes the server’s processing time before initiating the response. Furthermore, in scenarios where servers are positioned in the midst of the request flow, they serve dual roles: as servers when receiving requests from downstream sources and as clients when forwarding requests upstream to other servers. This operational model is common for servers within CDNs such as Amazon CloudFront, leading to the presence of both client-side and server-side TTFB metrics for such servers.

For a typical website architecture involving components like CloudFront with edge functions, Application Load Balancers, web servers, and databases, the complete request-response cycle unfolds as depicted in Figure 1.

Figure 1. Request-response cycle timings in the classic website architecture


In Figure 1, we marked the timestamps of request or response initiation and termination as T1 through T18. Using these timestamps, we can calculate various TTFBs as follows:

  • User TTFB is the interval from T1 to T18. It should be measured when monitoring user experience, flagging an issue when it exceeds the recommended value. A shorter user TTFB indicates faster response delivery and a better user experience.
  • CloudFront downstream TTFB is the interval from T2 to T17. In the case of a cache hit, where the request is served from the CloudFront cache without involving any processing on the origin side, this TTFB solely indicates the time CloudFront took to process the request and prepare the response, including the execution duration of an edge function if one is used. However, in a cache miss scenario, it also includes the time taken by the origin to process the request and prepare the response, along with the time for transferring the response from the origin to CloudFront.
  • CloudFront upstream TTFB is the interval from T3 to T14. This represents a cache miss case, where CloudFront sends the request to the origin and receives a response from it.
  • Similar to CloudFront, all servers in the origin solution also possess their own TTFBs. For example, when executing a database query to populate an HTML page, you can measure both database processing time and transmission time as the interval from T7 to T10.
  • The transmission time from the upstream component to the downstream component can be estimated as the downstream TTFB minus the upstream TTFB. For example, the transmission time of the first byte from CloudFront to the user is calculated as user TTFB minus CloudFront downstream TTFB. A shorter transmission time indicates better network conditions and a shorter network distance.
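The arithmetic in the list above can be sketched directly. Below is a minimal JavaScript sketch with hypothetical values (in milliseconds) that derives transmission time and CloudFront-side overhead from the TTFB metrics described:

```javascript
// Hypothetical TTFB measurements in milliseconds, per the intervals in Figure 1
const userTTFB = 233;            // T1 -> T18, measured in the browser
const cdnDownstreamTTFB = 229;   // T2 -> T17, reported by CloudFront
const cdnUpstreamTTFB = 178;     // T3 -> T14, reported by CloudFront

// Transmission time of the first byte from CloudFront to the user
const cdnToUserTransmission = userTTFB - cdnDownstreamTTFB;

// Rough CloudFront-side overhead: downstream TTFB minus upstream TTFB
const cdnOverhead = cdnDownstreamTTFB - cdnUpstreamTTFB;

console.log({ cdnToUserTransmission, cdnOverhead }); // 4 ms and 51 ms
```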

You can measure the user TTFB in the browser with JavaScript using the Resource Timing API. This API allows you to obtain timestamps for various stages involved in loading a resource, including the start time of the request, DNS resolution time, TCP and TLS handshakes, and the reception of the first byte of the response. This facilitates the calculation of TTFB, along with other useful timings associated with loading the resource.

const timings = {};

new PerformanceObserver((entryList) => {
  const entries = entryList.getEntries();
  
  entries.forEach(entry => {
    if (entry.responseStart > 0) {
      timings.userDNS = (entry.domainLookupEnd - entry.domainLookupStart).toFixed(2);
      timings.userTCP = (entry.connectEnd - entry.connectStart).toFixed(2);
      timings.userTLS = (entry.requestStart - entry.secureConnectionStart).toFixed(2);
      timings.userTTFB = (entry.responseStart - entry.requestStart).toFixed(2);
    }
  });
}).observe({
  type: 'resource',
  buffered: true
});

This code snippet retrieves DNS, TCP, TLS, and TTFB timings for each resource loaded from a webpage. Similarly, you can obtain these timings for navigation requests in the browser using the Navigation Timing API. With this data, you can not only determine if the latency is acceptable but also analyze the durations of DNS, TCP, and TLS stages of the request. These metrics are influenced by the network distance packets need to travel between the user and the front-end server, as well as the status of network congestion. If their values are high, nearing the benchmark of 800 milliseconds, it signals the need to improve network conditions for a smoother user experience. CloudFront, by terminating requests closer to the user at its edge locations, can significantly enhance network performance.
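The same calculation applies to the page’s navigation request. As a minimal sketch, the logic can be factored into a helper that accepts any timing entry with these fields; in the browser, you would pass it an entry obtained from the Navigation Timing API (for example, via performance.getEntriesByType('navigation')):

```javascript
// Derives DNS, TCP, TLS, and TTFB durations (in ms) from a timing entry.
// Works for both navigation and resource entries, since they share these fields.
function deriveTimings(entry) {
  if (entry.responseStart <= 0) return null; // served from the local cache
  return {
    userDNS: (entry.domainLookupEnd - entry.domainLookupStart).toFixed(2),
    userTCP: (entry.connectEnd - entry.connectStart).toFixed(2),
    userTLS: (entry.requestStart - entry.secureConnectionStart).toFixed(2),
    userTTFB: (entry.responseStart - entry.requestStart).toFixed(2)
  };
}

// Example with hypothetical timestamps (milliseconds since navigation start)
console.log(deriveTimings({
  domainLookupStart: 0, domainLookupEnd: 2,
  connectStart: 2, connectEnd: 12,
  secureConnectionStart: 7, requestStart: 12,
  responseStart: 112
}));
// { userDNS: '2.00', userTCP: '10.00', userTLS: '5.00', userTTFB: '100.00' }
```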

However, if a performance issue is caused by the server side, we lack visibility into this aspect from this data. This is where Server-Timing comes in handy.

Implementing Server-Timing

Any web server can include a Server-Timing header in its HTTP response, providing server metrics. This header is supported by all modern browsers, allowing for easy parsing and retrieval of metric values using the PerformanceServerTiming interface. CloudFront already supports Server-Timing, communicating metrics related to its processing. For example, the CloudFront downstream TTFB we described earlier is the cdn-downstream-fbl metric, and the CloudFront upstream TTFB is cdn-upstream-fbl. You can find other available metrics and their description in the developer guide.
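Outside the browser, for example in synthetic tests or log processing, you may need to parse the raw header value yourself. Here is a minimal sketch of a parser for the Server-Timing syntax (comma-separated metrics, each with an optional dur and desc parameter), mirroring what PerformanceServerTiming exposes; it does not handle quoted strings that contain commas or semicolons:

```javascript
// Parses a Server-Timing header value into [{name, dur, desc}] objects.
function parseServerTiming(headerValue) {
  return headerValue.split(',').map((metric) => {
    const [name, ...params] = metric.trim().split(';');
    const result = { name: name.trim(), dur: 0, desc: '' };
    for (const param of params) {
      const i = param.indexOf('=');
      if (i < 0) continue; // bare parameter, nothing to record
      const key = param.slice(0, i).trim();
      const value = param.slice(i + 1).trim();
      if (key === 'dur') result.dur = parseFloat(value) || 0;
      if (key === 'desc') result.desc = value.replace(/^"|"$/g, '');
    }
    return result;
  });
}

const parsed = parseServerTiming(
  'cdn-upstream-fbl;dur=178,cdn-cache-miss,my-query;dur=0.54,cdn-rid;desc="mRq-Uvr"'
);
console.log(parsed);
```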

To activate Server-Timing in CloudFront, you must create a response headers policy. To do so, toggle the Enable option in the Server-Timing header panel and specify a sampling rate. You can also configure the addition or removal of other response headers as needed.

CloudFront’s Server-Timing feature enables you to assess whether CloudFront processes requests fast enough. In the case of a cache hit, the value of the cdn-downstream-fbl metric should be low, indicating rapid response initiation by CloudFront. Conversely, a high value for this metric suggests slow processing and indicates an issue on the CloudFront side. In the case of a cache miss, you can also evaluate the connection time values from CloudFront to the origin using the cdn-upstream-connect and cdn-upstream-dns metrics. Low values of these metrics suggest that the next server in the request flow (such as the Application Load Balancer depicted in Figure 1) is in good operational health, establishing connections quickly and being located in close proximity to CloudFront origin-facing servers. Most of the time, these values will be 0, indicating that CloudFront is reusing a previously established connection using its persistent connections feature. The cdn-upstream-fbl metric indicates how quickly the first byte of the response from the origin arrives at CloudFront. High values of this metric, along with low values of cdn-upstream-connect and cdn-upstream-dns, indicate that the origin solution behind the Application Load Balancer is experiencing issues that prevent it from providing a rapid response. Ideally, all these metrics shouldn’t contribute significantly to the latency experienced by the user (user TTFB).
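If you prefer to automate this instead of using the console, the same setting can be applied through the CloudFront create-response-headers-policy API. As a sketch, this is the shape of the response headers policy configuration (the policy name and comment are hypothetical; a SamplingRate of 100 adds the header to every response):

```json
{
  "Name": "my-server-timing-policy",
  "Comment": "Adds the Server-Timing header to responses",
  "ServerTimingHeadersConfig": {
    "Enabled": true,
    "SamplingRate": 100.0
  }
}
```

You would then attach the resulting policy to the cache behaviors of your distribution.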

CloudFront’s Server-Timing header provides insights into performance on the CDN side in both directions. However, it doesn’t directly tell what happened to the request on the origin side. Considering the diversity of modern origin architecture, consisting of multiple different components and technologies, it’s essential to incorporate their performance timing information for a comprehensive understanding. To extract insights from origin servers, you can create your own Server-Timing header and include it in the response to CloudFront. CloudFront doesn’t replace this header. Instead, it appends its metrics to the Server-Timing header received from the origin. For the metrics you include in your own Server-Timing header, you can measure timings for critical backend processes such as image optimization, API calls, database queries, or edge computing. For instance, if you’re using PHP to execute a database query, you can measure the duration of the query as follows:

$dbReadStartTime = hrtime(true);
// Database query goes here
$dbReadEndTime = hrtime(true);
$dbReadTotalTime = ($dbReadEndTime - $dbReadStartTime) / 1000000;
header('Server-Timing: my-query;dur=' . $dbReadTotalTime);

This code snippet captures the time it took to complete the database operations and communicates it as the my-query metric in the Server-Timing header. Given that databases can sometimes become overloaded and serve as a performance bottleneck, this data aids in uncovering such scenarios.
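A single Server-Timing header can carry several metrics at once, each with an optional duration and description. Here is a minimal Node.js sketch (the metric names are illustrative, not part of any standard) that assembles such a value:

```javascript
// Builds a Server-Timing header value from a list of metrics.
// Each metric has a name, and optionally a duration (ms) and a description.
function buildServerTiming(metrics) {
  return metrics
    .map(({ name, dur, desc }) => {
      let entry = name;
      if (dur !== undefined) entry += `;dur=${dur}`;
      if (desc !== undefined) entry += `;desc="${desc}"`;
      return entry;
    })
    .join(', ');
}

const headerValue = buildServerTiming([
  { name: 'my-query', dur: 0.54 },
  { name: 'my-cache', desc: 'hit' },
  { name: 'my-function', dur: 0.12 }
]);
console.log(headerValue);
// my-query;dur=0.54, my-cache;desc="hit", my-function;dur=0.12
```

In an Express-style handler, for example, you could then attach the value with res.set('Server-Timing', headerValue).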

If you are using Node.js, you can implement the Server-Timing header using an example from the PerformanceServerTiming interface specification.

Similarly, you can measure the latency added by your edge function, which is particularly beneficial for complex Lambda@Edge functions that perform network calls. Here is an example of Server-Timing header implementation for a Lambda@Edge function attached to the origin response event:

import json
import time

# CF headers are available in request object for Lambda@Edge functions attached to origin response event only

def lambda_handler(event, context):

    # Get function's start timestamp
    handler_start_time = time.time()

    response = event['Records'][0]['cf']['response']
    request = event['Records'][0]['cf']['request']
    server_timing_value = []

    # List of CloudFront headers to include in server timing for additional insights
    cf_headers = ['cloudfront-viewer-country', 'cloudfront-viewer-city',
                  'cloudfront-viewer-asn']

    # Iterate over each header name and construct the value for the Server-Timing header
    for header_name in cf_headers:
        if header_name in request['headers']:
            header_value = request['headers'][header_name][0]['value']
            server_timing_value.append('{}; desc="{}"'.format(header_name, header_value))

    # Function's logic goes here

    # Get function's stop timestamp
    handler_stop_time = time.time()
    handler_duration = round((handler_stop_time - handler_start_time) * 1000, 2)
    server_timing_value.append('{}; dur={}'.format("my-function", handler_duration))

    if server_timing_value:
        # Construct the Server-Timing header
        server_timing = [{
            "key": "Server-Timing",
            "value": ', '.join(server_timing_value)
        }]

        # Add or append the Server-Timing header
        if 'server-timing' in response['headers']:
            response['headers']['server-timing'][0]['value'] += ', ' + ', '.join(server_timing_value)
        else:
            response['headers']['server-timing'] = server_timing

        print("Server-Timing:", response['headers']['server-timing'])

    return response

Notably, the metric added in this code solely represents the duration of the handler code execution, excluding other Lambda timings. We intentionally used vague names for the metrics to avoid revealing the origin architecture because this information could be exploited by malicious actors to launch attacks.

Also note that we enhanced the Server-Timing header with insights about the user’s geolocation and autonomous system number (ASN).

Now, we can enhance our code to retrieve server timings on the client-side as well using the serverTiming property. Below is the modified code snippet:

// Creating a new PerformanceObserver to monitor performance entries
new PerformanceObserver((entryList) => {
  const entries = entryList.getEntries();

  for (const entry of entries) {
    // Object to store timings for various stages
    const timings = {
      userDNS: null,                // User DNS resolution time
      userTCP: null,                // User TCP handshake time
      userTLS: null,                // User TLS handshake time
      CFDNS: null,                  // CDN DNS resolution time
      CFUpstreamHandshake: null,    // CDN upstream TCP handshake time
      MyQuery: null,                // Query time
      CFUpstreamTTFB: null,         // CDN upstream Time To First Byte (TTFB)
      MyFunction: null,             // Function execution time
      CFDownstreamTTFB: null,       // CDN downstream TTFB
      userTTFB: null,               // User Time To First Byte (TTFB)
      CFRID: null,                  // CDN Request ID
      CFCacheStatus: null,          // CDN Cache status (Hit or Miss)
      UserASN: null                // User Autonomous System Number (ASN)
    };

    // Iterating through server timing entries for the current performance entry
    entry.serverTiming.forEach((serverEntry) => {
      switch (serverEntry.name) {
        case 'cdn-rid':
          timings.CFRID = serverEntry.description;
          break;
        case 'cdn-cache-miss':
          timings.CFCacheStatus = "Miss";
          break;
        case 'cdn-cache-hit':
          timings.CFCacheStatus = "Hit";
          break;
        case 'cdn-upstream-connect':
          timings.CFUpstreamHandshake = serverEntry.duration;
          break;
        case 'cdn-downstream-fbl':
          timings.CFDownstreamTTFB = serverEntry.duration;
          break;
        case 'cdn-upstream-dns':
          timings.CFDNS = serverEntry.duration;
          break;
        case 'cdn-upstream-fbl':
          timings.CFUpstreamTTFB = serverEntry.duration;
          break;
        case 'my-query':
          timings.MyQuery = serverEntry.duration;
          break;
        case 'my-function':
          timings.MyFunction = serverEntry.duration;
          break;
        case 'cloudfront-viewer-asn':
          timings.UserASN = serverEntry.description;
          break;
      }
    });

    // Calculating user-specific timings if the response was not served from the local cache
    if (entry.responseStart > 0) {
      timings.userDNS = (entry.domainLookupEnd - entry.domainLookupStart).toFixed(2);
      timings.userTCP = (entry.connectEnd - entry.connectStart).toFixed(2);
      timings.userTLS = (entry.requestStart - entry.secureConnectionStart).toFixed(2);
      timings.userTTFB = (entry.responseStart - entry.requestStart).toFixed(2);

      // Logging metrics for the current entry
      console.log("Metrics for:", entry.name);
      console.log("userDNS:", timings.userDNS);
      console.log("userTCP:", timings.userTCP);
      console.log("userTLS:", timings.userTLS);
      console.log("CFDNS:", timings.CFDNS);
      console.log("CFUpstreamHandshake:", timings.CFUpstreamHandshake);
      console.log("DBQuery:", timings.MyQuery);
      console.log("CFUpstreamTTFB:", timings.CFUpstreamTTFB);
      console.log("lambdaEdge:", timings.MyFunction);
      console.log("CFDownstreamTTFB:", timings.CFDownstreamTTFB);
      console.log("userTTFB:", timings.userTTFB);
      console.log("CFRID:", timings.CFRID);
      console.log("CFCacheStatus:", timings.CFCacheStatus);
      console.log("UserASN:", timings.UserASN);
      console.log("------------------------------------------------------");
    }
  }
}).observe({
  type: 'resource',   // Observing resource-related performance entries
  buffered: true
});

This refined code snippet fetches server timings and incorporates them into the timings object alongside client metrics. This enables you to consolidate comprehensive insights into the performance of the request-response cycle, from both the client and server sides, in one place. Here is an example of console.log output:

Metrics for: https://d1234.cloudfront.net/script.php
userDNS: 0.00
userTCP: 0.00
userTLS: 5.00
CFDNS: 0
CFUpstreamHandshake: 88
DBQuery: 0.538685
CFUpstreamTTFB: 178
lambdaEdge: 0.09
CFDownstreamTTFB: 229
userTTFB: 233.10
CFRID: mRq-Uvr__3OBDo0IX9ELV5Lrk3lF-bOp4eOIqTEXlFkFn0wIWPKgpA== 
CFCacheStatus: Miss
UserASN: 1257

In this example, it took 229 milliseconds (CFDownstreamTTFB) for CloudFront, executing the Lambda@Edge function, and the origin server, performing a database query, to send the first byte to the network for transmission. The first byte arrived at the client device after 233 milliseconds (userTTFB), indicating a transmission time of 4 milliseconds. The client device reused previously established TCP and TLS connections (userTCP, userTLS) and had CloudFront’s IP address cached (userDNS). CloudFront set up a new TCP connection toward the origin (CFUpstreamHandshake), which took 88 milliseconds. However, the origin processed the request and returned the first byte of the response rapidly, within 90 milliseconds (CFUpstreamTTFB – CFUpstreamHandshake). We can conclude that the overall latency for the end user in this case was below the recommended value of 800 milliseconds and satisfactory.

Enrich Server-Timing with other data

While the Server-Timing header was designed to communicate the server-side processing times, its syntax doesn’t limit you to using it solely for that purpose. For example, CloudFront includes metrics with cache status and an internal unique request ID. This data is crucial for accurately analyzing data related to CloudFront’s processing. Similarly, you can enrich your own Server-Timing header with metrics that provide additional insight into the request journey. For instance, you can add an internal ID of the server in the cluster to help you locate its logs when needed, as well as geolocation of users and their device types. The latter can be achieved by using CloudFront headers, and we showed earlier how you can use them in your Lambda@Edge function. These headers are available in the request object for the Lambda@Edge function associated with the origin response event or for the CloudFront function associated with the viewer response and viewer request events. You can enable these headers in the origin request policy, making them available for your origin web server in the request from CloudFront. This allows you to incorporate them into the Server-Timing header.

Analyzing results with Amazon CloudWatch

While using Server-Timing headers is useful for identifying performance issues and pinpointing their root cause, it doesn’t offer insights into other crucial aspects of website performance. For example, errors associated with JavaScript execution and certain Web Vitals metrics like cumulative layout shift are not directly captured by this solution. If you’re already utilizing a real user monitoring (RUM) based website monitoring solution, integrating the Server-Timing header can complement existing practices rather than replace them or restrict performance monitoring solely to Server-Timing. One of the examples of comprehensive website monitoring solutions is Amazon CloudWatch RUM.

To enhance CloudWatch RUM insights with the solution described earlier, you can create a custom event to capture metrics extracted from the Server-Timing header and have the CloudWatch RUM client send it to CloudWatch. This approach consolidates all CloudWatch RUM insights alongside Server-Timings within the same service, facilitating seamless analysis of both datasets together.

Here is an example of how you can record a custom event with the data extracted from the Server-Timing header and client-side measurements into CloudWatch using its RUM client for the provided code snippet above:

// Sending performance data to CloudWatch RUM as a custom event
cwr('recordEvent', {
  type: 'my-server-timing',
  data: {
    current_url: entry.name,
    ...timings  // Spread operator to include all timings
  }
});

In this example, we include all properties and values from the timings object in the data object sent to the cwr function. This means that all timings captured for the specific entry will be sent along with the current_url.

Custom events will be logged into CloudWatch logs, allowing you to query them using CloudWatch Logs Insights. Additionally, you can create CloudWatch metrics using metric filters and set up metric alarms for monitoring purposes.
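As a sketch, a CloudWatch Logs Insights query over these custom events might aggregate user TTFB by country like this (the field names follow the event structure shown below; the event type is the one used in our recordEvent call):

```
filter event_type = "my-server-timing"
| stats avg(event_details.userTTFB) as avgTTFB,
        pct(event_details.userTTFB, 95) as p95TTFB
  by metadata.countryCode
```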

Here is an example of a custom event, as logged in CloudWatch, for the timings we are collecting in the code above:

{
  "event_timestamp": 1710929230000,
  "event_type": "my-server-timing",
  "event_id": "9ae82980-4bfb-47f5-8183-b241379e09e1",
  "event_version": "1.0.0",
  "log_stream": "2024-03-20T03",
  "application_id": "c27d1cef-e531-45ad-9bc4-8e03a716c775",
  "application_version": "1.0.0",
  "metadata": {
    "version": "1.0.0",
    "browserLanguage": "en",
    "browserName": "Chrome",
    "browserVersion": "123.0.0.0",
    "osName": "Mac OS",
    "osVersion": "10.15.7",
    "deviceType": "desktop",
    "platformType": "web",
    "pageId": "/",
    "interaction": 0,
    "title": "TTFB Demo",
    "domain": "d1234.cloudfront.net",
    "aws:client": "arw-script",
    "aws:clientVersion": "1.17.0",
    "countryCode": "SE",
    "subdivisionCode": "AB"
  },
  "user_details": {
    "sessionId": "c9d2514a-8884-4b32-aec0-25203f213f84",
    "userId": "0f7f2bf3-c9b7-46ab-bc9e-2ff53864ea74"
  },
  "event_details": {
    "current_url": "https://d1234.cloudfront.net/getmeal.php",
    "userDNS": "0.00",
    "userTCP": "0.00",
    "userTLS": "9.50",
    "CFDNS": 0,
    "CFUpstreamHandshake": 90,
    "MyQuery": 0.517874,
    "CFUpstreamTTFB": 180,
    "MyFunction": 0.12,
    "CFDownstreamTTFB": 233,
    "userTTFB": "239.30",
    "CFRID": "ujYncZYVJeIOk6fI7ApFuNt-mJoh8hfL3nZPgAj77z7RdtSzNMTcqQ==",
    "CFCacheStatus": "Miss",
    "UserASN": "1257"
  }
}

Based on this, we can create the following filter pattern for a user TTFB metric:

{ $.event_details.userTTFB = * && $.event_details.CFCacheStatus = * && $.event_details.UserASN = * && $.metadata.countryCode = * }

This filter pattern enables you to create a metric for the user TTFB with dimensions such as country code, ASN, and CloudFront cache status. You can then create an alert for this metric to receive notifications when it exceeds the predefined static threshold, such as the recommended 800 milliseconds, or utilize CloudWatch anomaly detection.
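As a sketch of how such a metric filter could be defined (via the CloudWatch Logs put-metric-filter API), the metric name, namespace, and dimension names below are hypothetical choices, not fixed values:

```json
{
  "filterPattern": "{ $.event_details.userTTFB = * && $.metadata.countryCode = * }",
  "metricTransformations": [
    {
      "metricName": "UserTTFB",
      "metricNamespace": "WebsitePerformance",
      "metricValue": "$.event_details.userTTFB",
      "unit": "Milliseconds",
      "dimensions": {
        "CountryCode": "$.metadata.countryCode",
        "UserASN": "$.event_details.UserASN",
        "CFCacheStatus": "$.event_details.CFCacheStatus"
      }
    }
  ]
}
```

Note that high-cardinality dimensions such as ASN can increase the number of custom metrics created, which affects cost.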

Optimization and cost

It’s important to recognize that Server-Timing headers increase the response size. This can have implications for costs associated with CloudFront data transfer out, as well as for storing and processing the data within your analytics system. For instance, the value of the Server-Timing header provided earlier is approximately 350 bytes, and for 1 million requests, this would result in an additional 0.325 gigabytes of data transferred out. Depending on the number of requests your website receives, this may or may not constitute a significant cost. However, you can reduce it by including only essential information in the Server-Timing, particularly actionable data. For instance, if you primarily require Server-Timing for detecting performance degradation, you may choose to add it only to requests that exceed the recommended threshold of 800 milliseconds. You can further minimize its usage by applying it solely to the loading of critical resources on your website, such as interactive forms or API calls. You can achieve this by implementing the necessary filters to the respective metrics in your client-side JavaScript code.
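The idea of emitting the header only for slow requests can be sketched server-side. Here is a minimal Node.js example (the 800 ms threshold matches the recommendation; the function and metric names are ours):

```javascript
// Returns a Server-Timing header value only when processing exceeded the
// threshold; returns null otherwise, in which case no header is sent.
const SLOW_THRESHOLD_MS = 800;

function serverTimingIfSlow(processingMs, metricName = 'my-processing') {
  if (processingMs <= SLOW_THRESHOLD_MS) return null;
  return `${metricName};dur=${processingMs.toFixed(2)}`;
}

console.log(serverTimingIfSlow(120.5));   // null: fast request, header omitted
console.log(serverTimingIfSlow(950.25));  // my-processing;dur=950.25
```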

Conclusion

In this post, we explored the importance of TTFB in website performance monitoring and demonstrated how the Server-Timing header can be utilized to provide detailed insights into the request-response cycle. By measuring latency and obtaining server-side metrics, such as CloudFront processing times and origin server response durations, website owners can pinpoint the root causes of performance issues and take proactive measures to optimize their websites. To learn more on how to speed up your websites and APIs and keep them protected, visit the Application Security and Performance section of the AWS Developer Center.

About the author


Yury Yakubov

Yury is an Edge Specialist Solutions Architect. He has over fifteen years of content delivery industry experience, with a focus on performance, security and observability. He is passionate about helping customers build effective solutions for complex tasks. Outside of work, he can be found spending time with his family, reading books or playing old video games.