AWS Storage Blog

Achieving consistent time to first byte latencies with Amazon S3 on Outposts

With the rise in data sovereignty and privacy regulations, organizations are seeking flexible solutions that balance regulatory compliance with the agility of the cloud. For example, to comply with data sovereignty regulations, users in the financial and healthcare industries need to deploy applications on premises and store data locally. To provide the best user experience, these organizations need consistent latency across all locations so that globally deployed applications perform predictably everywhere.

AWS Outposts provides a seamless hybrid solution by extending AWS capabilities to any on-premises or edge location, helping you meet data sovereignty regulations. S3 on Outposts delivers object storage locally with Amazon Simple Storage Service (Amazon S3) consistent APIs to meet data residency and regulatory requirements. Announced on AWS Pi Day 2024, S3 on Outposts now caches AWS Identity and Access Management (IAM) permissions locally, improving the performance of applications running on Outposts. This local cache eliminates the variability in first-byte latencies that comes from performing authentication and authorization operations in the parent AWS Region, which improves the performance of your object API requests.

In this post, we show you how to configure local caching of AWS IAM permissions on S3 on Outposts, validate that caching is active, and then measure the resulting latency improvements. Local caching improves performance of S3 on Outposts workloads by reducing latencies and workload execution time, and consistent latencies allow you to provide a consistent and predictable experience to end users across any location where Outposts racks are deployed. With local caching, you can accelerate time to market for applications using S3 on Outposts without having to design and test for the variability in latency between the Outposts and the parent AWS Region.

Solution overview: Local caching of AWS IAM permissions on S3 on Outposts

After you make an API request to S3 on Outposts, authentication and authorization data for S3 on Outposts is securely cached locally on the Outpost. Your subsequent S3 object API requests are authenticated and authorized using cached data, eliminating the latency incurred from a round trip to the Region. The cache is valid for up to ten minutes when the Outpost is connected to the Region, and it is refreshed asynchronously when you make an S3 on Outposts API request, so that the latest policies are used. S3 on Outposts only caches authentication and authorization data when the request is signed using Signature Version 4A (SigV4A).
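The caching behavior described above can be illustrated with a minimal sketch. The following Python class is purely illustrative (the actual S3 on Outposts implementation is internal to the service, and all names here are hypothetical); it shows the general shape of a time-bounded cache with a ten-minute validity window:

```python
import time

# Cache entries are considered valid for up to ten minutes, matching the
# validity window described for S3 on Outposts auth data.
CACHE_TTL_SECONDS = 600


class TtlCache:
    """Illustrative time-bounded cache; names and structure are hypothetical."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable clock keeps the sketch testable
        self._entries = {}

    def put(self, key, value):
        # Store the value with the time it was cached
        self._entries[key] = (value, self._clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss: a real system would go to the Region
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value
```

In the real service the cached data is also refreshed asynchronously on each S3 on Outposts API request so that the latest policies are used; that refresh path is omitted here for brevity.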

Prerequisites

To deploy this solution, you need an Outposts rack with S3 on Outposts capacity that is connected to a chosen Region. You must create an Amazon Virtual Private Cloud (Amazon VPC), an S3 on Outposts endpoint, an S3 on Outposts bucket, and an S3 on Outposts access point. You must also use the latest version of the AWS SDK for Python (Boto3) and install the AWS Common Runtime (AWS CRT) libraries so that you can sign requests with the SigV4A algorithm.

Walkthrough

For this walkthrough, we use the Boto3 Python SDK to make requests to an S3 on Outposts bucket and compare performance with and without the authentication and authorization cache (auth cache). We show you how to conduct performance benchmarking for S3 on Outposts and demonstrate the performance benefits of using the auth cache. You can use the sample code to quantify performance improvements within your own S3 on Outposts workloads. The steps are as follows:

  1. Configure required infrastructure
  2. Configure clients
  3. Write a performance benchmarking Python script
  4. Run the script with the local cache
  5. Run the script without the local cache

Step 1: Configure required infrastructure

To get started with the infrastructure configuration needed for performance benchmarking for S3 on Outposts, create an S3 on Outposts bucket and access point on Outposts. Then, launch an Amazon Elastic Compute Cloud (Amazon EC2) instance on the Outposts rack. We use an m5.large instance on an Outpost with Amazon S3 capacity in Boston connected to the us-east-1 Region for this walkthrough, but you can use any instance that is available on your Outposts rack.

Step 2: Configure clients

Install the latest version of the AWS Command Line Interface (AWS CLI). Validate that you can access your S3 on Outposts bucket by running the following AWS CLI command:

aws s3api list-objects-v2 --bucket <s3-outpost-accesspoint-arn>

Install the latest version of boto3 on your Amazon EC2 instance with the CRT extra, which pulls in the AWS CRT libraries:

pip install boto3[crt]

If boto3 is already installed, then you can upgrade to the latest version by running:

pip install --upgrade boto3

By default, the latest version of boto3 with AWS CRT uses SigV4A to sign all API requests. If the AWS CRT is not installed, then boto3 falls back to using Signature Version 4 (SigV4) to sign API requests.
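You can check from Python whether botocore detects the CRT; botocore exposes a HAS_CRT flag for this purpose. The sketch below guards the import so it also runs where botocore is not installed:

```python
# Check whether botocore can use the AWS CRT; without it, boto3 falls
# back from SigV4A to SigV4, and the S3 on Outposts auth cache is not used.
try:
    from botocore.compat import HAS_CRT  # flag provided by botocore
except ImportError:
    HAS_CRT = False  # botocore itself is not installed

signing = "SigV4A" if HAS_CRT else "SigV4"
print(f"AWS CRT available: {HAS_CRT} -> requests signed with {signing}")
```

If this reports that the CRT is unavailable, reinstall with `pip install boto3[crt]` before running the benchmark, since requests signed with SigV4 do not benefit from the local cache.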

Step 3: Write a performance benchmarking Python script

Next, we walk through a script that makes object API requests to S3 on Outposts, measures the latency of each request, and publishes metrics to Amazon CloudWatch.

First, we create new Amazon S3 and CloudWatch clients:

import boto3
from botocore.client import Config
import time
from datetime import datetime
import threading

session = boto3.session.Session()
s3_client = session.client(service_name="s3")
cw_client = boto3.client("cloudwatch")

Next, we define two methods that are used to execute HEAD and PUT object requests to the S3 on Outposts bucket:

# Parameters:
#  - bucketId: Access Point Alias or Amazon Resource Name (ARN) of the S3 Outposts bucket
#  - key: Key name of the object 
#   
# Returns:
#   - A dict containing the CloudWatch metrics data point for the object API latency

def head_object(bucketId, key):
    now_time = datetime.now()
    start_time = time.perf_counter_ns()
    s3_client.head_object(Bucket=bucketId, Key=key)
    end_time = time.perf_counter_ns()
    millisecond_latency = (end_time - start_time) / 1000000
    return {
        "MetricName": "S3RequestLatency",
        "Timestamp": now_time,
        "Value": millisecond_latency,
        "Unit": "Milliseconds",
    }
    
def put_object(bucketId, key):
    data = b"*" * 4096  # 4KB of data
    now_time = datetime.now()
    start_time = time.perf_counter_ns()
    s3_client.put_object(Bucket=bucketId, Key=key, Body=data)
    end_time = time.perf_counter_ns()
    millisecond_latency = (end_time - start_time) / 1000000
    return {
        "MetricName": "S3RequestLatency",
        "Timestamp": now_time,
        "Value": millisecond_latency,
        "Unit": "Milliseconds",
    }

Next, we create a method to execute PUT and HEAD requests to the S3 on Outposts bucket, and publish the latency metrics to CloudWatch:

def thread_worker(jobId, requestCount):
    request_metrics = []
    bucket_name = "<S3-Outpost-Accesspoint>"
    # Make PUT and HEAD requests, recording a latency data point for each
    for x in range(requestCount):
        request_metrics.append(put_object(bucket_name, f"{jobId}/{x}"))
        request_metrics.append(head_object(bucket_name, f"{jobId}/{x}"))
    # Publish to CloudWatch in batches of up to 1,000 data points per call
    for i in range(0, len(request_metrics), 1000):
        cw_client.put_metric_data(
            Namespace="Sigv4aTest", MetricData=request_metrics[i : i + 1000]
        )

def send_requests(threadCount, requestCount):
    threads = []
    for x in range(threadCount):
        thread = threading.Thread(target=thread_worker, args=(f"job{x}", requestCount))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()

Next, we can configure the test parameters and start the workload. THREAD_COUNT determines how many threads to start, and REQUEST_COUNT determines how many PUT and HEAD request pairs each thread makes. We have chosen to start five threads, with each thread making 10,000 PUT and 10,000 HEAD requests.

THREAD_COUNT = 5 
REQUEST_COUNT = 10000
send_requests(THREAD_COUNT, REQUEST_COUNT)

Step 4: Run the script with the local cache

With this script, we can now compare the performance of S3 on Outposts with and without the authorization and authentication cache. First, we run the script with SigV4A. We use the time command to measure how long the script takes to execute.

time python3 s3_workload.py

The script generates CloudWatch metrics under the "Sigv4aTest" namespace. You can navigate to the CloudWatch console and go to: Sigv4aTest → Metrics with no dimensions → S3RequestLatency. You can use CloudWatch features to graph statistics and visualize the data.

Step 5: Run the script without the local cache

To run the script without the S3 on Outposts auth cache, we must explicitly configure the Amazon S3 client to use SigV4.

s3_client = session.client(service_name="s3", config=Config(signature_version='v4'))

Run the script again with the following command:

time python3 s3_workload.py

Validating that cache is active

You can use AWS CloudTrail to verify that the S3 on Outposts auth cache is working by validating that API requests are signed with SigV4A. You can enable CloudTrail logs for S3 on Outposts on your bucket, and access the additionalEventData of the CloudTrail log to validate the signing algorithm of the request. Requests signed with SigV4A have SignatureVersion set to AWS4-ECDSA-P256-SHA256. Requests signed with SigV4 have SignatureVersion set to AWS4-HMAC-SHA256.

For example, the following CloudTrail log shows that the PUT request was signed with SigV4A:

{
    "eventVersion": "1.09",
    "userIdentity": {},
    "eventTime": "2024-02-12T23:58:48Z",
    "eventSource": "s3-outposts.amazonaws.com",
    "eventName": "PutObject",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "",
    "userAgent": "",
    "requestParameters": { },
    "responseElements": { },
    "additionalEventData": {
        "CipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "bytesTransferredIn": 4096,
        "x-amz-id-2": "",
        "SignatureVersion": "AWS4-ECDSA-P256-SHA256",
        "bytesTransferredOut": 0,
        "AuthenticationMethod": "AuthHeader"
    },
    "requestID": "",
    "eventID": "",
    "readOnly": false,
    "resources": [],
    "eventType": "AwsApiCall",
    "managementEvent": false,
    "recipientAccountId": "",
    "edgeDeviceDetails": {},
    "eventCategory": "Data"
}
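Rather than inspecting each record by hand, you can classify records programmatically. The helper below is a sketch (the function name is ours); it reads the SignatureVersion field from additionalEventData, following the structure of the sample log above:

```python
import json

# Signature version strings recorded by CloudTrail, per the log format above
SIGV4A = "AWS4-ECDSA-P256-SHA256"
SIGV4 = "AWS4-HMAC-SHA256"


def signing_algorithm(record):
    """Classify a CloudTrail record by its signing algorithm (sketch helper)."""
    version = record.get("additionalEventData", {}).get("SignatureVersion", "")
    if version == SIGV4A:
        return "SigV4A (auth cache eligible)"
    if version == SIGV4:
        return "SigV4 (no local caching)"
    return "unknown"


# Minimal record mirroring the relevant field from the sample log
record = json.loads('{"additionalEventData": {"SignatureVersion": "AWS4-ECDSA-P256-SHA256"}}')
print(signing_algorithm(record))  # SigV4A (auth cache eligible)
```

Running a check like this across the CloudTrail data events for your bucket confirms whether your clients are consistently signing with SigV4A.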

Results

Once the scripts have completed, we can compare the latency of S3 on Outposts with and without the S3 on Outposts auth cache. The following graph shows the latency profile when making 50,000 PUT and HEAD requests to S3 on Outposts, with and without the cache. The red section of the graph shows the P25, P50, P75, and P99 latency metrics for S3 on Outposts requests signed with SigV4, without the auth cache. The green section of the graph shows the same latency metrics for requests signed with SigV4A, where authentication and authorization data is cached.

Figure 1: P25, P50, P75, and P99 latencies for S3 on Outposts with (green section) and without (red section) the cache

As shown in Figure 1, the auth cache significantly reduces S3 on Outposts latencies by eliminating the round trip to the Region for authentication and authorization. For example, P99 latencies drop from over 100 ms to under 50 ms when using the cache.
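If you prefer to compute percentiles locally instead of (or in addition to) graphing them in CloudWatch, Python's statistics.quantiles can derive P25, P50, P75, and P99 from collected latency samples. The values below are made-up illustrative samples, not our benchmark data:

```python
import statistics

# Hypothetical latency samples in milliseconds (illustrative only)
latencies_ms = [47.8, 48.2, 49.0, 49.9, 50.3, 51.7, 53.1, 102.4]

# quantiles(n=100) returns the 99 percentile cut points P1..P99
cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p25, p50, p75, p99 = cuts[24], cuts[49], cuts[74], cuts[98]
print(f"P25={p25:.1f} P50={p50:.1f} P75={p75:.1f} P99={p99:.1f}")
```

In practice you would feed in the per-request latencies collected by the benchmarking script; with 100,000 data points the percentile estimates are far more stable than this eight-sample illustration.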

We zoom in on the P25, P50, and P75 latencies in Figure 2. There is significant reduction in latency, as well as a consistent latency response. For example, P50 latencies drop from ~50 ms to ~20 ms when using the auth cache, a 60% reduction.

Figure 2: P25, P50, and P75 latencies for S3 on Outposts with (green section) and without (red section) the cache

Lastly, we see a 40% reduction in the total execution time for the workloads when using the auth cache. Without the cache, the workload took 12 minutes to complete, whereas the cache decreased the workload time to seven minutes.
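The reported speedup follows directly from the measured execution times. A quick check of the arithmetic:

```python
# Total workload execution time: 12 minutes without the cache, 7 with it
without_cache_min = 12
with_cache_min = 7
reduction = (without_cache_min - with_cache_min) / without_cache_min
# Approximately 42%, in line with the roughly 40% reduction cited above
print(f"Reduction in execution time: {reduction:.0%}")
```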

Cleaning up

Once you are done, delete all S3 on Outposts resources and objects that you created to avoid incurring further charges. You can learn more about deleting S3 on Outposts objects in the Amazon S3 User Guide. You should also terminate the EC2 instance that you used to run the scripts.

Conclusion

In this post, we showed you how the S3 on Outposts local cache improves performance of applications with S3 on Outposts by reducing latencies and workload execution time. We showed you how to configure local caching of AWS IAM permissions on S3 on Outposts, validate that caching is active, and then measure the resulting latency improvements.

With local caching, you can accelerate time to market for applications using S3 on Outposts without having to design and test for the variability in latency between the Outposts and the parent AWS Region. In our sample workload, the local cache reduced P50 latencies for S3 on Outposts by 60%, which resulted in a 40% reduction in total execution time. Consistent latencies allow you to provide a consistent and predictable experience to end users across any location where Outposts racks are deployed.

To take advantage of these benefits for your users, we encourage you to onboard your existing applications using S3 on Outposts to the local cache and measure the performance improvements. For more information, visit the S3 on Outposts documentation.

Boris Alexandrov

Boris is a Senior Product Manager on the Amazon S3 team. He is passionate about helping customers build applications with S3. In his spare time, he enjoys traveling, dining, cooking, and soccer.

Keerthi Bala

Keerthi is a Senior Software Engineer at AWS, based out of Boston, with over 15 years of experience building distributed systems. Her cloud journey started seven years ago working on the AWS DataSync service. She now leads S3 on Outposts and S3 compatible storage for AWS Snowball Edge devices.

Jay Patel

Jay is a software developer on the S3 on Outposts team at AWS, working out of the Boston office. An outdoor enthusiast, he enjoys hiking, snowboarding, and other adventurous activities in his free time.