AWS Storage Blog

Using presigned URLs to identify per-requester usage of Amazon S3

Many software-as-a-service (SaaS) product offerings have a pay-as-you-go pricing model, charging customers only for the resources consumed. However, pay-as-you-go pricing is only viable when you can accurately track each customer’s use of resources, such as compute capacity, storage, and networking bandwidth. Without this data, SaaS providers have no visibility into each customer’s resource consumption and cannot bill based on their usage. This is just one example where the ability to accurately track resource consumption is a critical need.

In this blog post, we walk through how to track who has downloaded objects from your Amazon Simple Storage Service (Amazon S3) buckets. Specifically, we track downloads made with presigned URLs. We do this by generating a presigned URL with a custom parameter using the Signature Version 4 (SigV4) process and then querying the S3 server access logs with Amazon Athena to identify who made the requests. Because the custom parameter is recorded with each request, S3 bucket owners can use it as an identifier to track how the URL is used. This enables bucket owners to charge their users based on how frequently each user has downloaded objects and how much data they have retrieved.

Figure 1: Diagram depicting how custom parameter can be used to identify access

Required permissions to create a presigned URL

When creating presigned URLs, it is important to keep least privilege in mind. Anyone with a valid presigned URL can perform the signed action on the object as if they were the original signing user, which makes it important to lock down the permissions of the entity creating presigned URLs. Creating a presigned URL for an S3 object requires the creating user to have explicit permission to perform the specified action. For instance, to create a presigned URL for a GET request, the signing user needs permission to read the S3 object. With this in mind, we recommend granting the signer only the permissions needed for the access you intend to share.

Creating a presigned URL with a custom parameter

In this section, we will demonstrate how to generate a presigned URL with a custom parameter for an object in a private S3 bucket. To do this, you have to write code that signs your request using the SigV4 process. This step is needed to provide authentication information in your request. We will use a Java code sample provided in the Amazon S3 API Reference documentation as a baseline for the signature calculation process and make minor edits to the code to generate the presigned URL with custom parameters.

Complete the following steps to download and navigate to the provided code sample.

  1. Download the Java sample code.
  2. Extract the .zip file.
  3. Navigate to com/amazonaws/services/s3/sample.

Once you have reached the folder with the code, you only need to make edits to three files: RunAllSamples.java and PresignedUrlSample.java in the sample folder and AWS4SignerForQueryParameterAuth.java in the sample/auth folder.

To give you a quick overview of the files:

  • RunAllSamples.java: This file simply runs the four sample code snippets provided in the documentation.
  • PresignedUrlSample.java: This file is responsible for collecting the relevant parameters, computing the signature using the parameters, and generating the final presigned URL. The file is currently hardcoded to generate the presigned URL for the object ExampleObject.txt. You can later change this object name in the file to fit your needs.
  • AWS4SignerForQueryParameterAuth.java: This file contains code that computes the signature using query string parameters. This is used in PresignedUrlSample.java.

RunAllSamples.java requires four variables to run: awsAccessKey, awsSecretKey, bucketName, and regionName. For demonstration purposes, we use the existing code from the documentation, which relies on static credentials. For a production implementation, we recommend passing in temporary credentials as a best practice. This helps you avoid accidentally committing hardcoded AWS access and secret keys to your code repository.
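One way to keep keys out of the source file is to read them from the environment at run time. The following is a minimal sketch, not part of the documentation sample: the class name EnvCredentials and its fields are hypothetical, and it assumes the credentials are exposed through the standard AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and (for temporary credentials) AWS_SESSION_TOKEN environment variables.

```java
import java.util.Map;

// Hypothetical helper (not part of the AWS sample): loads credentials from
// environment variables instead of hardcoding them in RunAllSamples.java.
class EnvCredentials {
    final String accessKey;
    final String secretKey;
    final String sessionToken; // null when long-term credentials are used

    EnvCredentials(String accessKey, String secretKey, String sessionToken) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.sessionToken = sessionToken;
    }

    // Accepts a map shaped like System.getenv() so the logic is easy to test.
    static EnvCredentials from(Map<String, String> env) {
        String accessKey = require(env, "AWS_ACCESS_KEY_ID");
        String secretKey = require(env, "AWS_SECRET_ACCESS_KEY");
        String sessionToken = env.get("AWS_SESSION_TOKEN"); // optional
        return new EnvCredentials(accessKey, secretKey, sessionToken);
    }

    // Fails fast with a clear message when a required variable is missing.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            EnvCredentials creds = EnvCredentials.from(System.getenv());
            System.out.println("Session token present: " + (creds.sessionToken != null));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The loaded values could then be assigned to awsAccessKey and awsSecretKey in place of string literals. Note that when temporary credentials are used with query-string authentication, the session token generally also has to be sent as the X-Amz-Security-Token query parameter, which the unmodified sample does not appear to handle.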

Once you have filled out the four variables (awsAccessKey, awsSecretKey, bucketName, and regionName) in RunAllSamples.java, compile and run the code following the instructions in the Amazon S3 API Reference documentation. This should run the four code samples, including the code that generates the presigned URL for ExampleObject.txt in your specified S3 bucket.

Figure 2: Output from running the provided code sample successfully

In order to include a custom parameter in the presigned URL, you need to make the following code changes in the Java code sample.

  1. Change lines 30 and 32 in PresignedUrlSample.java from ExampleObject.txt to an object name that exists in your S3 bucket.

Figure 3: Change the object name to an object that exists in your current S3 bucket

  2. Under line 44 in PresignedUrlSample.java, add an additional query parameter. This is going to be our custom parameter that is added to the presigned URL. You can choose to add multiple parameters as well. Here we will add name/johndoe as the sample key-value pair.

Figure 4: Adding the custom parameter name/johndoe to the list of query parameters

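  3. In AWS4SignerForQueryParameterAuth.java, add one more line so that the query parameter you added in the previous step is included in the presigned URL. In the version of the sample shown in Figure 5, this line is added at line 115, where the auth string is built.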

Figure 5: Adding one line of code to include the “name” key and its corresponding value in the auth string

Once you have completed the above steps, you can follow the same instructions in the Amazon S3 API Reference documentation to compile and run the code to generate the presigned URL.

Figure 6: The computed presigned URL, which includes the custom parameter at the end of the URL

The custom parameter you added earlier, name=johndoe, is now appended to the end of the presigned URL. You can now download the object with this presigned URL.
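To make the mechanics concrete, here is a condensed, standalone sketch of SigV4 query-string signing with one extra parameter. It is not the code from the documentation sample: the class PresignWithCustomParam and its method names are hypothetical, the credentials and bucket are placeholders, and it assumes a simple GET with only the host header signed and ASCII parameter names. The point it illustrates is that the custom parameter has to be part of the canonical query string that gets signed, not just tacked onto the finished URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical, simplified presigner for illustration only.
class PresignWithCustomParam {

    static byte[] hmac(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    static String sha256Hex(String data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return hex(md.digest(data.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // RFC 3986 style percent-encoding: space becomes %20 and '~' stays literal.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
    }

    static String presign(String accessKey, String secretKey, String region, String host,
                          String objectKey, String amzDate, int expiresSeconds,
                          Map<String, String> customParams) {
        String dateStamp = amzDate.substring(0, 8);
        String scope = dateStamp + "/" + region + "/s3/aws4_request";

        // TreeMap sorts by parameter name, as the canonical query string requires.
        // The custom parameters are signed together with the X-Amz-* ones.
        Map<String, String> params = new TreeMap<>(customParams);
        params.put("X-Amz-Algorithm", "AWS4-HMAC-SHA256");
        params.put("X-Amz-Credential", accessKey + "/" + scope);
        params.put("X-Amz-Date", amzDate);
        params.put("X-Amz-Expires", String.valueOf(expiresSeconds));
        params.put("X-Amz-SignedHeaders", "host");

        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (query.length() > 0) query.append('&');
            query.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }

        String path = "/" + enc(objectKey).replace("%2F", "/");
        String canonicalRequest = "GET\n" + path + "\n" + query + "\n"
                + "host:" + host + "\n\n" + "host\n" + "UNSIGNED-PAYLOAD";
        String stringToSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n"
                + sha256Hex(canonicalRequest);

        // Derive the signing key: date -> region -> service -> "aws4_request".
        byte[] kDate = hmac(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), dateStamp);
        byte[] kRegion = hmac(kDate, region);
        byte[] kService = hmac(kRegion, "s3");
        byte[] kSigning = hmac(kService, "aws4_request");
        String signature = hex(hmac(kSigning, stringToSign));

        return "https://" + host + path + "?" + query + "&X-Amz-Signature=" + signature;
    }

    public static void main(String[] args) {
        // Placeholder credentials, bucket, and timestamp; replace with real values.
        String url = presign("AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                "us-east-1", "examplebucket.s3.amazonaws.com", "test.txt",
                "20130524T000000Z", 86400, Map.of("name", "johndoe"));
        System.out.println(url);
    }
}
```

Because name=johndoe is covered by the signature, a recipient who edits or removes it ends up with a URL whose signature no longer matches. Unlike the modified sample, this sketch appends X-Amz-Signature last, so the custom parameter is followed by &X-Amz-Signature=... rather than sitting at the very end of the URL; a log query that extracts the value would then need to stop at & as well as at a space.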

Querying S3 server access logs to determine usage patterns

Now that you have generated a presigned URL with a custom parameter, the next step is to track the usage of the URL. But first, you will need to enable S3 server access logs for the S3 bucket. Once you do, you can query the server access logs to identify the requesters. There are many ways to query the server access logs that are stored in your S3 bucket. For instance, you can analyze your server access logs by having AWS Lambda stream your logs to Amazon OpenSearch Service or by using Pandas with AWS SDK for Python. In this blog post, we use Amazon Athena to query the S3 server access logs and identify the requests with the custom parameter. You can follow these steps to create an Athena table. Once that is done, you can start querying your S3 server access logs.

Running sample queries

First, run the following query to see what information is being logged.

SELECT * FROM s3_access_logs_db.mybucket_logs LIMIT 10;

The previous query should return 24 columns, including bucket name, HTTP status, IP address of the requester, and many more. The custom parameter you added in the above exercise appears in the request_uri column. If you were to charge your clients based on the amount of data they have retrieved from your S3 bucket, you could run a query that returns only the custom parameter value from request_uri and the bytessent value for each successful GET request.

SELECT SPLIT_PART(SPLIT_PART(request_uri,'name=',2),' ',1), bytessent FROM s3_access_logs_db.mybucket_logs WHERE httpstatus='200' AND operation='REST.GET.OBJECT' AND request_uri LIKE '%name=%';

Figure 7: Custom parameter can be found in the request_uri column

The previous screenshot shows the client name value johndoe appearing in the S3 server access logs when queried through Athena. Knowing this, you can use the sample code to generate presigned URLs dedicated to each of your clients and keep track of how the presigned URLs are being used.
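For billing, the per-request rows usually need to be rolled up into a total per client. The sketch below shows one way to do that in Java over rows exported from the query result. The class UsageByRequester, its LogRow type, and the sample rows are all hypothetical and made up for illustration. It assumes the rows have already been filtered to successful GET requests and that the custom parameter is called name, as in the walkthrough.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing of query results; not part of the AWS sample.
class UsageByRequester {

    // One log record reduced to the two columns used for billing.
    static final class LogRow {
        final String requestUri;
        final long bytesSent;

        LogRow(String requestUri, long bytesSent) {
            this.requestUri = requestUri;
            this.bytesSent = bytesSent;
        }
    }

    // Extracts the value of the "name" query parameter from a request_uri value
    // such as "GET /key?...&name=johndoe HTTP/1.1". Returns null if it is absent.
    static String extractName(String requestUri) {
        String[] parts = requestUri.split(" ");
        if (parts.length < 2) return null;
        int queryStart = parts[1].indexOf('?');
        if (queryStart < 0) return null;
        for (String pair : parts[1].substring(queryStart + 1).split("&")) {
            if (pair.startsWith("name=")) return pair.substring("name=".length());
        }
        return null;
    }

    // Sums bytes sent per requester name, skipping rows without the parameter.
    static Map<String, Long> totalBytesByName(List<LogRow> rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (LogRow row : rows) {
            String name = extractName(row.requestUri);
            if (name != null) totals.merge(name, row.bytesSent, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        List<LogRow> rows = List.of(
                new LogRow("GET /report.csv?X-Amz-Expires=3600&name=johndoe HTTP/1.1", 1024),
                new LogRow("GET /report.csv?X-Amz-Expires=3600&name=janedoe HTTP/1.1", 512),
                new LogRow("GET /data.bin?name=johndoe&X-Amz-Signature=abc HTTP/1.1", 2048));
        System.out.println(totalBytesByName(rows)); // prints {johndoe=3072, janedoe=512}
    }
}
```

Splitting the query string on & and matching the whole parameter name means the value is found wherever the parameter sits in the URL, and a parameter such as filename= is not mistaken for name=.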

Cleaning up

The only ongoing charges from this exercise are the S3 storage costs for the files in S3. If you do not want to incur any more charges, delete the S3 files that were created for testing.

Conclusion

In this post, we walked you through a way to identify per-requester usage of Amazon S3. We covered modifying an existing code sample to add a custom parameter to the presigned URL and querying S3 server access logs to identify requests using that custom parameter. With this information, you no longer need to guess the per-customer traffic to your S3 bucket and can offer your customers S3 object access with a pay-as-you-go pricing model. User data access patterns can also be used to infer user needs and can help you build new and improved offerings for your customers.

Thanks for reading this blog post! If you have any comments or questions, don’t hesitate to leave them in the comments section.

John Lee

John Lee is a Solutions Architect at AWS based in Chicago. In his role, John helps small and midsize enterprises build on AWS. Prior to joining AWS, John worked as a software developer and also spent time in graduate school studying human computer interaction.

Chance Lee

Chance Lee is a Sr. Container Specialist Solutions Architect at AWS based in the Bay Area. He helps customers architect highly scalable and secure container workloads with AWS container services and various ecosystem solutions. Prior to joining AWS, Chance was an IBM Lab Services consultant.

Justin Lim

Justin Lim is a WW GTM Specialist for Aurora MySQL at AWS based in Seattle. In his role, Justin focuses on helping startups accelerate their workloads using AWS native databases. Prior to joining AWS, Justin worked as an engineer and graduated from the University of Washington with his master's in Electrical Engineering.