AWS Storage Blog
Preserving last-modified timestamps when restoring Amazon S3 objects with AWS Backup
Customers operating in highly regulated industries are often subject to rules mandating that data integrity be maintained, and that the data remain available, throughout its entire lifetime. To meet integrity requirements, data must be restorable along with any associated audit trail and metadata, such as object creation dates, last-modified timestamps, and tags.
When restoring backups of Amazon Simple Storage Service (Amazon S3) objects created with AWS Backup, the integrity of the original data is verified through the restored object’s unique hash value. However, restored objects are treated as new objects, so their last-modified metadata is set to the new creation date. Last-modified is a system-defined object metadata field that a user can’t change back to its original value after restoration. Consequently, the restored object state won’t be identical to the original state captured by AWS Backup.
In this post, we discuss how to use S3 object tags to preserve the last-modified timestamp of both new and existing objects across backup and restore operations. After implementing this solution, customers can rely on AWS Backup for situations where the last-modified timestamp of the original object must be kept and remain available with the newly restored object. This is especially beneficial when an inspector or auditor needs a complete audit trail of every event that affected the object, including the restore from a backup.
Note that as of this writing, Amazon S3 limits the number of tags per object to 10. If you have already reached this limit, then this solution might not suit your use case. Also refer to the Amazon S3 pricing page for the costs of S3 object tagging.
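If you want to verify how close an object already is to that limit, a quick check with the AWS SDK for Python (Boto3) is enough. The following is a minimal sketch; the bucket and key names are hypothetical placeholders:

import boto3

s3_client = boto3.client('s3')

# Hypothetical bucket and key; replace with your own values
response = s3_client.get_object_tagging(Bucket='amzn-s3-demo-bucket', Key='example/object.txt')
print(f"{len(response['TagSet'])} of 10 tags in use")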
Solution walkthrough
This solution leverages S3 object tags to durably store the last-modified timestamp value of the object. In Section 1, we will demonstrate how to automate the creation of an S3 object tag that contains the last-modified timestamp value of every newly created object in a bucket you protect with AWS Backup.
Once the last-modified timestamp is stored automatically as an S3 object tag, we show how AWS Backup automatically backs up the tag with the object, allowing you to preserve it as part of your backups and restore it with the protected object when needed.
We recommend saving this information as an object tag rather than as the user-defined metadata of the object. This is because when editing the user-defined metadata of an object, Amazon S3 actually creates a new version of the object. This would trigger our Lambda function again, thereby creating an unwanted loop.
Then, in Section 2, we will demonstrate how to automate the creation of an S3 object tag that contains the last-modified timestamp for existing objects in the bucket so that your S3 backups will now automatically contain last-modified timestamp values stored in the backup as S3 object tags for all your protected objects.
As part of the solution, we also cover how to preserve all existing tags in each object when adding a new one. Currently, the API call to add a tag to an object will replace all current tags with the new one. For this reason, we will see in the code how to collect the existing tags that you want to preserve and push them back along with the new one.
Section 1: Adding last-modified timestamp object tags to new and modified objects
We want to add last-modified object tags to newly created Amazon S3 objects and maintain them on existing objects. At a high level, we explain how to configure an Amazon S3 event notification on all object create events (PUT, POST, COPY, and multipart upload) to trigger an AWS Lambda function. Then, we demonstrate how the Lambda function automatically obtains the last-modified date from the object and saves it as an object tag that AWS Backup will back up and eventually restore with the object.
The following diagram illustrates the architecture that automatically adds a tag to each newly created or modified object.
Creating an AWS Lambda function to obtain the last-modified timestamp
Let’s start by creating the Lambda function:
- From the Lambda console, choose Create function.
- Choose Author from scratch, give it a Function name, and select Python 3.9 as the Runtime.
- Select arm64 as the Architecture; functions running on Graviton2 cost less.
- Add an execution role with the correct permissions (or create a new one). Lambda must have permissions to read and write object tags on the source S3 bucket (a minimal policy sketch follows this list).
- Confirm by selecting Create function.
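For illustration, the following sketch attaches a minimal inline policy with the required tagging permissions to the function’s execution role. The role name, policy name, and bucket are hypothetical placeholders, and depending on your setup you may prefer a managed policy or a more restrictive resource scope:

import json
import boto3

iam = boto3.client('iam')

# Hypothetical role name, policy name, and bucket; replace with your own values
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObjectTagging", "s3:PutObjectTagging"],
        "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*"
    }]
}
iam.put_role_policy(
    RoleName='last-modified-tagger-role',
    PolicyName='AllowObjectTagging',
    PolicyDocument=json.dumps(policy),
)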
You can now add the following code to the function:
import urllib.parse

import boto3

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Extract the bucket and key from the S3 event notification.
    # Object keys in event notifications are URL-encoded, so decode before use.
    s3BucketName = event['Records'][0]['s3']['bucket']['name']
    s3ObjectKey = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Get the current tags (if any) from the object; fall back to an empty list
    try:
        tags = s3_client.get_object_tagging(
            Bucket=s3BucketName,
            Key=s3ObjectKey,
        )
        current_tags = tags['TagSet']
    except Exception:
        current_tags = []

    try:
        object = s3.Object(s3BucketName, s3ObjectKey)
        last_modified_date = object.last_modified.strftime("%Y-%m-%d %X%z")
        # Update the tag list with the new last-modified date value
        updated_tags = updateTags(current_tags, 'last-modified-date', last_modified_date)
        # put_object_tagging replaces the object's entire tag set, so we push
        # the existing tags back together with the new one
        s3_client.put_object_tagging(
            Bucket=s3BucketName,
            Key=s3ObjectKey,
            Tagging={
                'TagSet': updated_tags,
            },
        )
    except Exception as e:
        # Surface the failure so the invocation is recorded as an error
        raise e

# Helper functions to merge the new tag into the existing tag set
def keyExists(data, key):
    return any(d['Key'] == key for d in data)

def addKey(data, key, value):
    data.append({'Key': key, 'Value': value})
    return data

def replaceKey(d, key, value):
    if d['Key'] == key:
        d['Value'] = value
    return d

def updateTags(data, key, value):
    if keyExists(data, key):
        return [replaceKey(d, key, value) for d in data]
    else:
        return addKey(data, key, value)
In the next step, we configure the S3 Event Notification that triggers the Lambda function every time an object is created or modified. When these events happen, Amazon S3 sends a JSON event to Lambda, where it is processed by the lambda_handler.
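For reference, here is a trimmed, hypothetical example of the event payload the handler parses; a real notification contains additional fields:

sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "amzn-s3-demo-bucket"},
            "object": {"key": "example/object.txt"}
        }
    }]
}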
At this point, the function above will do four things:
- Fetch the last-modified date (or creation date) of the object.
- Collect all of the tags already present on the object into a Python list.
- Replace (or add, in the case of a new object) the last-modified timestamp in that list.
- Associate all of the tags back to the object.
Creating an Amazon S3 data event trigger for AWS Lambda
Now we must create the Amazon S3 data event to trigger this Lambda function.
On the Amazon S3 source bucket:
- Go to Properties.
- Scroll down to Event notifications and select Create event notification.
- From there, give it a Name, plus a Prefix and Suffix if you want to target a specific subset of objects.
- On the Event types select All object create events.
- On the Destination make sure that Lambda function is selected.
- Choose the Lambda function that you created in the previous step.
- You can now select Save Changes.
Each new backup will now include the object tag with the last-modified date.
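If you prefer to script this step, the following is a minimal sketch of the equivalent configuration with Boto3. The bucket name and function ARN are hypothetical placeholders, and it assumes you have already granted Amazon S3 permission to invoke the function:

import boto3

s3_client = boto3.client('s3')

# Hypothetical bucket name and function ARN; replace with your own values.
# Note that this call replaces the bucket's existing notification configuration.
s3_client.put_bucket_notification_configuration(
    Bucket='amzn-s3-demo-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:111122223333:function:last-modified-tagger',
            'Events': ['s3:ObjectCreated:*'],
        }]
    },
)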
Section 2: Adding last-modified timestamp object tags to existing objects
Now that we have configured AWS Lambda and Amazon S3 data event triggers to preserve the last-modified timestamp for new and modified objects, we can do the same for the existing objects that we want to protect with AWS Backup.
We need to take this next step because the existing objects in the bucket would not have had an S3 data event that would have triggered our AWS Lambda function to preserve their last-modified timestamp as an S3 object tag.
We recommend running a one-time S3 Batch Operations job to capture the last-modified timestamp of each existing object and save its value to an object tag for the first time. S3 Batch Operations jobs require a manifest that lists all of the objects you want the job to act upon. We can leverage the Amazon S3 Inventory tool to generate the manifest.
In this section, we will walk you through how to:
- Generate a manifest for S3 Batch Operations with the list of existing objects.
- Create an AWS Lambda function, similar to the previous one, that adds the last-modified timestamp to an object tag.
- Create a Batch Operations job that runs the Lambda function on each object listed in the manifest.
The following is a diagram of the solution:
Inventory
Let’s start with the creation of an S3 Inventory report. The Inventory report will contain a list of all of the objects we intend to run the Lambda function on to capture their existing S3 object tags and their current last-modified timestamp information.
We will then use this Inventory report to create a manifest for S3 Batch Operations to run against, adding the new S3 object tag that contains the last-modified timestamp.
- From the source bucket, head to the Management section and scroll down to Inventory configurations. There, you can create an inventory configuration.
- Fill in the fields and choose the destination bucket (or use a specific prefix on the source bucket). This bucket/prefix will be used to save the manifest file.
- Select Daily for the frequency, although this will be used only once.
- Select CSV for the Output format.
- Select Create.
Once the inventory configuration is created, it runs on a daily basis and saves the results to a manifest CSV file in the chosen destination. Note that it can take up to 48 hours for the first manifest to be generated, depending on the number of objects in the bucket. It is possible to set up a notification event for when the inventory is completed.
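The same configuration can also be created programmatically. The following is a minimal sketch; the bucket names and inventory ID are hypothetical placeholders:

import boto3

s3_client = boto3.client('s3')

# Hypothetical source bucket, destination bucket ARN, and inventory ID
s3_client.put_bucket_inventory_configuration(
    Bucket='amzn-s3-demo-source-bucket',
    Id='last-modified-inventory',
    InventoryConfiguration={
        'Id': 'last-modified-inventory',
        'IsEnabled': True,
        'IncludedObjectVersions': 'Current',
        'Destination': {
            'S3BucketDestination': {
                'Bucket': 'arn:aws:s3:::amzn-s3-demo-destination-bucket',
                'Format': 'CSV',
            }
        },
        'Schedule': {'Frequency': 'Daily'},
    },
)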
Creating the Lambda function for the Amazon S3 Batch Operations job to execute
In the final step, we construct the Lambda function that the Amazon S3 Batch Operations job runs on each object.
Code:
import urllib.parse

import boto3

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Parse job parameters from S3 Batch Operations
    jobId = event['job']['id']
    invocationId = event['invocationId']
    invocationSchemaVersion = event['invocationSchemaVersion']

    # Prepare results
    results = []

    # S3 Batch Operations currently only passes a single task at a time in the array of tasks.
    task = event['tasks'][0]

    # Extract the task values we might want to use.
    # Object keys in S3 Inventory reports are URL-encoded, so decode before use.
    taskId = task['taskId']
    s3Key = urllib.parse.unquote_plus(task['s3Key'])
    s3VersionId = task['s3VersionId']
    s3BucketArn = task['s3BucketArn']
    s3BucketName = s3BucketArn.split(':')[-1]

    # Get the current tags (if any) from the object; fall back to an empty list
    try:
        tags = s3_client.get_object_tagging(
            Bucket=s3BucketName,
            Key=s3Key,
        )
        current_tags = tags['TagSet']
    except Exception:
        current_tags = []

    try:
        # Assume the task will succeed for now
        resultCode = 'Succeeded'
        resultString = 'Last Modified date added to Tag'

        object = s3.Object(s3BucketName, s3Key)
        last_modified_date = object.last_modified.strftime("%Y-%m-%d %X%z")
        # Update the tag list with the new last-modified date value
        updated_tags = updateTags(current_tags, 'last-modified-date', last_modified_date)
        s3_client.put_object_tagging(
            Bucket=s3BucketName,
            Key=s3Key,
            Tagging={
                'TagSet': updated_tags,
            },
        )
    except Exception as e:
        # If we run into any exceptions, fail this task so Batch Operations does not retry it,
        # and return the exception string so we can see the failure message in the final report
        # created by Batch Operations.
        resultCode = 'PermanentFailure'
        resultString = 'Exception: {}'.format(e)
    finally:
        # Send back the results for this task.
        results.append({
            'taskId': taskId,
            'resultCode': resultCode,
            'resultString': resultString
        })

    return {
        'invocationSchemaVersion': invocationSchemaVersion,
        'treatMissingKeysAs': 'PermanentFailure',
        'invocationId': invocationId,
        'results': results
    }

# Helper functions to merge the new tag into the existing tag set
def keyExists(data, key):
    return any(d['Key'] == key for d in data)

def addKey(data, key, value):
    data.append({'Key': key, 'Value': value})
    return data

def replaceKey(d, key, value):
    if d['Key'] == key:
        d['Value'] = value
    return d

def updateTags(data, key, value):
    if keyExists(data, key):
        return [replaceKey(d, key, value) for d in data]
    else:
        return addKey(data, key, value)
Before we create the Amazon S3 Batch Operations job that executes this Lambda function against each object listed in its manifest, let’s summarize what the function’s code does. The Batch Operations job passes all of the necessary S3 object information through an event object that is accessible from within the lambda_handler function. We extract the values of taskId, s3Key, s3VersionId, s3BucketArn, and s3BucketName, and then use them to identify the Amazon S3 object. Next, we read its last-modified timestamp and copy its value to an object tag. The remaining operations are similar to the first Lambda function presented earlier in this post.
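For local testing, the following is a trimmed, hypothetical example of the event shape that S3 Batch Operations passes to the function:

sample_event = {
    "invocationSchemaVersion": "1.0",
    "invocationId": "example-invocation-id",
    "job": {"id": "example-job-id"},
    "tasks": [{
        "taskId": "example-task-id",
        "s3Key": "example/object.txt",
        "s3VersionId": None,
        "s3BucketArn": "arn:aws:s3:::amzn-s3-demo-bucket"
    }]
}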
Creating an Amazon S3 Batch Operations Job to invoke our Lambda function on existing objects
Now we must create the Amazon S3 Batch Operations job.
- From the Amazon S3 left menu, select Batch Operations and choose the Region.
- For the manifest format, select S3 inventory report (manifest.json), input the exact path to the manifest object, then select Next.
- For the operation, select Invoke AWS Lambda function.
- Select the Lambda function that you created in the previous step, then select Next.
- Select a bucket/prefix if you want to save the completion report of the job, choose the Role for running the function, and select Create job.
- Once the job is ready, you can select it and choose Run job to run the entire process.
Each existing object now has a new tag with the last-modified timestamp, which will be included in all new backups.
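If you would rather script the job creation, the following is a minimal sketch using the S3 Control API. The account ID, ARNs, prefix, and manifest ETag are hypothetical placeholders:

import boto3

s3control = boto3.client('s3control')

# Hypothetical account ID, ARNs, and manifest ETag; replace with your own values
response = s3control.create_job(
    AccountId='111122223333',
    ConfirmationRequired=True,
    Operation={
        'LambdaInvoke': {
            'FunctionArn': 'arn:aws:lambda:us-east-1:111122223333:function:batch-last-modified-tagger'
        }
    },
    Manifest={
        'Spec': {'Format': 'S3InventoryReport_CSV_20161130'},
        'Location': {
            'ObjectArn': 'arn:aws:s3:::amzn-s3-demo-destination-bucket/inventory/manifest.json',
            'ETag': 'example-etag-of-the-manifest-object',
        },
    },
    Report={
        'Bucket': 'arn:aws:s3:::amzn-s3-demo-destination-bucket',
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'batch-reports',
        'ReportScope': 'AllTasks',
    },
    Priority=10,
    RoleArn='arn:aws:iam::111122223333:role/batch-operations-role',
)
print(response['JobId'])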
Cleaning up
Now that you have stored the last-modified timestamp on each of the existing objects you want to include in your backups, you can proceed to delete:
- The Batch Operations job
- The manifest files
- The Lambda function described in the last section, used to write the last-modified timestamp on all existing objects
Conclusion
In this blog, we have demonstrated how you can leverage AWS Lambda, S3 Event Notifications, S3 Batch Operations, and S3 object tags together to automatically capture and durably store the last-modified timestamp of your new and existing objects.
You can now use AWS Backup, or other AWS Storage Competency Partner backup solutions that protect S3 objects and their object tags, to preserve the last-modified timestamp on your S3 data so that you can restore them together from your S3 backups when needed for data compliance and backup auditing purposes. You can find our AWS Storage Partners here.
Should you need to preserve the last-modified timestamp when using AWS Backup with Amazon S3, this post helps you set up an automated solution that allows you to maintain the last-modified timestamp even after a restore from AWS Backup, making it easier to pass compliance certifications or any other data integrity evaluation.
As always, you’re welcome to leave questions or feedback in the comments.