AWS Compute Blog
Using larger ephemeral storage for AWS Lambda
AWS Lambda functions have always had ephemeral storage available at /tmp in the file system. This was set at 512 MB for every function, regardless of runtime or memory configuration. With this new feature, you can now configure ephemeral storage for up to 10 GB per function instance.
You can set this using the AWS Management Console, AWS CLI, AWS SDK, AWS Serverless Application Model (AWS SAM), AWS Cloud Development Kit (AWS CDK), AWS Lambda API, or AWS CloudFormation. This blog post explains how ephemeral storage works and how to use the new setting in your Lambda functions.
How ephemeral storage works in Lambda
All functions have ephemeral storage available at the fixed file system location /tmp. This provides a fast file system-based scratch area that is scoped to a specific instance of a Lambda function. This storage is not shared between instances of Lambda functions and the space is guaranteed to be empty when a new instance starts.
Within a single execution environment, files in /tmp persist between invocations, so you can use /tmp to cache static assets. This is a common use case that can help reduce function duration for subsequent invocations. The contents are deleted when the Lambda service eventually terminates the execution environment.
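The caching pattern described above can be sketched as follows. This is a minimal illustration, not code from the example application: getCachedAsset and its download parameter are hypothetical names, and download stands in for whatever call fetches the asset (for example, from Amazon S3).

```javascript
const fs = require('fs')
const path = require('path')

// Returns the path to a cached copy of an asset in /tmp, downloading it
// only when the file is not already there.
// `download` is a hypothetical async function (an assumption, not part of
// the example application) that resolves to a Buffer with the asset contents.
const getCachedAsset = async (name, download, cacheDir = '/tmp') => {
  const cachedPath = path.join(cacheDir, name)
  if (fs.existsSync(cachedPath)) {
    // Warm invocation: an earlier invocation in this environment wrote the file
    console.log('Cache hit: ', cachedPath)
    return cachedPath
  }
  // Cold start or first use: /tmp does not contain the asset yet
  console.log('Cache miss, downloading: ', name)
  const body = await download(name)
  await fs.promises.writeFile(cachedPath, body)
  return cachedPath
}
```

The first invocation in a new execution environment pays the download cost. Later invocations in the same environment find the file and skip the download.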
With this new configurable setting, ephemeral storage works in the same way. The behavior is identical whether you use zip or container images to deploy your functions. It’s also available for Provisioned Concurrency. All data stored in /tmp is encrypted at rest with a key managed by AWS.
Common use cases for ephemeral storage
There are several common customer use cases that can benefit from the expanded ephemeral storage.
Extract-transform-load (ETL) jobs: Your code may perform intermediate computation or download other resources to complete processing. More temporary space enables more complex ETL jobs to run in Lambda functions.
Machine learning (ML) inference: Many inference tasks rely on large reference data files, including libraries and models. More ephemeral storage allows you to download larger models from Amazon S3 to /tmp and use these in your processing. To learn more about using Lambda for ML inference, read Building deep learning inference with AWS Lambda and Amazon EFS and Pay as you go machine learning inference with AWS Lambda.
Data processing: For workloads that download objects from S3 in response to S3 events, the larger /tmp space makes it possible to handle larger objects without using in-memory processing. Workloads that create PDFs, use headless Chromium, or process media also benefit from more ephemeral storage.
Zip processing: Some workloads use large zip files from data providers to initialize local databases. These can now unzip to the local file system without the need for in-memory processing. Similarly, applications that generate zip files also benefit from more /tmp space.
Graphics processing: Image processing is a common use case for Lambda-based applications. For workloads processing large TIFF files or satellite images, the larger /tmp space makes it easier to use libraries like ImageMagick to perform all the computation in Lambda. Customers using geospatial libraries also gain significant flexibility from writing large satellite images to /tmp.
Deploying the example application
The example application shows how to resize an MP4 file from Amazon S3, using the temporary space for intermediate processing. In this example, you can process video files much larger than the standard 512 MB temporary storage.
Before deploying the example, you need:
- An AWS account (sign up for an account if you don’t have one).
- The AWS SAM CLI installed.
- Node.js installed (version 14 minimum).
This example uses the AWS Serverless Application Model (AWS SAM). To deploy:
- From a terminal window, clone the GitHub repo:
git clone https://github.com/aws-samples/s3-to-lambda-patterns
- Change directory to this example:
cd ./resize-video
- Follow the installation instructions in the README file.
To test the application, upload an MP4 file into the source S3 bucket. After processing, the destination bucket contains the resized video file.
How the example works
The resize function downloads the original video from S3 and saves it in Lambda’s temporary storage directory:
// Download the source object from S3
const Key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
const data = await s3.getObject({
  Bucket: record.s3.bucket.name,
  Key
}).promise()
// Save original to tmp directory
const tempFile = `${ffTmp}/${Key}`
console.log('Saving downloaded file to ', tempFile)
fs.writeFileSync(tempFile, data.Body)
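The snippet above holds the whole object in memory before writing it to disk. For objects approaching the function's memory size, one alternative is to stream the download straight into /tmp. The following is a sketch under stated assumptions, not the example application's code: saveStreamToFile is a hypothetical helper, and the commented usage assumes the AWS SDK for JavaScript v2 client used in the example.

```javascript
const fs = require('fs')
const stream = require('stream')
const util = require('util')

// Promise-based pipeline that also works on Node.js 14
const pipeline = util.promisify(stream.pipeline)

// Streams `source` (any readable stream) to `destPath` chunk by chunk,
// so the whole object is never held in memory at once.
// Resolves to the number of bytes written.
const saveStreamToFile = async (source, destPath) => {
  await pipeline(source, fs.createWriteStream(destPath))
  return fs.statSync(destPath).size
}

// Assumed usage with the AWS SDK for JavaScript v2 client from the example:
// const source = s3.getObject({ Bucket: record.s3.bucket.name, Key }).createReadStream()
// const bytes = await saveStreamToFile(source, tempFile)
```

With this approach, the size of the object you can process is bounded by the configured ephemeral storage rather than by function memory.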
The application uses FFmpeg to resize the video and store the output in the temporary storage space:
// Save resized video to /tmp
const outputFilename = `${Key.split('.')[0]}-smaller.mp4`
console.log(`Resizing and saving to ${outputFilename}`)
await execPromise(`${ffmpegPath} -i "${tempFile}" -loglevel error -vf scale=160:-1 -sws_flags fast_bilinear "${ffTmp}/${outputFilename}"`)
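The command above interpolates the S3 key into a shell string, so keys containing spaces or quotes depend on correct quoting. A possible variant passes the same FFmpeg options as an argument array to child_process.execFile, which does not use a shell. This is a sketch only: buildResizeArgs and resizeVideo are hypothetical names, and ffmpegPath is assumed to be the binary path used in the example.

```javascript
const { execFile } = require('child_process')
const util = require('util')

const execFilePromise = util.promisify(execFile)

// Builds the FFmpeg argument list for the resize step. Each value is a
// separate array element, so spaces or quotes in the S3 key need no escaping.
const buildResizeArgs = (inputPath, outputPath, width = 160) => [
  '-i', inputPath,
  '-loglevel', 'error',
  '-vf', `scale=${width}:-1`,
  '-sws_flags', 'fast_bilinear',
  outputPath
]

// Runs FFmpeg without a shell. `ffmpegPath` is assumed to point to the
// FFmpeg binary, as in the example application.
const resizeVideo = (ffmpegPath, inputPath, outputPath) =>
  execFilePromise(ffmpegPath, buildResizeArgs(inputPath, outputPath))
```

The options are the same as in the original command. Only the way they are passed to the process changes.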
After processing, the function reads the file from the temporary directory and then uploads to the destination bucket in S3:
const tmpData = fs.readFileSync(`${ffTmp}/${outputFilename}`)
console.log(`tmpData size: ${tmpData.length}`)
// Upload to S3
console.log(`Uploading ${outputFilename} to ${process.env.OutputBucketName}`)
await s3.putObject({
  Bucket: process.env.OutputBucketName,
  Key: outputFilename,
  Body: tmpData
}).promise()
console.log(`Object written to ${process.env.OutputBucketName}`)
Since temporary storage is not deleted between warm Lambda invocations, you may also choose to remove unneeded files. This example uses a tmpCleanup function to delete the contents of /tmp:
const fs = require('fs')
const path = require('path')
const directory = '/tmp/'
// Deletes all files in a directory and resolves once every delete has finished
const tmpCleanup = async () => {
  console.log('Starting tmpCleanup')
  const files = await fs.promises.readdir(directory)
  console.log('Deleting: ', files)
  // Run the deletes in parallel; any failure rejects the returned promise
  await Promise.all(
    files.map(file => fs.promises.unlink(path.join(directory, file)))
  )
}
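The tmpCleanup function above removes files only. If a function also creates subdirectories under /tmp (for example, when S3 keys contain prefixes), a recursive variant may be useful. The following is a minimal sketch, assuming Node.js 14.14 or later for fs.promises.rm; tmpCleanupRecursive is a hypothetical name and not part of the example application.

```javascript
const fs = require('fs')
const path = require('path')

// Removes every entry in `directory`, including nested directories,
// but keeps the directory itself. Resolves to the number of top-level
// entries removed. Assumes Node.js 14.14+ for fs.promises.rm.
const tmpCleanupRecursive = async (directory = '/tmp') => {
  const entries = await fs.promises.readdir(directory)
  await Promise.all(
    entries.map(entry =>
      fs.promises.rm(path.join(directory, entry), { recursive: true, force: true })
    )
  )
  return entries.length
}

// Assumed usage inside a handler, so cleanup also runs when processing fails:
// try { await processRecord(record) } finally { await tmpCleanupRecursive() }
```

Here processRecord is a placeholder for the function's own processing logic.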
Setting ephemeral storage with the AWS Management Console or AWS CLI
In the Lambda console, you can view the ephemeral storage allocated to a function in the General configuration menu in the Configuration tab.
To make changes to this setting, choose Edit. In the Edit basic settings page, adjust the Ephemeral Storage to any value between 512 MB and 10240 MB. Choose Save to update the function’s settings.
You can also define the ephemeral storage setting in the create-function and update-function-configuration CLI commands. In both cases, use the --ephemeral-storage option to set the value in MB:
aws lambda create-function --function-name testFunction --runtime python3.9 --handler lambda_function.lambda_handler --code S3Bucket=myBucket,S3Key=function.zip --role arn:aws:iam::123456789012:role/testFunctionRole --ephemeral-storage '{"Size": 10240}'
To modify this setting for testFunction, run:
aws lambda update-function-configuration --function-name testFunction --ephemeral-storage '{"Size": 5000}'
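The same change can be made from code through the Lambda API. The sketch below builds the request parameters and rejects sizes outside the 512 to 10,240 MB range before any API call is made. buildEphemeralStorageUpdate is a hypothetical helper, and the commented call assumes the AWS SDK for JavaScript v2.

```javascript
const MIN_EPHEMERAL_MB = 512
const MAX_EPHEMERAL_MB = 10240

// Builds the parameters for an UpdateFunctionConfiguration request.
// Throws a RangeError for sizes that are not whole numbers of MB
// between 512 and 10,240.
const buildEphemeralStorageUpdate = (functionName, sizeMb) => {
  if (!Number.isInteger(sizeMb) || sizeMb < MIN_EPHEMERAL_MB || sizeMb > MAX_EPHEMERAL_MB) {
    throw new RangeError(
      `Ephemeral storage must be a whole number of MB between ${MIN_EPHEMERAL_MB} and ${MAX_EPHEMERAL_MB}, got ${sizeMb}`
    )
  }
  return { FunctionName: functionName, EphemeralStorage: { Size: sizeMb } }
}

// Assumed usage with the AWS SDK for JavaScript v2:
// const AWS = require('aws-sdk')
// const lambda = new AWS.Lambda()
// await lambda.updateFunctionConfiguration(
//   buildEphemeralStorageUpdate('testFunction', 5000)
// ).promise()
```

Validating locally gives a clearer error than waiting for the service to reject the request.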
Setting ephemeral storage with AWS CloudFormation or AWS SAM
You can define the size of ephemeral storage in both AWS CloudFormation and AWS SAM templates by using the new EphemeralStorage attribute, as shown in the example’s template.yaml:
ResizeFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: resizeFunction/
    Handler: app.handler
    Runtime: nodejs14.x
    Timeout: 900
    MemorySize: 10240
    EphemeralStorage:
      Size: 10240
You define this on a per-function basis. If the attribute is missing, the function is allocated 512 MB of temporary storage.
Using Lambda Insights to monitor temporary storage usage
You can use Lambda Insights to query on the metrics emitted by the Lambda function relating to the usage of temporary storage. First, enable Lambda Insights on a function by following these steps in the documentation.
After running the function, the Lambda service writes ephemeral storage metrics to Amazon CloudWatch Logs. Note that with this feature enabled, the logs are written to a log group following the pattern /aws/lambda-insights, and you should use this pattern for your queries.
With Lambda Insights enabled, you can query these metrics from the CloudWatch console. Using the Logs Insights feature, you can query for the maximum, used, and available ephemeral storage.
Calculating the cost of more temporary storage
Ephemeral storage is free up to 512 MB, as it always has been. You are charged only for the amount you configure above 512 MB, up to the 10,240 MB maximum. For example, if you select 1,024 MB, you only pay for 512 MB. Expanded ephemeral storage costs $0.0000000308 per GB-second in the us-east-1 Region (see the pricing page for other Regions).
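The pricing rule above can be expressed as a small calculation. This sketch is illustrative only: ephemeralStorageCost is a hypothetical helper, it assumes 1 GB = 1,024 MB, and the workload (one million invocations of one second each with 1,024 MB configured) is an invented example rather than a figure from this post.

```javascript
const FREE_EPHEMERAL_MB = 512

// Estimates the ephemeral storage charge for a workload.
// Only storage configured above the free 512 MB is billable.
// Assumes 1 GB = 1,024 MB.
const ephemeralStorageCost = ({ invocations, durationMs, configuredMb, pricePerGbSecond }) => {
  const billableGb = Math.max(configuredMb - FREE_EPHEMERAL_MB, 0) / 1024
  const totalSeconds = invocations * (durationMs / 1000)
  const gbSeconds = billableGb * totalSeconds
  return { gbSeconds, cost: gbSeconds * pricePerGbSecond }
}

// Hypothetical workload: 1,024 MB configured, so 512 MB (0.5 GB) is billable
const estimate = ephemeralStorageCost({
  invocations: 1000000,
  durationMs: 1000,
  configuredMb: 1024,
  pricePerGbSecond: 0.0000000308 // us-east-1 price quoted above
})
console.log(estimate.gbSeconds) // 500000
console.log(estimate.cost.toFixed(4)) // 0.0154
```

A function left at the default 512 MB yields zero billable GB-seconds, so the helper returns a cost of 0.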
In us-east-1, for a workload invoking a Lambda function 500,000 times with a 10 second duration, using the maximum temporary storage, the billable storage is 9,728 MB (9.5 GB) over 5,000,000 seconds, so the cost is $1.46:
Invocations | 500,000 |
Duration (ms) | 10,000 |
Ephemeral storage over 512 MB (MB) | 9,728 |
Storage price per GB-second | $0.0000000308 |
Total GB-seconds | 47,500,000 |
Price of storage | $1.46 |
Choosing between ephemeral storage and Amazon EFS
Generally, ephemeral storage is designed for intermediary processing of a function. You can download reference data, machine learning models, or database metadata from other sources such as Amazon S3, and store these in /tmp for further processing. Ephemeral storage can provide a cache for data for repeat usage across invocations and offers fast I/O throughput.
Alternatively, EFS is primarily intended for customers that need to:
- Share data or state across function invocations.
- Process files larger than the 10,240 MB storage allows.
- Use file-system type functionality, such as appending to or modifying files.
Conclusion
Serverless developers can now configure the amount of temporary storage available in AWS Lambda functions. This blog post discusses common use cases and walks through an example application that uses larger temporary storage. It also shows how to configure this in CloudFormation and AWS SAM and explains the cost if you use more than the free 512 MB that’s automatically provisioned for every function.
For more serverless learning resources, visit Serverless Land.