AWS Compute Blog
Choosing between AWS Lambda data storage options in web apps
AWS Lambda is an on-demand compute service that powers many serverless applications. Lambda functions are ephemeral, with execution environments only existing for a brief time when the function is invoked. Many compute operations need access to external data for a variety of purposes. This includes importing third-party libraries, accessing machine learning models, or exporting the output of the compute operation.
Lambda provides a comprehensive range of storage options to meet the needs of web application developers. These include other AWS services such as Amazon S3 and Amazon EFS. There are also native storage options available, such as temporary storage or Lambda layers. In this blog post, I explain the differences between these options, and discuss common use-cases to help you choose for your own applications.
This post references the Happy Path web application series, and you can download the code for that application from the repository.
Amazon S3 – Object storage
Amazon S3 is an object storage service that scales elastically. It offers high availability and 11 9’s of durability. The service is ideal for storing unstructured data. This includes binary data, such as images or media, log files and sensor data.
There are certain characteristics of S3 object storage that are important to remember. While S3 objects can be versioned, you cannot append data as you could in a file system. You have to store an entirely new version of an object. S3 also has a flat storage hierarchy that’s different to a file system. Instead of directories, you use folders to logically organize objects, by prefixing ‘foldername/’ in the key name.
S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3. In the Happy Path application, the image-processing workflows are initiated by this event integration. To learn more about using S3 to trigger automated serverless workflows, visit the learning path.
S3 is often an important repository for an organization’s data lake. If your application writes data to S3 buckets, this can be a useful staging area for downstream processing. For analytics workloads, you can use AWS Glue to perform extract, transform, and loan (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards. To learn how to build business intelligence dashboards for your web application, visit the Innovator Island workshop.
S3 also provides object lifecycle management. This allows you to automatically change storage classes when certain conditions are met. For example, an application for uploading expenses could automatically archive PDFs after 1 year to Amazon S3 Glacier to reduce storage costs. In the Happy Path application, the original high-resolution uploads are stored in a separate bucket from the optimized distribution assets. To reduce storage costs, lifecycle management could be configured to automatically delete these original photo assets after 30 days.
Temporary storage with /tmp
The Lambda execution environment provides a file system for your code to use at /tmp. This space has a fixed size of 512 MB. The same Lambda execution environment may be reused by multiple Lambda invocations to optimize performance. The /tmp area is preserved for the lifetime of the execution environment and provides a transient cache for data between invocations. Each time a new execution environment is created, this area is deleted.
Consequently, this is intended as an ephemeral storage area. While functions may cache data here between invocations, it should be used only for data needed by code in a single invocation. It’s not a place to store data permanently, and is better-used to support operations required by your code.
Operationally, working with files in /tmp is the same as your local hard disk, and offers fast I/O throughput. For example, to unzip a file into this space in Python, use:
import os, zipfile os.chdir('/tmp') with zipfile.ZipFile(myzipfile, 'r') as zip: zip.extractall()
Your Lambda functions may use additional libraries as part of the deployment package. You can bundle these in the deployment archive or optionally move to a layer instead. A Lambda function can have up to five layers, and is subject to the maximum deployment size of 50 MB (zipped, for direct upload). Packages in layers are available in the /opt directory during invocations. While layers are private to you by default, you can also share layers with other AWS accounts, or make layers public.
There are many benefits to using layers throughout the functions in your serverless application. It’s best practice to include the AWS SDK instead of depending on the version bundled with the Lambda service. This enables you to pin the version of the SDK. By using a layer, you don’t need to bundle the package with each function, which can increase your deployment package size and slow down deployments. You can create an AWS SDK layer and then include a reference to the layer in each function.
Layers can be an effective way to bundle large dependencies, or share compiled libraries with binaries that vary by operating system. For example, the Happy Path application uses the Sharp npm graphics library to process images. Similarly, the Innovator Island workshop uses the OpenCV library to perform image manipulation, and this is imported using a shared layer.
Layers are static once they are deployed. You can only change the contents of a layer by deploying a new version. Any Lambda function using the layer binds to a specific version and must be updated to change layer versions. To learn more, see using Lambda layers to simplify your development process.
Amazon EFS for Lambda
Amazon EFS is a fully managed, elastic, shared file system that integrates with other AWS services. It is durable storage option that offers high availability. You can now mount EFS volumes in Lambda functions, which makes it simpler to share data across invocations. The file system grows and shrinks as you add or delete data, so you do not need to manage storage limits.
The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.
EFS enables new capabilities for serverless applications. The file system is a dynamic binding for Lambda functions, unlike layers. This makes it useful for deploying code libraries where you want to always use the latest version. You configure the mount path when integrating the file system with your function, and then include packages from this location. Additionally, you can use this to include packages that exceed the limits of layers.
Due to its speed and support of standard file operations, EFS is also useful for ingesting or writing large numbers files durably. This can be helpful for zipping or unzipping large archives, for example. For appending to existing files, EFS is also a preferred option to using S3.
To learn more, see using Amazon EFS for AWS Lambda in your serverless applications.
Comparing the different data storage options
This table compares the characteristics of these four different data storage options for Lambda:
|Amazon S3||/tmp||Lambda Layers||Amazon EFS|
|Maximum size||Elastic||512 MB||50 MB (direct upload; larger if from S3).||Elastic|
|Storage type||Object||File system||Archive||File system|
|Lambda event source integration||Native||N/A||N/A||N/A|
|Operations supported||Atomic with versioning||Any file system operation||Immutable||Any file system operation|
|Pricing model||Storage + requests + data transfer||Included in Lambda||Included in Lambda||Storage + data transfer + throughput|
|Sharing/permissions model||IAM||Function-only||IAM||IAM + NFS|
|Source for AWS Glue||Y||N||N||N|
|Source for Amazon QuickSight||Y||N||N||N|
|Relative data access speed from Lambda||Fast||Fastest||Fastest||Very fast|
Lambda is a flexible, on-demand compute service for serverless application. It supports a wide variety of workloads by providing a number of different data storage options.
In this post, I compare the capabilities and use-cases of S3, EFS, Lambda layers, and temporary storage for Lambda functions. There are benefits to each approach, as each type has different behaviors and characteristics. For web application developers, these storage types support different operations depending upon the needs of your serverless backend.
As the newest integration with Lambda, EFS now enables new workloads and capabilities. This includes sharing large code packages with Lambda, or durably operating on large numbers of files. It also opens up new possibilities for developers working on deep learning inference models.
To learn more about storage options available, visit the AWS Serverless homepage. For more serverless learning resources, visit https://serverlessland.com.