AWS Compute Blog
Modernizing Lambda + S3 workloads with Amazon S3 Files
Learn how Amazon S3 Files simplifies Lambda functions by eliminating transfer code and /tmp constraints. See three modernization patterns with code examples for image processing, ETL pipelines, and multi-agent AI workloads.
AWS Lambda functions that interact with Amazon Simple Storage Service (Amazon S3) typically follow a familiar pattern: download an object to /tmp, process it locally, and upload the result back to S3. This pattern is well understood and reliable, but it requires you to write code for managing transfers, monitoring /tmp capacity, and cleaning up ephemeral storage alongside your actual processing logic.
Amazon S3 Files changes this by letting your Lambda function mount an S3 bucket as a file system. Your function reads and writes files at a local mount path (such as /mnt/data), and the file system handles synchronization with S3 automatically. The transfer and storage management code goes away, and what remains is your processing logic working directly with files.
In this post, we walk through three common Lambda + S3 workloads and show how to modernize each one by using S3 Files. You will see how the code gets shorter, the /tmp size constraint disappears, and the developer experience improves.
Walkthrough
Prerequisites
Before you begin, make sure you have:
- An AWS account with permissions to create Lambda functions, S3 file systems, and VPC resources.
- An existing VPC with private subnets and appropriate security groups.
Getting started
To integrate a Lambda function with S3 Files, you can follow these three steps:
- Create an S3 file system for your bucket. You can do this through the S3 console, AWS Command Line Interface (AWS CLI), or AWS CloudFormation. This single operation creates the file system, mount targets in your Amazon Virtual Private Cloud (Amazon VPC), and an access point.
- Add the file system configuration to your Lambda function. Specify the access point ARN and local mount path (for example, /mnt/data). Your function must be in a VPC with access to the mount target. For optimal throughput on large files, configure your function with 512 MB or more of memory to enable direct reads from S3.
- If you are modernizing your existing Lambda function’s code, replace boto3 transfer code with file paths. Change s3.download_file(bucket, key, '/tmp/file') to open('/mnt/data/' + key) and remove upload and cleanup logic.
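The code change in the last step can be sketched as follows. This is a minimal illustration, not an official helper: the function name read_object_bytes and the path guard are our own additions, and /mnt/data is assumed to be the mount path you configured.

```python
from pathlib import Path

# Assumed local mount path from the function's file system configuration.
MOUNT_ROOT = Path("/mnt/data")


def read_object_bytes(key: str, mount_root: Path = MOUNT_ROOT) -> bytes:
    """Read an object through the mount instead of downloading it to /tmp.

    Stands in for the transfer step:
        s3.download_file(bucket, key, "/tmp/file")
    """
    root = mount_root.resolve()
    path = (root / key).resolve()
    # Hypothetical safety check: reject keys such as "../x" that would
    # resolve to a location outside the mount directory.
    if not path.is_relative_to(root):
        raise ValueError(f"key escapes mount root: {key!r}")
    return path.read_bytes()
```

There is no upload or cleanup counterpart to write: results written under the mount path are synchronized to the bucket by the file system.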
Your function’s execution role needs s3files:ClientMount and s3files:ClientWrite permissions (included in the AmazonS3FilesClientReadWriteAccess managed policy). For direct S3 reads on large files, also add s3:GetObject and s3:GetObjectVersion.
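If you prefer an inline policy over the managed policy, the permissions above could be assembled along these lines. The action names come from this post. The Resource values are placeholders we chose for illustration (amzn-s3-demo-bucket is a stand-in bucket name), so scope them to your own file system and bucket.

```python
import json


def build_execution_policy(include_direct_reads: bool = True) -> dict:
    """Build an IAM policy document with the permissions named in this post."""
    statements = [
        {
            "Sid": "S3FilesMountAndWrite",
            "Effect": "Allow",
            "Action": ["s3files:ClientMount", "s3files:ClientWrite"],
            "Resource": "*",  # placeholder (assumption): narrow to your file system
        }
    ]
    if include_direct_reads:
        # Only needed for direct S3 reads on large files.
        statements.append(
            {
                "Sid": "DirectS3Reads",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",  # placeholder bucket
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_execution_policy(), indent=2))
```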
Pattern 1: Multi-agent shared workspace
Agentic AI workloads, where multiple autonomous agents collaborate on a task, require shared mutable state. Agents need to read each other’s outputs, write intermediate artifacts, and coordinate without tight coupling. With Lambda today, this typically means serializing state to S3 objects or Amazon DynamoDB between every step, adding latency and code for each handoff.
S3 Files gives multiple Lambda functions a shared file system. Agents communicate through the file system itself, with no S3 API calls and no serialization overhead.
Example: Collaborative research agents
Three Lambda functions mount the same S3 bucket at /mnt/workspace. An orchestrator prepares the task, research agents work in parallel, and a synthesis agent combines their findings:
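The following is a minimal sketch of that flow, not the original listing. The directory layout (a session directory with manifest.json, a findings folder, and report.json), the function names, and the placeholder "research" step are all assumptions made for illustration. In a real deployment, each function would be the handler of its own Lambda function.

```python
import json
from pathlib import Path

WORKSPACE = Path("/mnt/workspace")  # mount path shared by all three functions


def prepare_task(session_id, topics, workspace=WORKSPACE):
    """Orchestrator: create the session directory and write the manifest."""
    session = workspace / session_id
    (session / "findings").mkdir(parents=True, exist_ok=True)
    (session / "manifest.json").write_text(json.dumps({"topics": topics}))
    return session


def research_agent(session, agent_id):
    """Research agent: read its topic from the manifest and write one findings file."""
    manifest = json.loads((session / "manifest.json").read_text())
    topic = manifest["topics"][agent_id]
    finding = {"agent": agent_id, "topic": topic, "summary": f"placeholder notes on {topic}"}
    # write_text closes the file before returning, so a later reader that
    # opens it sees the complete content.
    (session / "findings" / f"agent-{agent_id}.json").write_text(json.dumps(finding))


def synthesize(session):
    """Synthesis agent: discover findings by listing the directory, then combine them."""
    paths = sorted((session / "findings").glob("*.json"))
    findings = [json.loads(p.read_text()) for p in paths]
    report = {"count": len(findings), "topics": [f["topic"] for f in findings]}
    (session / "report.json").write_text(json.dumps(report))
    return report
```

The synthesis step never needs to be told which agents ran. It reads whatever files are present in the findings directory.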
In the traditional approach, each agent would need to call s3.get_object() to read the manifest and s3.put_object() to write its findings, and the synthesis agent would need to call s3.list_objects() and then s3.get_object() for each result. That’s eight or more S3 API calls per workflow run, all replaced by file I/O.
What the shared workspace pattern gives you:
- Agents discover each other’s outputs by listing a directory (no coordination logic needed).
- Sessions, agents, and outputs map to directories, not flat object key conventions.
- Close-to-open consistency means that when an agent closes a file after writing, the next agent to open it sees the complete content.
- No need to marshal state into S3 PutObject calls between steps.
Pattern 2: Image thumbnail generation
The S3 thumbnail generator is a common Lambda + S3 pattern. When an image is uploaded to S3, a Lambda function is triggered. The function downloads the image, resizes it with Pillow, and uploads the thumbnail to a destination bucket.
The traditional approach
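A representative sketch of the download-process-upload version is shown here. It is not the original listing, so its line count will not match the comparison table. The thumbnail size, the DEST_BUCKET environment variable, the thumbnails/ key prefix, and the make_thumbnail helper are assumptions. The boto3 and Pillow imports are deferred only so that the sketch can be loaded without those packages installed.

```python
import functools
import os
from urllib.parse import unquote_plus

THUMBNAIL_SIZE = (128, 128)  # assumed target size
# Hypothetical destination bucket name, read from an assumed environment variable.
DEST_BUCKET = os.environ.get("DEST_BUCKET", "amzn-s3-demo-destination-bucket")


@functools.lru_cache(maxsize=1)
def s3_client():
    import boto3  # deferred import; the client is created once and reused
    return boto3.client("s3")


def resize_image(src_path, dst_path):
    from PIL import Image  # Pillow, deferred import
    with Image.open(src_path) as image:
        image.thumbnail(THUMBNAIL_SIZE)  # resizes in place, keeps aspect ratio
        image.save(dst_path)


def make_thumbnail(s3, bucket, key, resize=resize_image, tmp_dir="/tmp"):
    name = os.path.basename(key)
    src_path = os.path.join(tmp_dir, f"src-{name}")
    dst_path = os.path.join(tmp_dir, f"thumb-{name}")
    dest_key = f"thumbnails/{name}"
    try:
        s3.download_file(bucket, key, src_path)          # transfer in
        resize(src_path, dst_path)                       # core logic
        s3.upload_file(dst_path, DEST_BUCKET, dest_key)  # transfer out
    finally:
        # Warm execution environments reuse /tmp, so remove both files.
        for path in (src_path, dst_path):
            if os.path.exists(path):
                os.remove(path)
    return dest_key


def handler(event, context):
    record = event["Records"][0]["s3"]
    key = unquote_plus(record["object"]["key"])  # keys in S3 events are URL-encoded
    return {"thumbnail": make_thumbnail(s3_client(), record["bucket"]["name"], key)}
```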
What this approach requires you to manage beyond the core resize logic:
- Transfer orchestration: Downloading the source, uploading the result, and handling partial transfer failures.
- Storage capacity: Both the source and resized image must fit in /tmp simultaneously.
- Ephemeral storage cleanup: If the function fails mid-execution or is reused across invocations, orphaned files can accumulate in /tmp.
- Redundant downloads: If the same image triggers a retry, it must be downloaded again.
With the file system approach
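A sketch of the thumbnail function written against the mount might look like the following. It is not the original listing. The /mnt/data mount path, the thumbnails/ output directory, and the make_thumbnail helper are assumptions. Because the output lands in the same mounted bucket, we also assume the S3 trigger is scoped to a source prefix so that thumbnails do not trigger the function again.

```python
from pathlib import Path
from urllib.parse import unquote_plus

MOUNT_ROOT = Path("/mnt/data")  # assumed mount path
THUMBNAIL_SIZE = (128, 128)     # assumed target size


def resize_image(src_path, dst_path):
    from PIL import Image  # Pillow, deferred so the sketch loads without it
    with Image.open(src_path) as image:
        image.thumbnail(THUMBNAIL_SIZE)  # resizes in place, keeps aspect ratio
        image.save(dst_path)


def make_thumbnail(key, mount_root=MOUNT_ROOT, resize=resize_image):
    src_path = mount_root / key
    dst_path = mount_root / "thumbnails" / src_path.name
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    resize(src_path, dst_path)  # reads and writes go straight to the mount
    return dst_path.relative_to(mount_root).as_posix()


def handler(event, context):
    # The S3 event still supplies the (URL-encoded) key of the uploaded image.
    key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
    return {"thumbnail": make_thumbnail(key)}
```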
What changed
The function moves from a download-process-upload pipeline to direct file I/O. No boto3 client, no /tmp management, no upload step. The resize_image function is unchanged because it always worked with file paths. The difference is that those paths now point to a mounted S3 file system instead of ephemeral local storage.
You still handle errors in your processing logic (for example, invalid image formats). What you no longer need to handle are transfer-specific failure modes like partial downloads, failed uploads, or /tmp capacity checks.
| Metric | Traditional | S3 Files |
| Lines of code (non-blank) | 22 | 15 |
| S3 API calls per invocation | 2 (GET + PUT) | 0 |
| Max image size | Source + output files share /tmp | No /tmp constraint |
| boto3 dependency | Required | Not needed |
Pattern 3: CSV-to-Parquet ETL pipeline
Another common serverless ETL pattern starts with an S3 event that triggers a Lambda function when CSV files land in a bucket. The function downloads the CSV, transforms it to Parquet by using pandas and pyarrow, and uploads the result.
The traditional approach
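A representative sketch of the traditional pipeline follows. It is not the original listing, so its line count will not match the comparison table. The added processed_at column, the parquet/ key prefix, the DEST_BUCKET environment variable, and the "twice the source size" capacity check are assumptions for illustration. The boto3 and pandas imports are deferred only so that the sketch can be loaded without those packages installed.

```python
import functools
import os
import shutil
from urllib.parse import unquote_plus

# Hypothetical destination bucket name, read from an assumed environment variable.
DEST_BUCKET = os.environ.get("DEST_BUCKET", "amzn-s3-demo-destination-bucket")


@functools.lru_cache(maxsize=1)
def s3_client():
    import boto3  # deferred import; the client is created once and reused
    return boto3.client("s3")


def transform(src_path, dst_path):
    import pandas as pd  # to_parquet also needs pyarrow installed
    frame = pd.read_csv(src_path)
    frame["processed_at"] = pd.Timestamp.now(tz="UTC")  # assumed example column
    frame.to_parquet(dst_path, index=False)


def csv_to_parquet(s3, bucket, key, convert=transform, tmp_dir="/tmp"):
    stem = os.path.splitext(os.path.basename(key))[0]
    src_path = os.path.join(tmp_dir, f"{stem}.csv")
    dst_path = os.path.join(tmp_dir, f"{stem}.parquet")
    dest_key = f"parquet/{stem}.parquet"
    # /tmp must hold source and output together; 2x the source size is a
    # rough, assumed bound rather than a precise requirement.
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    if 2 * size > shutil.disk_usage(tmp_dir).free:
        raise RuntimeError(f"not enough space in {tmp_dir} for {key}")
    try:
        s3.download_file(bucket, key, src_path)          # transfer in
        convert(src_path, dst_path)                      # core logic
        s3.upload_file(dst_path, DEST_BUCKET, dest_key)  # transfer out
    finally:
        for path in (src_path, dst_path):                # cleanup for warm reuse
            if os.path.exists(path):
                os.remove(path)
    return dest_key


def handler(event, context):
    record = event["Records"][0]["s3"]
    key = unquote_plus(record["object"]["key"])  # keys in S3 events are URL-encoded
    return {"output": csv_to_parquet(s3_client(), record["bucket"]["name"], key)}
```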
What this approach requires you to manage beyond the core transform logic:
- Storage capacity: Source and output files share /tmp, limiting practical file size.
- Cold start cost: Initializing the boto3 client adds startup latency.
- Transfer failure modes: Partial downloads, failed uploads, and orphaned /tmp files need their own handling.
- Redundant downloads: Retries or reprocessing require downloading the same file again.
With the file system approach
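A sketch of the pipeline written against the mount could look like this. It is not the original listing. The /mnt/data mount path, the parquet/ output directory, the processed_at column, and the csv_to_parquet helper are assumptions, and we assume the S3 trigger is scoped to the CSV prefix so that Parquet output does not trigger the function again.

```python
from pathlib import Path
from urllib.parse import unquote_plus

MOUNT_ROOT = Path("/mnt/data")  # assumed mount path


def transform(src_path, dst_path):
    import pandas as pd  # deferred so the sketch loads without pandas; needs pyarrow
    frame = pd.read_csv(src_path)
    frame["processed_at"] = pd.Timestamp.now(tz="UTC")  # assumed example column
    frame.to_parquet(dst_path, index=False)


def csv_to_parquet(key, mount_root=MOUNT_ROOT, convert=transform):
    src_path = mount_root / key
    dst_path = mount_root / "parquet" / src_path.with_suffix(".parquet").name
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    convert(src_path, dst_path)  # read CSV, add column, write Parquet
    return dst_path.relative_to(mount_root).as_posix()


def handler(event, context):
    # The S3 event still supplies the (URL-encoded) key of the uploaded CSV.
    key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
    return {"output": csv_to_parquet(key)}
```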
What changed
With this change, a developer reading this code sees only the transform logic (read CSV, add column, write Parquet). The storage mechanics are handled by the file system.
| Metric | Traditional | S3 Files |
| Lines of code (non-blank) | 33 | 14 |
| S3 API calls per invocation | 2 (GET + PUT) | 0 |
| Max file size | Source + output share /tmp | No /tmp constraint |
| Cleanup logic required | Yes | No |
| /tmp space monitoring | Yes | No |
Choosing the right approach: file system mounts vs. traditional access
| Use case | Recommendation |
| Lambda reads/writes files from S3 | S3 Files (eliminates transfer boilerplate) |
| Multiple functions share data | S3 Files (shared mount replaces API coordination) |
| Files > 10 GB | S3 Files (no /tmp size constraint) |
| Event-driven processing (trigger on upload) | S3 Files (S3 event triggers still work, function reads from mount) |
| Direct S3 API features (presigned URLs, S3 Select, multipart upload) | Traditional (these require the S3 API) |
| Functions outside a VPC | Traditional (S3 Files requires VPC connectivity) |
Cleaning up
If you created resources while following along with this post, delete them to avoid incurring future costs. Start by removing the file system configuration from your Lambda function settings. Next, remove the S3 file system, which also deletes its associated mount targets and access points. Then delete the S3 buckets used for source and output data, along with the Lambda functions created for the examples. Finally, remove the IAM roles and policies created for Lambda execution or, if you added the S3 Files permissions (s3files:ClientMount, s3files:ClientWrite, s3:GetObject, s3:GetObjectVersion) to an existing role, remove these permissions. Additionally, if you created a new VPC for this walkthrough, delete the VPC, which will also remove the associated private subnets, security groups, and route tables. If you used an existing VPC, remove the security groups and subnets created for this walkthrough.
Warning: Deleting an S3 bucket permanently deletes all objects in it and cannot be undone. Make sure you have backed up any data you need to retain before proceeding.
Conclusion
In this post, we demonstrated how to modernize three common Lambda + S3 workloads by using Amazon S3 Files. Across image thumbnail generation, ETL pipelines, and multi-agent AI workloads, the migration follows the same principle: replace S3 API transfer logic with native file I/O and let the file system handle synchronization.
The improvements are consistent:
- Less code: Transfer and cleanup logic goes away, leaving only your processing logic.
- No /tmp size constraint: Process large files without local storage limits.
- Zero S3 API calls for data access: Reads and writes go through the file system mount.
- Fewer failure modes to handle: Transfer-specific issues (partial downloads, failed uploads, orphaned temp files) no longer apply.
For teams running Lambda + S3 workloads today, S3 Files isn’t a new architecture to learn. It’s transfer code you can remove. To learn more, see the S3 Files section in the Lambda documentation. To track upcoming features, see the AWS Lambda roadmap.