When I use Amazon EMR to transform or move data into or out of Amazon S3, several empty files with the "<directoryname>_$folder$" suffix appear in my S3 buckets. What are these files, and is it safe to delete them?

Amazon EMR is a web service that uses a managed Hadoop framework to process, distribute, and interact with data in AWS data stores, including Amazon S3. Because S3 is a flat key-value store with no native directories, the Hadoop file system implements directory support in S3 by creating empty placeholder files with the "<directoryname>_$folder$" suffix.

Note: This behavior occurs only when the Amazon EMR File System (EMRFS) tries to create a folder with an s3:// or s3n:// prefix.

You can safely delete any empty files with the "<directoryname>_$folder$" suffix that appear in your S3 buckets. These empty files are created by the Hadoop framework at runtime, but Hadoop is designed to process data even if these empty files are removed.
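The cleanup described above can be sketched in Python. This is only an illustrative sketch, not an official AWS tool: the function names, the dry_run flag, and the bucket and prefix values are hypothetical, and it assumes a boto3 S3 client is passed in by the caller. It selects only zero-byte objects whose key ends with "_$folder$", so non-empty objects that happen to share the suffix are left alone.

```python
FOLDER_SUFFIX = "_$folder$"


def find_folder_placeholders(objects):
    """Return keys of zero-byte objects whose key ends with "_$folder$".

    `objects` is an iterable of dicts with "Key" and "Size" entries, the
    shape of the "Contents" list in an S3 ListObjectsV2 response.
    """
    return [
        obj["Key"]
        for obj in objects
        if obj["Key"].endswith(FOLDER_SUFFIX) and obj.get("Size") == 0
    ]


def delete_folder_placeholders(s3_client, bucket, prefix="", dry_run=True):
    """List (and optionally delete) placeholder files under a prefix.

    `s3_client` is assumed to be a boto3 S3 client. With dry_run=True
    (the default in this sketch) nothing is deleted; the matching keys
    are only returned so they can be reviewed first.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    matched = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = find_folder_placeholders(page.get("Contents", []))
        if keys and not dry_run:
            # Each page holds at most 1,000 keys, which is also the
            # per-request limit of delete_objects.
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": key} for key in keys]},
            )
        matched.extend(keys)
    return matched
```

With boto3 installed and credentials configured, a call such as delete_folder_placeholders(boto3.client("s3"), "my-bucket", "output/") would return the placeholder keys without deleting anything; passing dry_run=False would remove them. The bucket name and prefix here are placeholders for your own values.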

Note: If you do not delete the placeholder files with the "<directoryname>_$folder$" suffix, Hadoop generates the error "File exists" when you run a job that writes to the original EMRFS destination folder with an s3:// or s3n:// prefix. If you run the same job against a different EMRFS destination folder with an s3:// or s3n:// prefix, you do not receive a "File exists" error, but the new destination folder will contain empty placeholder files.

Published: 2016-04-29

Updated: 2018-04-10