When I use Amazon EMR with Amazon S3, empty files with the _$folder$ suffix appear in my S3 bucket. Can I safely delete these files?
Last updated: 2019-10-22
When I use Amazon EMR to transform or move data into or out of Amazon Simple Storage Service (Amazon S3), several empty files with the "_$folder$" suffix appear in my S3 buckets. What are these files, and is it safe to delete them?
The "_$folder$" files are placeholders. Apache Hadoop creates them when you use the -mkdir command (for example, hadoop fs -mkdir) to create a folder in an S3 bucket. Hadoop doesn't create the folder itself until you PUT the first object into it. If you delete a "_$folder$" file before you PUT at least one object, Hadoop can't create the folder, and the operation fails with a "No such file or directory" error.
In general, it's a best practice not to delete the "_$folder$" files. Doing so could cause performance issues for the Amazon EMR job. The exception is when you manually delete a folder from Amazon S3 and then try to recreate it in an Amazon EMR job or with Hadoop commands. In that case, delete the leftover "_$folder$" file first. Otherwise, the attempt to recreate the folder fails with a "File exists" error.
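To find leftover placeholders before recreating a folder, it can help to compare each "_$folder$" key with the objects stored under the folder it stands for. The following Python sketch is one way to do that, not an official AWS tool. It assumes that a placeholder key such as "logs_$folder$" corresponds to the folder prefix "logs/", and that you have already collected the bucket's object keys into a list (for example, from an S3 listing). The function names are hypothetical, and the code only reports keys; it doesn't delete anything.

```python
FOLDER_SUFFIX = "_$folder$"


def find_folder_placeholders(keys):
    """Map each placeholder key to the folder prefix it is assumed to stand for."""
    placeholders = {}
    for key in keys:
        if key.endswith(FOLDER_SUFFIX):
            # Assumption: "logs_$folder$" is the placeholder for the prefix "logs/".
            placeholders[key] = key[: -len(FOLDER_SUFFIX)] + "/"
    return placeholders


def orphaned_placeholders(keys):
    """Return placeholder keys whose folder prefix holds no real objects."""
    placeholders = find_folder_placeholders(keys)
    orphans = []
    for placeholder, prefix in placeholders.items():
        has_objects = any(
            key.startswith(prefix) and not key.endswith(FOLDER_SUFFIX)
            for key in keys
        )
        if not has_objects:
            orphans.append(placeholder)
    return sorted(orphans)


# Example key listing (made-up keys for illustration only).
example_keys = ["logs_$folder$", "logs/part-00000", "tmp_$folder$"]
print(orphaned_placeholders(example_keys))  # ['tmp_$folder$']
```

A placeholder that this check reports isn't automatically safe to delete. A folder that was just created with -mkdir and hasn't received its first object looks the same as one that was manually deleted, so review the results against what your jobs are expected to write.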