Can I safely delete the empty files with the _$folder$ suffix that appear in my Amazon S3 bucket when I use Amazon EMR with Amazon S3?
Last updated: 2021-04-15
When I use Amazon EMR to transform or move data into or out of Amazon Simple Storage Service (Amazon S3), several empty files with the "_$folder$" suffix appear in my S3 buckets. What are these files, and is it safe to delete them?
The "_$folder$" files are placeholders. Apache Hadoop creates these files when you use the -mkdir command to create a folder in an S3 bucket. Hadoop doesn't create the folder until you PUT the first object. If you delete the "_$folder$" files before you PUT at least one object, Hadoop can't create the folder. This results in a "No such file or directory" error.
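As a rough illustration of the naming convention described above, the following Python sketch recognizes a placeholder key and derives the folder prefix it stands for. The helper names (is_placeholder, folder_prefix) and the sample keys are hypothetical, chosen only for this example. The sketch works on plain strings and doesn't call Amazon S3.

```python
# Suffix that the placeholder objects carry, as described in this article.
PLACEHOLDER_SUFFIX = "_$folder$"


def is_placeholder(key: str) -> bool:
    """Return True if the object key looks like a "_$folder$" placeholder."""
    # Require at least one character before the suffix, so that a key
    # consisting of only the suffix isn't treated as a folder placeholder.
    return key.endswith(PLACEHOLDER_SUFFIX) and len(key) > len(PLACEHOLDER_SUFFIX)


def folder_prefix(placeholder_key: str) -> str:
    """Map a placeholder key to the folder prefix that it represents."""
    if not is_placeholder(placeholder_key):
        raise ValueError(f"not a placeholder key: {placeholder_key!r}")
    # Drop the suffix and add the trailing slash used for folder prefixes.
    return placeholder_key[: -len(PLACEHOLDER_SUFFIX)] + "/"


# Hypothetical keys, for illustration only.
sample_keys = ["output/run1_$folder$", "output/run1/part-00000"]
placeholders = [k for k in sample_keys if is_placeholder(k)]
```

With these sample keys, only "output/run1_$folder$" is classified as a placeholder, and it maps to the prefix "output/run1/".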
In general, it's a best practice not to delete the "_$folder$" files. Doing so might cause performance issues for the Amazon EMR job. The exception is when you manually delete a folder from Amazon S3 and then try to recreate it in an Amazon EMR job or with Hadoop commands. In that case, delete the matching "_$folder$" file before you recreate the folder. Otherwise, you get the "File exists" error.
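To find placeholders that might fall under this exception, one possible approach is to look for "_$folder$" keys whose folder prefix no longer contains any objects. The following Python sketch does this over a plain list of keys. The function name orphaned_placeholders and the sample keys are hypothetical, and treating an empty prefix as a sign of a manually deleted folder is an assumption of this example, not documented Amazon EMR behavior. Review the results before you delete anything.

```python
from typing import Iterable, List

SUFFIX = "_$folder$"


def orphaned_placeholders(keys: Iterable[str]) -> List[str]:
    """Return placeholder keys whose folder prefix holds no other keys.

    Assumption for this sketch: a placeholder with nothing under its
    prefix is a candidate left behind by a manually deleted folder.
    """
    all_keys = list(keys)
    orphans = []
    for key in all_keys:
        # Skip regular objects and a key that is only the suffix.
        if not key.endswith(SUFFIX) or key == SUFFIX:
            continue
        prefix = key[: -len(SUFFIX)] + "/"
        # Any key under the prefix counts as folder contents, including
        # the placeholder of a nested subfolder.
        if not any(k.startswith(prefix) for k in all_keys):
            orphans.append(key)
    return orphans


# Hypothetical listing, for illustration only.
listing = ["logs_$folder$", "logs/app.log", "tmp_$folder$"]
candidates = orphaned_placeholders(listing)
```

In practice, the list of keys would come from a listing of the bucket, for example through the AWS SDK or the AWS CLI. The sketch leaves that step out so that it stays self-contained.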