Can I use Amazon S3 for Hadoop storage instead of HDFS?
Last updated: 2020-06-04
Can I configure Amazon EMR to use Amazon Simple Storage Service (Amazon S3) as the Apache Hadoop storage system instead of the Hadoop Distributed File System (HDFS)?
You can't configure Amazon EMR to use Amazon S3 instead of HDFS for the Hadoop storage layer. Amazon EMR supports both HDFS and the EMR File System (EMRFS), which stores data in Amazon S3, but the two aren't interchangeable. HDFS is an implementation of the Hadoop FileSystem API, which models POSIX file system behavior. EMRFS is backed by Amazon S3, which is an object store, not a file system. For more information, see Object Stores vs. Filesystems in the Hadoop documentation.
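As a rough illustration of why a file system and an object store aren't interchangeable, the following Python sketch models one commonly cited difference: renaming a directory. It is a toy model built on plain dictionaries and doesn't call HDFS, EMRFS, or Amazon S3. The function names are hypothetical and exist only for this example.

```python
# Toy model only: plain dictionaries stand in for the two storage styles.
# Nothing here calls HDFS, EMRFS, or Amazon S3.


def rename_dir_in_tree(tree, old_name, new_name):
    """Hierarchical file system: a directory is one node, so a rename re-links one entry."""
    tree[new_name] = tree.pop(old_name)
    return 1  # one entry updated, however many files the directory holds


def rename_prefix_in_object_store(objects, old_prefix, new_prefix):
    """Flat object store: keys only look like paths, so each matching object is moved individually."""
    matching = [key for key in objects if key.startswith(old_prefix)]
    for key in matching:
        # Write the object under its new key and remove the old key.
        objects[new_prefix + key[len(old_prefix):]] = objects.pop(key)
    return len(matching)


# The same three files, represented both ways.
tree = {"logs": {"a.txt": b"1", "b.txt": b"2", "c.txt": b"3"}}
objects = {"logs/a.txt": b"1", "logs/b.txt": b"2", "logs/c.txt": b"3"}

tree_ops = rename_dir_in_tree(tree, "logs", "archive")
store_ops = rename_prefix_in_object_store(objects, "logs/", "archive/")
print(tree_ops, store_ops)
```

In this model, the tree rename touches a single entry, while the object store rename touches every key under the prefix. A job that relies on a fast, all-or-nothing directory rename can therefore behave differently on each storage type.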
For recommendations about when to use HDFS and when to use EMRFS, see Work with Storage and File Systems.