I'm trying to create an Amazon EMR cluster with a configuration similar to the following, but the cluster fails in the bootstrap action stage:

[
  {
    "Classification": "core-site",
    "Properties": {
      "fs.defaultFS": "s3://myS3Bucket/prefix/"
    }
  }
]

Can I configure Amazon EMR to use Amazon Simple Storage Service (Amazon S3) as the Apache Hadoop storage system instead of using the Hadoop Distributed File System (HDFS)? 

You can't configure Amazon EMR to use Amazon S3 instead of HDFS for the Hadoop storage layer. HDFS and the EMR File System (EMRFS), which uses Amazon S3, are both compatible with Amazon EMR, but they aren't interchangeable. HDFS is an implementation of the Hadoop FileSystem API, which models POSIX file system behavior. EMRFS stores its data in Amazon S3, which is an object store, not a file system. For more information, see Object Stores vs. Filesystems.

For recommendations about when to use each file system, see Work with Storage and File Systems.
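
To catch this misconfiguration before you launch a cluster, you can check the configuration JSON for a core-site classification that points fs.defaultFS at Amazon S3. The following Python sketch is one possible way to do that. The function name find_s3_default_fs and the set of S3 URI schemes are assumptions made for illustration, not part of any Amazon EMR API, and the check covers only top-level classifications.

```python
import json
from urllib.parse import urlparse

# URI schemes treated as "points at Amazon S3" (an assumption for this sketch).
S3_SCHEMES = {"s3", "s3a", "s3n"}


def find_s3_default_fs(configurations):
    """Return the fs.defaultFS values in core-site that use an S3 scheme.

    Hypothetical helper: inspects only top-level classifications.
    """
    offending = []
    for config in configurations:
        if config.get("Classification") != "core-site":
            continue
        value = config.get("Properties", {}).get("fs.defaultFS")
        if value and urlparse(value).scheme.lower() in S3_SCHEMES:
            offending.append(value)
    return offending


# The configuration from the question above.
cluster_config = json.loads("""
[
  {
    "Classification": "core-site",
    "Properties": {
      "fs.defaultFS": "s3://myS3Bucket/prefix/"
    }
  }
]
""")

for value in find_s3_default_fs(cluster_config):
    print(f"fs.defaultFS must not point at Amazon S3: {value}")
```

If the check reports a value, remove the fs.defaultFS property so that the cluster keeps HDFS as its default file system. Jobs can still read from and write to Amazon S3 through EMRFS by using fully qualified s3:// URIs for their input and output paths.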



Published: 2018-07-05