My Apache Hadoop job in Amazon EMR fails with the error message "Timeout waiting for connection from pool."

This error usually happens when you reach the Amazon EMR File System (EMRFS) connection limit for Amazon Simple Storage Service (Amazon S3). To resolve this error, increase the value of the fs.s3.maxConnections property. You can do this while your cluster is running or when you create a new cluster.

Increase the fs.s3.maxConnections value on a running cluster

1.    Connect to the master node using SSH.

2.    Open the emrfs-site.xml file with root privileges (sudo). This file is located in the /usr/share/aws/emr/emrfs/conf directory.

sudo vi /usr/share/aws/emr/emrfs/conf/emrfs-site.xml

3.    Set the fs.s3.maxConnections property to a value above 50. In the following example, the value is set to 100. You might need to choose a higher value, depending on how many concurrent S3 connections your applications need.
Note: If you launch your cluster with Apache HBase, the fs.s3.maxConnections value is set to 1000 by default. If increasing the fs.s3.maxConnections value doesn't resolve the timeout error, check your applications for connection leaks.
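
As a minimal sketch, assuming emrfs-site.xml uses the standard Hadoop property layout inside its configuration element, the entry for a value of 100 might look like the following. If the property already exists in the file, edit its value rather than adding a second entry.

```xml
<!-- Assumed placement: inside the existing <configuration> element of emrfs-site.xml -->
<property>
  <name>fs.s3.maxConnections</name>
  <value>100</value>
</property>
```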


4.    Repeat steps 2 and 3 on all core and task nodes. Use the same fs.s3.maxConnections value that you used on the master node.

5.    Run the Hadoop job again. Your application should use the new value for fs.s3.maxConnections without a service restart.

Increase the fs.s3.maxConnections value on a new cluster

To set the value of the fs.s3.maxConnections property on all nodes when you launch a new cluster, use a configuration object similar to the following. For more information, see Configuring Applications.

[
    {
      "Classification": "emrfs-site",
      "Properties": {
        "fs.s3.maxConnections": "100"
      }
    }
]


Published: 2019-01-28