How do I resolve the error "Timeout waiting for connection from pool" in Amazon EMR?

Last updated: 2021-07-13

My Apache Hadoop job in Amazon EMR fails with the error message "Timeout waiting for connection from pool".

Resolution

This error usually happens when you reach the Amazon EMR File System (EMRFS) connection limit for Amazon Simple Storage Service (Amazon S3). To resolve this error, increase the value of the fs.s3.maxConnections property. You can do this while your cluster is running or when you create a new cluster.
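The error message comes from a fixed-size client-side connection pool. The following toy Python model shows why the pool times out. It is not EMRFS code: BoundedPool, simulate, and leaky_rounds are hypothetical names, and max_connections only stands in for fs.s3.maxConnections. If more tasks need a connection at the same time than the pool allows, the extra requests wait and then time out. If connections are leased but never released (a leak), even a generous limit is eventually exhausted.

```python
import threading


class PoolTimeoutError(Exception):
    """Raised when no pooled connection frees up before the lease timeout."""


class BoundedPool:
    """Toy fixed-size connection pool (illustration only, not EMRFS code).

    max_connections stands in for fs.s3.maxConnections.
    """

    def __init__(self, max_connections: int, lease_timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lease_timeout = lease_timeout

    def lease(self) -> None:
        # Wait for a free connection; give up after lease_timeout seconds.
        if not self._slots.acquire(timeout=self._lease_timeout):
            raise PoolTimeoutError("Timeout waiting for connection from pool")

    def release(self) -> None:
        self._slots.release()


def simulate(max_connections: int, concurrent_tasks: int,
             lease_timeout: float = 0.01) -> int:
    """Return how many tasks time out when all tasks need a connection at once."""
    pool = BoundedPool(max_connections, lease_timeout)
    leased, timeouts = 0, 0
    for _ in range(concurrent_tasks):
        try:
            pool.lease()
            leased += 1
        except PoolTimeoutError:
            timeouts += 1
    # Every task finishes and returns its connection to the pool.
    for _ in range(leased):
        pool.release()
    return timeouts


def leaky_rounds(max_connections: int, tasks_per_round: int, rounds: int,
                 lease_timeout: float = 0.01) -> list:
    """Return per-round timeouts when tasks never release their connections."""
    pool = BoundedPool(max_connections, lease_timeout)
    per_round = []
    for _ in range(rounds):
        failed = 0
        for _ in range(tasks_per_round):
            try:
                pool.lease()  # leased but never released: a connection leak
            except PoolTimeoutError:
                failed += 1
        per_round.append(failed)
    return per_round
```

In this model, simulate(50, 80) reports 30 timeouts and simulate(100, 80) reports none, which mirrors raising the limit. leaky_rounds(100, 40, 3) returns [0, 0, 20]. The pool is large enough for any single round, but leaked connections accumulate until later requests time out. This is why raising the limit alone does not fix a leak.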

Increase the fs.s3.maxConnections value on a running cluster

1.    Connect to the master node using SSH.

2.    Run the following command to open the emrfs-site.xml file with sudo privileges. This file is located in the /usr/share/aws/emr/emrfs/conf directory.

sudo vi /usr/share/aws/emr/emrfs/conf/emrfs-site.xml

3.    Inside the <configuration> element, add or update the fs.s3.maxConnections property and set it to a value above 50. The following example sets the value to 100. You might need a higher value, depending on how many concurrent S3 connections your applications need.
Note: If you launch your cluster with Apache HBase, then the fs.s3.maxConnections value is set to 1000 by default. If increasing the fs.s3.maxConnections value doesn't resolve the timeout error, then check your applications for connection leaks.

<property>
  <name>fs.s3.maxConnections</name>
  <value>100</value>
</property>

4.    Repeat steps 2 and 3 on all core and task nodes. Use the same fs.s3.maxConnections value that you used on the master node.
Note: With Amazon EMR version 5.21.0 and later, you can reconfigure cluster applications and specify additional configuration classifications for each instance group in a running cluster. For more information, see Reconfigure an instance group in a running cluster.

5.    Run the Hadoop job again. Your application should pick up the new fs.s3.maxConnections value without a service restart.
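Steps 2 through 4 edit the same file by hand on every node. If you prefer to script the edit, the following is a minimal sketch. It assumes that Python 3 is available on the nodes and that emrfs-site.xml uses the usual Hadoop <configuration><property> layout. set_property is a hypothetical helper, not part of Amazon EMR. Because ElementTree does not keep XML comments, the sketch first saves a .bak copy of the file.

```python
import shutil
import xml.etree.ElementTree as ET

# Path from step 2; adjust if your release stores the file elsewhere.
EMRFS_SITE = "/usr/share/aws/emr/emrfs/conf/emrfs-site.xml"


def set_property(path: str, name: str, value: str) -> None:
    """Create or update a Hadoop-style <property> in a *-site.xml file."""
    shutil.copy2(path, path + ".bak")  # backup; ElementTree drops XML comments
    tree = ET.parse(path)
    root = tree.getroot()  # the <configuration> element
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value  # update the existing property in place
            break
    else:
        # Property not present yet: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, running set_property(EMRFS_SITE, "fs.s3.maxConnections", "100") as root on each node has the same effect as step 3.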

Increase the fs.s3.maxConnections value on a new cluster

To set the value of the fs.s3.maxConnections property on all nodes when you launch a new cluster, use a configuration object similar to the following. For more information, see Configuring applications.

[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.maxConnections": "100"
    }
  }
]
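The configuration object must be valid JSON, and a stray trailing comma after the last property is enough to break it. Generating the file can help avoid this. The following minimal Python sketch builds the same emrfs-site classification and writes it to disk. emrfs_max_connections_config and write_config are hypothetical helper names. The sketch stores the value as a string, matching the example above.

```python
import json


def emrfs_max_connections_config(max_connections: int) -> list:
    """Build the emrfs-site classification that sets fs.s3.maxConnections."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return [
        {
            "Classification": "emrfs-site",
            # Configuration property values are given as strings, even for numbers.
            "Properties": {"fs.s3.maxConnections": str(max_connections)},
        }
    ]


def write_config(path: str, max_connections: int) -> None:
    """Write the configuration list as JSON (json.dump never emits trailing commas)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(emrfs_max_connections_config(max_connections), f, indent=2)
```

You could then pass the generated file when you launch the cluster, for example with aws emr create-cluster ... --configurations file://configurations.json.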
