How do I troubleshoot retry and timeout issues when invoking a Lambda function using an AWS SDK?

Last updated: 2019-05-16

When I try to invoke my AWS Lambda function using an AWS SDK, the function times out, execution hangs, or an API action is duplicated. How do I fix these issues?

Short Description

These issues can occur when:

  • You call a remote API that takes too long to respond or that is unreachable.
  • Your API call doesn't get a response within the socket timeout.
  • Your API call doesn't get a response within the timeout period of your Lambda function.

Note: API calls can take longer than expected when network connection issues occur. Network issues can also cause retries and duplicated API requests. To prepare for these occurrences, your Lambda function must always be idempotent.
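As an illustration of idempotency, here's a minimal Python sketch that runs a side effect at most once per request identifier. The `process_once` helper, the in-memory `_processed` store, and the `order_id` and `amount` event fields are all hypothetical. A real function would need a durable store, because in-memory state isn't shared between execution environments.

```python
# Hypothetical in-memory store, used only to keep the sketch self-contained.
# It does not persist across execution environments.
_processed = {}


def process_once(request_id, action):
    """Run action() at most once per request_id and return the stored result."""
    if request_id in _processed:
        # Duplicate delivery or retry: return the earlier result, no new side effect.
        return _processed[request_id]
    result = action()
    _processed[request_id] = result
    return result


def handler(event, context):
    # "order_id" is an assumed field that uniquely identifies the request.
    return process_once(event["order_id"], lambda: {"charged": event["amount"]})
```

With this pattern, a duplicated invocation that carries the same identifier returns the first result instead of repeating the action.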

If you make an API call using an AWS SDK and the call fails, the SDK automatically retries the call. How long and how many times the SDK retries is determined by settings that vary by SDK. Here are the default values of these settings:

Note: Some values may be different for certain AWS services.

AWS SDK              Maximum retry count   Connection timeout   Socket timeout
Python (Boto 3)      depends on service    60 seconds           60 seconds
JavaScript/Node.js   depends on service    N/A                  120 seconds
Java                 3                     10 seconds           50 seconds
.NET                 4                     100 seconds          300 seconds
Go                   3                     N/A                  N/A

To fix the retry and timeout issues, review the logs of the API call to find the problem. Then, change the retry count and timeout settings of the SDK as needed for each use case. To allow enough time for a response to the API call, add time to the Lambda function timeout setting.
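To make the retry behavior concrete, here's a minimal Python sketch of a retry loop with exponential backoff. It isn't the implementation that any AWS SDK uses. The `call_with_retries` name, the delays, and the choice to retry only on `TimeoutError` are assumptions made for illustration.

```python
import time


def call_with_retries(fn, max_retries=2, base_delay=0.1, sleep=time.sleep):
    """Call fn(); on TimeoutError, retry up to max_retries times with backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Wait longer before each retry: base_delay, then 2x, then 4x, ...
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Note that a retry count of 2 means up to three attempts in total: the first call plus two retries.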

Resolution

Log the API calls made by the SDK

Using Amazon CloudWatch Logs, you can get details about failed connections and a count of attempted retries. For more information, see Accessing Amazon CloudWatch Logs for AWS Lambda, or see the logging instructions for the SDK that you're using.
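For Python, one way to surface SDK activity is through the standard `logging` module. The following is a minimal sketch. It assumes that the Lambda Python runtime has already attached a handler to the root logger, and that botocore (the library underneath Boto 3) logs under the `botocore` logger name. DEBUG output is verbose, so it's best suited to temporary troubleshooting.

```python
import logging

# Assumption: the Lambda runtime already attached a handler to the root logger,
# so these records are written to the function's CloudWatch Logs stream.
logging.getLogger().setLevel(logging.INFO)

# Raise verbosity only for the SDK's own loggers ("botocore" and its children).
logging.getLogger("botocore").setLevel(logging.DEBUG)
```

Set the level back to a quieter value, such as `logging.WARNING`, after you've collected the details that you need.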

In this example error log, the API call failed to establish a connection (socket timeout):

START RequestId: b81e56a9-90e0-11e8-bfa8-b9f44c99e76d Version: $LATEST
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    [AWS ec2 undefined 40.29s 3 retries] describeInstances({})
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    { TimeoutError: Socket timed out without establishing a connection

...

In this example error log, the connection was successful, but it timed out after the response took too long (connection timeout):

START RequestId: 3c0523f4-9650-11e8-bd98-0df3c5cf9bd8 Version: $LATEST
2018-08-02T12:33:18.958Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    [AWS ec2 undefined 30.596s 3 retries] describeInstances({})
2018-08-02T12:33:18.978Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    { TimeoutError: Connection timed out after 30s

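Note: These logs aren't generated if the API call doesn't get a response within your Lambda function's timeout. If the execution ends because of a function timeout, either change the SDK settings so that all retries finish within the function timeout, or increase the function timeout, as described in the following sections.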
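Because nothing is logged when the function itself times out, it can help to check the remaining execution time before starting a slow call. The following Python sketch uses the `get_remaining_time_in_millis()` method of the Lambda context object. The `seconds_left` helper, the 5-second reserve, and the 10-second worst-case call duration are hypothetical values chosen for illustration.

```python
def seconds_left(context, reserve_seconds=5.0):
    """Seconds available for further calls, keeping reserve_seconds for cleanup."""
    remaining = context.get_remaining_time_in_millis() / 1000.0
    return max(0.0, remaining - reserve_seconds)


def handler(event, context):
    budget = seconds_left(context)
    if budget < 10.0:  # assumed worst-case duration of one API call
        # Log while there is still time, so the reason shows up in CloudWatch Logs.
        print(f"Skipping API call: only {budget:.1f}s left before the function times out")
        return {"status": "skipped"}
    # ... make the SDK call here ...
    return {"status": "called"}
```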

Change the settings of the SDK

The retry count and timeout settings of the SDK should allow enough time for your API call to get a response. To determine the right values for each setting, test different configurations and get the following information:

  • Average time to establish a successful connection
  • Average time that a full API request takes (until it's successfully returned)
  • Whether retries should be made by the SDK or by your own code
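To collect timing figures like the ones listed above, you can wrap the call under test in a small timing helper. The following Python sketch measures the full duration of a call, not the connection phase on its own. The `time_call` helper is hypothetical, and the `time.sleep` call is a stand-in for a real SDK call.

```python
import statistics
import time


def time_call(fn, samples=5):
    """Call fn() `samples` times and return (mean_seconds, max_seconds)."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), max(durations)


# Stand-in for an SDK call; replace the lambda with the real call under test.
mean_s, max_s = time_call(lambda: time.sleep(0.01))
print(f"mean={mean_s:.3f}s max={max_s:.3f}s")
```

The maximum is often more useful than the mean when you choose a timeout, because the timeout has to accommodate the slowest healthy response.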

For more information on changing these settings, see the client configuration documentation for your SDK.

Here are examples of how to change these settings for each runtime:

Note: Before using any of the following examples, replace the example values for each setting with the values for your use case.

Python (Boto 3) example:

# max_attempts: retry count / read_timeout: socket timeout / connect_timeout: new connection timeout

from botocore.session import Session
from botocore.config import Config

s = Session()
c = s.create_client('s3', config=Config(connect_timeout=5, read_timeout=60, retries={'max_attempts': 2}))

JavaScript/Node.js example:

// maxRetries: retry count / timeout: socket timeout / connectTimeout: new connection timeout

var AWS = require('aws-sdk');

AWS.config.update({
    maxRetries: 2,
    httpOptions: {
        timeout: 30000,        // milliseconds
        connectTimeout: 5000   // milliseconds
    }
});

Java example:

// setMaxErrorRetry(): retry count / setSocketTimeout(): socket timeout / setConnectionTimeout(): new connection timeout

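import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

ClientConfiguration clientConfig = new ClientConfiguration();

clientConfig.setSocketTimeout(60000);      // milliseconds
clientConfig.setConnectionTimeout(5000);   // milliseconds
clientConfig.setMaxErrorRetry(2);

AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard()
        .withClientConfiguration(clientConfig)
        .build();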

.NET example:

// MaxErrorRetry: retry count / ReadWriteTimeout: socket timeout / Timeout: new connection timeout

var client = new AmazonS3Client(
    new AmazonS3Config {
        Timeout = TimeSpan.FromSeconds(5),
        ReadWriteTimeout = TimeSpan.FromSeconds(60),
        MaxErrorRetry = 2
    });

Go example for retries:

// Create Session with MaxRetry configuration to be shared by multiple service clients.
sess := session.Must(session.NewSession(&aws.Config{
    MaxRetries: aws.Int(3),
}))
 
// Create S3 service client with a specific Region.
svc := s3.New(sess, &aws.Config{
    Region: aws.String("us-west-2"),
}) 

Go example for request timeouts:

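// Cancel the request if it doesn't complete within 5 seconds.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
// SQS ReceiveMessage; s is an existing SQS service client
params := &sqs.ReceiveMessageInput{ ... }
req, resp := s.ReceiveMessageRequest(params)
req.SetContext(ctx)
if err := req.Send(); err != nil {
    // handle the error, including a request canceled by the context deadline
} else {
    fmt.Println(resp) // resp is populated only after Send succeeds
}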

(Optional) Change your Lambda function's timeout setting

A low Lambda function timeout can cause healthy connections to be dropped prematurely. If that's happening in your use case, increase the function timeout setting to allow enough time for your API call to get a response. Use this formula to estimate the base time needed for the function timeout:

(1 + Retries) * (Connection timeout + Socket timeout)

The first attempt counts in addition to the retries. For example, say that the SDK is configured for 3 retries, a connection timeout of 10 seconds, and a socket timeout of 30 seconds. In that case, your Lambda function timeout should be at least 160 seconds:

(1 + 3) * (10 + 30) = 160 seconds

Add an additional margin of time (for example, 20 seconds) to handle the rest of the code execution and any delay that the SDK adds between retries:

160 + 20 = 180 seconds
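The estimate can also be wrapped in a small helper so that it's easy to recompute when settings change. In this Python sketch, `attempts` is the number of times that the call may be tried, so pass the retry count plus one to include the first attempt. The function name and the default margin are illustrative, not part of any SDK.

```python
def estimate_function_timeout(attempts, connect_timeout, socket_timeout, margin=20):
    """Worst-case seconds for `attempts` tries of one API call, plus a margin."""
    return attempts * (connect_timeout + socket_timeout) + margin


# Three retries plus the first attempt, 10-second connect and 30-second socket timeouts.
print(estimate_function_timeout(4, 10, 30))  # prints 180
```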
