How do I troubleshoot retry and timeout issues when invoking a Lambda function using an AWS SDK?

Last updated: 2021-03-18

When I invoke my AWS Lambda function using an AWS SDK, the function times out, the API request stops responding, or an API action is duplicated. How do I troubleshoot these issues?

Short description

There are three reasons why retry and timeout issues occur when invoking a Lambda function with an AWS SDK:

  • A remote API is unreachable or takes too long to respond to an API call.
  • The API call doesn't get a response within the socket timeout.
  • The API call doesn't get a response within the Lambda function's timeout period.

Note: API calls can take longer than expected when network connection issues occur. Network issues can also cause retries and duplicated API requests. To prepare for these occurrences, make sure that your Lambda function is idempotent.
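One common way to make a function idempotent is to key each unit of work on a caller-supplied identifier and skip work that has already been done. The following is a minimal sketch of that idea, not a production pattern: the `request_id` event field, the `process_once` helper, and the in-memory `_processed` dictionary are all hypothetical, and a real function would need a durable store because memory isn't shared across execution environments.

```python
# Hypothetical in-memory record of handled requests. A real function would
# need a durable store, because this dictionary only lives as long as one
# execution environment.
_processed = {}


def process_once(request_id, action):
    """Run action() once per request_id and return the stored result on repeats."""
    if request_id in _processed:
        # A retried or duplicated request: return the earlier result
        # instead of repeating the side effect.
        return _processed[request_id]
    result = action()
    _processed[request_id] = result
    return result


def handler(event, context):
    # "request_id" is an assumed field that the caller supplies as an
    # idempotency key; it is not part of any AWS event format.
    return process_once(
        event["request_id"],
        lambda: {"status": "created", "order": event["order"]},
    )
```

With this shape, a retried invocation that carries the same `request_id` returns the stored result and doesn't repeat the action.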

If you make an API call using an AWS SDK and the call fails, the AWS SDK automatically retries the call. The number of retries and the timeout for each attempt are determined by settings that vary by AWS SDK.

Default AWS SDK retry settings

Note: Some values may be different for other AWS services.

AWS SDK              Maximum retry count   Connection timeout   Socket timeout
Python (Boto 3)      depends on service    60 seconds           60 seconds
JavaScript/Node.js   depends on service    N/A                  120 seconds
Java                 3                     10 seconds           50 seconds
.NET                 4                     100 seconds          300 seconds
Go                   3                     N/A                  N/A

To troubleshoot the retry and timeout issues, first review the logs of the API call to find the problem. Then, change the retry count and timeout settings of the AWS SDK as needed for each use case. To allow enough time for a response to the API call, add time to the Lambda function timeout setting.

Resolution

Log the API calls made by the AWS SDK

Using Amazon CloudWatch Logs, you can get details about failed connections and a count of attempted retries. For more information, see Accessing Amazon CloudWatch logs for AWS Lambda. Or, see the logging instructions in the documentation for the AWS SDK that you're using.
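For a Python function, one way to surface SDK-level detail is to lower the level of the `botocore` logger, which is the logger name that the library underneath Boto 3 writes to. The following is a sketch that uses only the standard `logging` module. The `enable_sdk_debug_logging` helper name is hypothetical, and the sketch assumes that the runtime already attaches a log handler that forwards records to CloudWatch Logs.

```python
import logging


def enable_sdk_debug_logging(level=logging.DEBUG):
    """Lower the level of the "botocore" logger so SDK records are emitted.

    Assumes a handler is already attached to the root logger. Outside of
    Lambda, call logging.basicConfig() first so the records have somewhere to go.
    """
    sdk_logger = logging.getLogger("botocore")
    sdk_logger.setLevel(level)
    return sdk_logger


# Call once, outside the handler, so the setting applies to every invocation.
enable_sdk_debug_logging()
```

DEBUG output from the SDK is verbose, so it's usually worth enabling it only while troubleshooting.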

Example error log where the API call failed to establish a connection (socket timeout)

START RequestId: b81e56a9-90e0-11e8-bfa8-b9f44c99e76d Version: $LATEST
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    [AWS ec2 undefined 40.29s 3 retries] describeInstances({})
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    { TimeoutError: Socket timed out without establishing a connection

...

Example error log where the connection timed out after the API response took too long (connection timeout)

START RequestId: 3c0523f4-9650-11e8-bd98-0df3c5cf9bd8 Version: $LATEST
2018-08-02T12:33:18.958Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    [AWS ec2 undefined 30.596s 3 retries] describeInstances({})
2018-08-02T12:33:18.978Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    { TimeoutError: Connection timed out after 30s

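Note: These logs aren't generated if the API request doesn't get a response within your Lambda function's timeout. If the API request ends because of a function timeout, change the AWS SDK's retry and timeout settings so that all attempts complete within the function timeout, or increase the function timeout. Both options are described in the following sections.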
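When a function timeout is suspected, it can help to record how much time the invocation has left before the API call starts. The following Python sketch compares the remaining time against a worst-case estimate and logs a warning when the call might not fit. `get_remaining_time_in_millis()` is the method that the Python Lambda context object provides. The `has_time_for_call` helper and its `worst_case_seconds` parameter are hypothetical names for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def has_time_for_call(context, worst_case_seconds):
    """Return True if the invocation has at least worst_case_seconds left."""
    remaining_seconds = context.get_remaining_time_in_millis() / 1000.0
    if remaining_seconds < worst_case_seconds:
        # This record is written before the call starts, so it still reaches
        # the logs even if the function later times out.
        logger.warning(
            "Only %.1f s left; the API call may need up to %.1f s",
            remaining_seconds,
            worst_case_seconds,
        )
        return False
    return True
```

Inside a handler, this could be called as `has_time_for_call(context, 40)` just before the SDK call, where 40 is an example value and not a recommendation.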

Change the AWS SDK's settings

The retry count and timeout settings of the AWS SDK should allow enough time for your API call to get a response. To determine the right values for each setting, test different configurations and get the following information:

  • Average time to establish a successful connection
  • Average time that a full API request takes (until it's successfully returned)
  • Whether retries should be made by the AWS SDK or by your code
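The request-time measurement can be gathered with a small timing helper. The following Python sketch times any callable over several attempts and returns the mean duration. The `average_duration` name and the `attempts` parameter are hypothetical. The helper measures the full call, so it doesn't separate connection time from response time.

```python
import statistics
import time


def average_duration(call, attempts=5):
    """Invoke call() `attempts` times and return the mean duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

For example, `average_duration(lambda: client.list_buckets())` would time an S3 call, assuming that `client` is an existing S3 client created elsewhere.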

For more information on changing retry count and timeout settings, see the client configuration documentation for the AWS SDK that you're using.

The following are some example commands that change retry count and timeout settings for each runtime.

Important: Before using any of the following commands, replace the example values for each setting with the values for your use case.

Example Python (Boto 3) command to change retry count and timeout settings

# max_attempts: retry count / read_timeout: socket timeout / connect_timeout: new connection timeout

from botocore.session import Session
from botocore.config import Config

s = Session()
c = s.create_client('s3', config=Config(connect_timeout=5, read_timeout=60, retries={'max_attempts': 2}))

Example JavaScript/Node.js command to change retry count and timeout settings

// maxRetries: retry count / timeout: socket timeout / connectTimeout: new connection timeout

var AWS = require('aws-sdk');

AWS.config.update({
    maxRetries: 2,
    httpOptions: {
        timeout: 30000,
        connectTimeout: 5000
    }
});

Example Java command to change retry count and timeout settings

// setMaxErrorRetry(): retry count / setSocketTimeout(): socket timeout / setConnectionTimeout(): new connection timeout

ClientConfiguration clientConfig = new ClientConfiguration(); 

clientConfig.setSocketTimeout(60000); 
clientConfig.setConnectionTimeout(5000);
clientConfig.setMaxErrorRetry(2);

AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentialsProvider,clientConfig);

Example .NET command to change retry count and timeout settings

// MaxErrorRetry: retry count / ReadWriteTimeout: socket timeout / Timeout: new connection timeout

var client = new AmazonS3Client(
    new AmazonS3Config {
        Timeout = TimeSpan.FromSeconds(5),
        ReadWriteTimeout = TimeSpan.FromSeconds(60),
        MaxErrorRetry = 2
    });

Example Go command to change retry count settings

// Create Session with MaxRetry configuration to be shared by multiple service clients.
sess := session.Must(session.NewSession(&aws.Config{
    MaxRetries: aws.Int(3),
}))
 
// Create S3 service client with a specific Region.
svc := s3.New(sess, &aws.Config{
    Region: aws.String("us-west-2"),
})

Example Go command to change request timeout settings

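// s is assumed to be an existing SQS service client (*sqs.SQS).
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
// SQS ReceiveMessage
params := &sqs.ReceiveMessageInput{ ... }
req, resp := s.ReceiveMessageRequest(params)
// Attach the context so that the request is canceled when the 5-second deadline passes.
req.HTTPRequest = req.HTTPRequest.WithContext(ctx)
err := req.Send()
// If err is nil, resp holds the ReceiveMessage output.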

(Optional) Change your Lambda function's timeout setting

A low Lambda function timeout can cause healthy connections to be dropped early. If that's happening in your use case, increase the function timeout setting to allow enough time for your API call to get a response.

Use the following formula to estimate the base time needed for the function timeout:

First attempt (connection timeout + socket timeout) + Number of retries x (connection timeout + socket timeout)

For example, suppose that an AWS SDK is configured for three retries, a connection timeout of 10 seconds, and a socket timeout of 30 seconds. In that case, the Lambda function timeout should be at least 160 seconds:

First attempt (10 seconds + 30 seconds) + Number of retries 3 x (10 seconds + 30 seconds) = 40 seconds + 120 seconds = 160 seconds

Add an additional margin of time (for example, 20 seconds) to handle the rest of the code runtime:

160 + 20 = 180 seconds
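The same estimate can be written as a small Python helper so that it's easy to recompute for other settings. This is a sketch of the formula in this section only. The `minimum_function_timeout` name and its parameters are hypothetical, and all values are in seconds.

```python
def minimum_function_timeout(retries, connect_timeout, socket_timeout, margin=0):
    """Estimate the Lambda function timeout, in seconds, for one API call.

    Applies the formula from this section: one first attempt plus `retries`
    further attempts, each bounded by the connection timeout plus the socket
    timeout, with an optional margin for the rest of the code.
    """
    per_attempt = connect_timeout + socket_timeout
    return per_attempt + retries * per_attempt + margin


print(minimum_function_timeout(3, 10, 30))             # prints 160
print(minimum_function_timeout(3, 10, 30, margin=20))  # prints 180
```

The estimate covers a single API call. A function that makes several calls in sequence would need the per-call estimates added together.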
