How do I troubleshoot retry and timeout issues when invoking a Lambda function using an AWS SDK?


When I invoke my AWS Lambda function using an AWS SDK, the function times out, the API request stops responding, or an API action is duplicated. How do I troubleshoot these issues?

Short description

There are three reasons why retry and timeout issues occur when invoking a Lambda function with an AWS SDK:

  • A remote API is unreachable or takes too long to respond to an API call.
  • The API call doesn't get a response within the socket timeout.
  • The API call doesn't get a response within the Lambda function's timeout period.

Note: API calls can take longer than expected when network connection issues occur. Network issues can also cause retries and duplicated API requests. To prepare for these occurrences, make sure that your Lambda function is idempotent.
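As an illustration of the idempotency advice above, the following is a minimal Python sketch, not a definitive implementation. It assumes the caller sends a unique `idempotency_key` field in the event (a hypothetical name). The in-memory dictionary only stands in for a durable store such as a database table.

```python
# Hypothetical in-memory store used only for illustration. A real function
# would need a durable store, because execution environments are not shared.
_processed = {}


def handle_once(event, do_work):
    """Run do_work(event) at most once per idempotency key."""
    key = event["idempotency_key"]  # assumed field supplied by the caller
    if key in _processed:
        # A retried or duplicated request: return the stored result
        # instead of repeating the side effect.
        return _processed[key]
    result = do_work(event)
    _processed[key] = result
    return result
```

With this pattern, a request that the AWS SDK retries after a timeout returns the stored result instead of repeating the action.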

If you make an API call using an AWS SDK and the call fails, the AWS SDK automatically retries the call. The number of retries and the time allowed for each attempt are determined by settings that differ between AWS SDKs.

Default AWS SDK retry settings

Note: Some values may be different for other AWS services.

AWS SDK              Maximum retry count    Connection timeout    Socket timeout
Python (Boto 3)      depends on service     60 seconds            60 seconds
JavaScript/Node.js   depends on service     N/A                   120 seconds
Java                 3                      10 seconds            50 seconds
.NET                 4                      100 seconds           300 seconds
Go                   3                      N/A                   N/A

To troubleshoot the retry and timeout issues, first review the logs of the API call to find the problem. Then, change the retry count and timeout settings of the AWS SDK as needed for each use case. To allow enough time for a response to the API call, add time to the Lambda function timeout setting.

Resolution

Log the API calls made by the AWS SDK

Use Amazon CloudWatch Logs to get details about failed connections and the number of attempted retries for each. For more information, see Accessing Amazon CloudWatch logs for AWS Lambda. Or, see the logging instructions for the AWS SDK that you're using.
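For Python (Boto 3), one possible way to surface retry and connection details is to raise the level of the loggers that the SDK's dependencies write to. The following is a sketch that uses only the standard `logging` module. The helper name `enable_sdk_debug_logging` is hypothetical, and debug output is verbose, so it's best enabled only while troubleshooting.

```python
import logging


def enable_sdk_debug_logging(level=logging.DEBUG):
    """Raise the verbosity of the loggers used by Boto 3's dependencies."""
    # "botocore" and "urllib3" are the logger names that these libraries use.
    for name in ("botocore", "urllib3"):
        logging.getLogger(name).setLevel(level)


# Call once at import time, outside the handler, so the setting applies
# to every invocation that reuses the execution environment.
enable_sdk_debug_logging()
```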

Example error log where the API call failed to establish a connection (connection timeout)

START RequestId: b81e56a9-90e0-11e8-bfa8-b9f44c99e76d Version: $LATEST
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    [AWS ec2 undefined 40.29s 3 retries] describeInstances({})
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    { TimeoutError: Socket timed out without establishing a connection

...

Example error log where the API call connection was successful, but timed out after the API response took too long (socket timeout)

START RequestId: 3c0523f4-9650-11e8-bd98-0df3c5cf9bd8 Version: $LATEST
2018-08-02T12:33:18.958Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    [AWS ec2 undefined 30.596s 3 retries] describeInstances({})
2018-08-02T12:33:18.978Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    { TimeoutError: Connection timed out after 30s
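The bracketed summary in the example logs above carries the elapsed time and the retry count. The following Python sketch extracts those fields from a log line. The regular expression is inferred only from the two example lines shown here, and the field name `status` is an assumption (the examples show `undefined` in that position).

```python
import re

# Pattern inferred from the example lines, for instance:
# [AWS ec2 undefined 40.29s 3 retries] describeInstances({})
_SUMMARY = re.compile(
    r"\[AWS (?P<service>\S+) (?P<status>\S+) "
    r"(?P<seconds>[\d.]+)s (?P<retries>\d+) retries\] (?P<operation>\w+)\("
)


def parse_sdk_summary(line):
    """Return the summary fields of an SDK log line, or None if absent."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    return {
        "service": match["service"],
        "status": match["status"],
        "seconds": float(match["seconds"]),
        "retries": int(match["retries"]),
        "operation": match["operation"],
    }
```

Comparing `seconds` against the function timeout shows how much of the invocation the retries consumed.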

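Note: These logs aren't generated if the API request doesn't get a response within your Lambda function's timeout. If the API request ends because of a function timeout, try one of the following: change the AWS SDK's retry count and timeout settings so that all attempts finish within the function timeout, or increase the function timeout so that the AWS SDK has enough time to log the failure.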

Change the AWS SDK's settings

The retry count and timeout settings of the AWS SDK should allow enough time for your API call to get a response. To determine the right values for each setting, test different configurations and get the following information:

  • Average time to establish a successful connection
  • Average time that a full API request takes (until it's successfully returned)
  • Whether retries should be made by the AWS SDK or by your code
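If you decide that your code should make the retries, a retry loop with exponential backoff is one common approach. The following Python sketch is illustrative only. `call_with_retries` and its parameters are hypothetical names, and it catches the built-in `TimeoutError` as a placeholder for whichever timeout exception your AWS SDK raises.

```python
import time


def call_with_retries(call, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call call() up to max_attempts times, doubling the delay after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TimeoutError:  # placeholder for the SDK's timeout exception
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

When retrying in code, consider lowering the AWS SDK's own retry count. Otherwise the two retry layers multiply the total number of attempts.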

For more information on changing retry count and timeout settings, see the client configuration documentation for the AWS SDK that you're using.

The following are some example commands that change retry count and timeout settings for each runtime.

Important: Before using any of the following commands, replace the example values for each setting with the values for your use case.

Example Python (Boto 3) command to change retry count and timeout settings

# max_attempts: retry count / read_timeout: socket timeout / connect_timeout: new connection timeout

from botocore.session import Session
from botocore.config import Config

s = Session()
c = s.create_client('s3', config=Config(connect_timeout=5, read_timeout=60, retries={'max_attempts': 2}))

Example JavaScript/Node.js command to change retry count and timeout settings

// maxRetries: retry count / timeout: socket timeout / connectTimeout: new connection timeout

var AWS = require('aws-sdk');

AWS.config.update({
    maxRetries: 2,
    httpOptions: {
        timeout: 30000,
        connectTimeout: 5000
    }
});

Example Java command to change retry count and timeout settings

// setMaxErrorRetry(): retry count / setSocketTimeout(): socket timeout / setConnectionTimeout(): new connection timeout

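import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setSocketTimeout(60000);     // milliseconds
clientConfig.setConnectionTimeout(5000);  // milliseconds
clientConfig.setMaxErrorRetry(2);

// credentialsProvider is an AWSCredentialsProvider that's defined elsewhere in your code.
AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard()
    .withCredentials(credentialsProvider)
    .withClientConfiguration(clientConfig)
    .build();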

Example .NET command to change retry count and timeout settings

// MaxErrorRetry: retry count / ReadWriteTimeout: socket timeout / Timeout: new connection timeout

var client = new AmazonS3Client(
    new AmazonS3Config {
        Timeout = TimeSpan.FromSeconds(5),
        ReadWriteTimeout = TimeSpan.FromSeconds(60),
        MaxErrorRetry = 2
    });

Example Go command to change retry count settings

// Create Session with MaxRetry configuration to be shared by multiple service clients.
sess := session.Must(session.NewSession(&aws.Config{
    MaxRetries: aws.Int(3),
}))
 
// Create S3 service client with a specific Region.
svc := s3.New(sess, &aws.Config{
    Region: aws.String("us-west-2"),
})

Example Go command to change request timeout settings

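// svc is an *sqs.SQS client, for example sqs.New(sess).
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
// SQS ReceiveMessage
params := &sqs.ReceiveMessageInput{ ... }
req, resp := svc.ReceiveMessageRequest(params)
req.SetContext(ctx)
// resp is populated only after Send returns without an error.
if err := req.Send(); err != nil {
    log.Println("request failed or timed out:", err)
} else {
    log.Println(resp)
}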

(Optional) Change your Lambda function's timeout setting

A low Lambda function timeout can cause healthy connections to be dropped early. If that's happening in your use case, increase the function timeout setting to allow enough time for your API call to get a response.
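To see whether the function timeout leaves room for another attempt, a Python function can compare the time left in the invocation with the configured timeouts. The following is a sketch, not a definitive implementation. `has_time_for_call` and `margin` are hypothetical names, and `context` is the context object that Lambda passes to a Python handler.

```python
def has_time_for_call(context, connect_timeout, read_timeout, margin=1.0):
    """Return True if one more attempt fits in the remaining invocation time.

    All timeout arguments are in seconds.
    """
    # get_remaining_time_in_millis() is provided by the Lambda context object.
    remaining = context.get_remaining_time_in_millis() / 1000.0
    return remaining >= connect_timeout + read_timeout + margin
```

If the check returns False, the function can log the situation and return early instead of being stopped mid-request.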

Use the following formula to estimate the base time needed for the function timeout:

First attempt (connection timeout + socket timeout) + Number of retries * (connection timeout + socket timeout) + 20 seconds additional code runtime margin = Required Lambda function timeout

Example Lambda function timeout calculation

Note: The following calculation is for an AWS SDK that's configured for three retries, a 10-second connection timeout, and a 30-second socket timeout.

First attempt (10 seconds + 30 seconds) + Number of retries [3 * (10 seconds + 30 seconds)] + 20 seconds additional code runtime margin = 180 seconds
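The same arithmetic can be written as a small Python helper, which makes it easy to try other settings. The function name and its parameters are illustrative. The 20-second default margin comes from the formula above.

```python
def required_function_timeout(retries, connect_timeout, socket_timeout, margin=20):
    """Estimate the Lambda function timeout in seconds."""
    per_attempt = connect_timeout + socket_timeout
    # One first attempt, plus one further attempt per retry, plus the margin.
    return per_attempt + retries * per_attempt + margin


# Three retries, 10-second connection timeout, 30-second socket timeout.
print(required_function_timeout(3, 10, 30))  # 180
```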

Related information

Invoke (Lambda API reference)

Error handling and automatic retries in AWS Lambda

Lambda quotas

AWS OFFICIAL
Updated 2 years ago