Why am I experiencing a data delivery failure with Kinesis Data Firehose?

Last updated: 2020-06-16

I'm trying to send data from Amazon Kinesis Data Firehose to my Amazon Elasticsearch Service (Amazon ES) domain. Why am I experiencing a data delivery failure?

Short Description

A failed delivery between Kinesis Data Firehose and Amazon ES can occur for the following reasons:

  • Invalid delivery destination
  • No incoming data
  • Disabled Kinesis Data Firehose logs
  • Lack of proper permissions
  • AWS Lambda function invocation issues
  • Amazon ES domain health issues

Resolution

Invalid delivery destination

Confirm that you specified a valid Kinesis Data Firehose delivery destination and that you are using the correct ARN. You can check whether deliveries succeed by viewing the DeliveryToElasticsearch.Success metric in Amazon CloudWatch. A DeliveryToElasticsearch.Success value of zero confirms that deliveries are failing. For more information about the DeliveryToElasticsearch.Success metric, see Delivery to Amazon ES in Data delivery CloudWatch metrics.
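The metric check above can be scripted. The following is a minimal sketch that assumes you have already fetched the Sum statistic for DeliveryToElasticsearch.Success per period (for example, with the CloudWatch GetMetricStatistics API); the function name and sample values are illustrative, not part of any AWS API.

```python
# Interpret DeliveryToElasticsearch.Success datapoints fetched from CloudWatch.
# Each value is assumed to be the Sum statistic over one period.

def delivery_is_failing(success_sums):
    """Return True if every period shows zero successful deliveries."""
    if not success_sums:
        # No datapoints at all: nothing was delivered (or nothing was sent).
        return True
    return all(s == 0 for s in success_sums)

# Hypothetical sample values for three consecutive periods.
print(delivery_is_failing([0, 0, 0]))   # True: deliveries are unsuccessful
print(delivery_is_failing([0, 12, 8]))  # False: some deliveries succeeded
```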

No incoming data

Confirm that there is incoming data for Kinesis Data Firehose by monitoring the IncomingRecords and IncomingBytes metrics. A value of zero for those metrics means that there are no records reaching Kinesis Data Firehose. For more information about the IncomingRecords and IncomingBytes metrics, see Data ingestion through direct PUT in Data ingestion metrics.

If the delivery stream uses Amazon Kinesis Data Streams as a source, then check the IncomingRecords and IncomingBytes metrics of the Kinesis data stream. These two metrics indicate whether there is incoming data. A value of zero confirms that no records are reaching the stream.

If there is data reaching Kinesis Data Streams, then the DataReadFromKinesisStream.Bytes and DataReadFromKinesisStream.Records metrics indicate whether data is coming from Kinesis Data Streams to Kinesis Data Firehose. For more information about the data metrics, see Data ingestion through Kinesis Data Streams. A value of zero can indicate a failure to deliver to Amazon ES rather than a failure between Kinesis Data Streams and Kinesis Data Firehose.

You can also check whether the PutRecord and PutRecordBatch API calls for Kinesis Data Firehose are succeeding. If you don't see any incoming data flow metrics, check the producer that performs the PUT operations. For more information about troubleshooting producer application issues, see Troubleshooting Amazon Kinesis Data Streams producers.
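The metric checks in this section can be combined into a simple triage routine. This is a sketch under the assumption that you have already collected the summed metric values; the function and parameter names are illustrative, not part of any AWS API.

```python
# Triage where the data flow breaks, based on summed CloudWatch metrics.

def locate_break(stream_incoming_records, firehose_read_records,
                 firehose_incoming_records):
    """Return a short diagnosis from the ingestion metrics.

    stream_incoming_records   -- Kinesis Data Streams IncomingRecords (source)
    firehose_read_records     -- Firehose DataReadFromKinesisStream.Records
    firehose_incoming_records -- Firehose IncomingRecords (direct PUT)
    """
    if stream_incoming_records == 0 and firehose_incoming_records == 0:
        return "no incoming data: check the producer's PutRecord calls"
    if stream_incoming_records > 0 and firehose_read_records == 0:
        # As noted above, a zero here can indicate a failure to deliver to
        # Amazon ES rather than a break between the stream and Firehose.
        return "records not read from stream: check delivery to Amazon ES"
    return "data reaches Firehose: check delivery to Amazon ES"

print(locate_break(0, 0, 0))
print(locate_break(100, 0, 0))
print(locate_break(100, 100, 0))
```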

Disabled Kinesis Data Firehose logs

Be sure that logging is enabled for Kinesis Data Firehose. Otherwise, you can't view the error logs when a delivery failure occurs. Then, check for the /aws/kinesisfirehose/delivery-stream-name log group in CloudWatch Logs.

In the Kinesis Data Firehose role, the following permissions are required:

{
    "Effect": "Allow",
    "Action": [
        "logs:PutLogEvents"
    ],
    "Resource": [
        "arn:aws:logs:region:account-id:log-group:log-group-name:log-stream:log-stream-name"
    ]
}

Verify that you have granted Kinesis Data Firehose access to a public Amazon ES destination. If you are using the data transformation feature, then you must also grant access to AWS Lambda.

Lack of proper permissions

Several permissions are required, depending on the configuration of Kinesis Data Firehose.

To deliver records to an Amazon Simple Storage Service (Amazon S3) bucket, the following permissions are required:

{
    "Effect": "Allow",
    "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
    ],
    "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
    ]
}

Note: To use this policy, the Amazon S3 bucket resource must be present.

If your Kinesis Data Firehose delivery stream is encrypted at rest, then the following permissions are required:

{
    "Effect": "Allow",
    "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": [
        "arn:aws:kms:region:account-id:key/key-id"
    ],
    "Condition": {
        "StringEquals": {
            "kms:ViaService": "s3.region.amazonaws.com"
        },
        "StringLike": {
            "kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket-name/prefix*"
        }
    }
}

To allow permissions for Amazon ES access, you can update your policy like this example:

{
    "Effect": "Allow",
    "Action": [
        "es:DescribeElasticsearchDomain",
        "es:DescribeElasticsearchDomains",
        "es:DescribeElasticsearchDomainConfig",
        "es:ESHttpPost",
        "es:ESHttpPut"
    ],
    "Resource": [
        "arn:aws:es:region:account-id:domain/domain-name",
        "arn:aws:es:region:account-id:domain/domain-name/*"
    ]
},
{
    "Effect": "Allow",
    "Action": [
        "es:ESHttpGet"
    ],
    "Resource": [
        "arn:aws:es:region:account-id:domain/domain-name/_all/_settings",
        "arn:aws:es:region:account-id:domain/domain-name/_cluster/stats",
        "arn:aws:es:region:account-id:domain/domain-name/index-name*/_mapping/type-name",
        "arn:aws:es:region:account-id:domain/domain-name/_nodes",
        "arn:aws:es:region:account-id:domain/domain-name/_nodes/stats",
        "arn:aws:es:region:account-id:domain/domain-name/_nodes/*/stats",
        "arn:aws:es:region:account-id:domain/domain-name/_stats",
        "arn:aws:es:region:account-id:domain/domain-name/index-name*/_stats"
    ]
}

If you are using Kinesis Data Streams as a source, update your permissions like this example:

{
    "Effect": "Allow",
    "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards"
    ],
    "Resource": "arn:aws:kinesis:region:account-id:stream/stream-name"
}

To configure Kinesis Data Firehose for data transformation, you can update your policy like this:

{
    "Effect": "Allow",
    "Action": [
        "lambda:InvokeFunction",
        "lambda:GetFunctionConfiguration"
    ],
    "Resource": [
        "arn:aws:lambda:region:account-id:function:function-name:function-version"
    ]
}

AWS Lambda function invocation issues

Check the Kinesis Data Firehose ExecuteProcessing.Success metric and the Lambda Errors metric to be sure that Kinesis Data Firehose invoked your function successfully. If the invocations fail, check whether the invocation duration exceeds the function's timeout parameter. Your Lambda function might require a greater timeout value or more memory to complete in time. For more information about invocation metrics, see Using invocation metrics.

To identify the reasons that Kinesis Data Firehose isn't invoking the Lambda function, check the Amazon CloudWatch Logs group /aws/lambda/lambda-function-name. If data transformation fails, then the failed records are delivered to your S3 bucket as a backup in the processing-failed folder. The records in the S3 bucket also contain the error message for the failed invocation. For more information about resolving Lambda invocation failures, see Data transformation failure handling.
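The objects under the processing-failed prefix hold newline-delimited JSON records that include the error message and the base64-encoded original payload. The sketch below extracts both from one such record; the sample record is fabricated for illustration, and the exact field set is assumed from the documented data-transformation failure-handling format.

```python
import base64
import json

def parse_failed_record(line):
    """Parse one line of a processing-failed S3 object.

    Field names follow the Firehose data-transformation failure format
    (errorCode, errorMessage, and base64-encoded rawData are assumed).
    """
    record = json.loads(line)
    return {
        "error_code": record.get("errorCode"),
        "error_message": record.get("errorMessage"),
        # rawData holds the original record, base64-encoded.
        "payload": base64.b64decode(record["rawData"]).decode("utf-8"),
    }

# Fabricated sample record for illustration.
sample = json.dumps({
    "attemptsMade": 4,
    "errorCode": "Lambda.FunctionError",
    "errorMessage": "The Lambda function returned an invalid result.",
    "rawData": base64.b64encode(b'{"user": "alice"}').decode("ascii"),
})
print(parse_failed_record(sample)["payload"])  # {"user": "alice"}
```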

Amazon ES domain health issues

Check the following metrics to confirm that Amazon ES is in good health:

  • CPU utilization: If this metric is consistently high, the data nodes might be unable to respond to requests or incoming data. You might need to scale your cluster.
  • JVM memory pressure: If the JVM memory pressure is consistently above 80%, the cluster might be triggering memory circuit breaker exceptions. These exceptions can prevent the data from being indexed.
  • ClusterWriteBlockException: This is an indexing block that occurs if the domain is under high JVM memory pressure or if more storage space is needed. If one data node has no space, then no new data can be indexed. For more information about troubleshooting Amazon ES issues, see Amazon Elasticsearch Service troubleshooting.
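The checks above can be expressed as a small health screen. This is a minimal sketch, assuming you have already fetched the latest CPUUtilization and JVMMemoryPressure values from CloudWatch; the 90% CPU threshold is an assumption (the guidance above says only "consistently high"), and the function name is illustrative.

```python
# Screen Amazon ES domain health using the thresholds described above.

def domain_health_warnings(cpu_utilization, jvm_memory_pressure):
    """Return a list of warnings from the latest metric values (percentages)."""
    warnings = []
    if cpu_utilization >= 90:  # assumed threshold for "consistently high" CPU
        warnings.append("high CPU: consider scaling the cluster")
    if jvm_memory_pressure > 80:  # threshold stated in the guidance above
        warnings.append("JVM memory pressure above 80%: risk of "
                        "circuit breaker exceptions blocking indexing")
    return warnings

print(domain_health_warnings(95, 85))
print(domain_health_warnings(40, 55))  # [] -> no warnings
```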
