How do I set up cross-account streaming from Kinesis Data Firehose to Amazon Elasticsearch Service?

Last updated: 2020-04-15

I want to set up an Amazon Kinesis Data Firehose delivery stream that sends data to an Amazon Elasticsearch Service (Amazon ES) cluster in another account. How do I stream my data across different accounts?

Short Description

You can set up Kinesis Data Firehose and its dependencies, such as Amazon Simple Storage Service (Amazon S3) and Amazon CloudWatch, to stream across different accounts. Streaming data delivery works only if the Amazon ES cluster is publicly accessible and has Node-to-node encryption disabled. To turn off the Node-to-node encryption feature, the Enable fine-grained access control setting must be deselected.

To set up a Data Firehose stream so that it sends data to an Amazon ES cluster, perform the following steps:

1.    Create an Amazon S3 bucket in Account A.

2.    Create a CloudWatch Log Group and Log Stream in Account A.

3.    Create a Data Firehose role and a policy in Account A.

4.    Create an Elasticsearch cluster in Account B and apply a policy that allows data from the Data Firehose Role in Account A.

5.    Amend the policy in the Data Firehose role in Account A so that it can send data to the Elasticsearch Cluster in Account B.

6.    Create the Data Firehose in Account A.

7.    Test the cross-account streams.

Resolution

Create an Amazon S3 bucket in Account A

Create an S3 bucket in Account A. Creating the bucket generates an Amazon Resource Name (ARN).

Note: The complete ARN is used later to grant Data Firehose access to save and retrieve records from this S3 bucket.
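If you prefer the AWS CLI, the following is a minimal sketch that creates the bucket from Account A. The bucket name and Region shown are example values that you must replace with your own.

# Run with credentials for Account A. The bucket name and Region are examples.
# For us-east-1, omit the --create-bucket-configuration option.
aws s3api create-bucket \
    --bucket my-firehose-backup-bucket \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2

The resulting bucket ARN follows the pattern arn:aws:s3:::<bucket-name>.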

Create a CloudWatch Log Group and Log Stream in Account A

To create a CloudWatch Log Group, perform the following steps:

1.    Open the CloudWatch console.

2.    In the navigation pane, choose Log groups.

3.    Choose Actions.

4.    Choose Create log group.

5.    Enter a Log Group name.

6.    Choose the Create log group button to save your new log group.

7.    Search for your newly created log group, and then select it. You can now create a log stream within this log group.

To create a CloudWatch Log Stream, perform the following steps:

1.    Choose Create Log Stream.

2.    Enter a Log Stream Name.

3.    Choose Create Log Stream. This action saves your newly created log stream.

Important: The CloudWatch Log Group and CloudWatch Log Stream names are required when creating Data Firehose role policies.
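You can also create the log group and log stream with the AWS CLI. The following is a minimal sketch; the log group and log stream names are example values.

# Run with credentials for Account A. The names shown are examples.
aws logs create-log-group --log-group-name firehose-to-es-log-group

aws logs create-log-stream \
    --log-group-name firehose-to-es-log-group \
    --log-stream-name firehose-to-es-log-stream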

Create a Data Firehose role and a policy in Account A

1.    Navigate to the AWS Identity and Access Management (IAM) console.

2.    Create an IAM policy that allows Data Firehose to write stream logs to CloudWatch, save records to Amazon S3, and stream data to the Elasticsearch cluster:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "<Bucket ARN>",
                "<Bucket ARN>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:<region>:<account-id>:log-group:<log-group-name>:log-stream:<log-stream-name>"
            ]
        }
    ]
}

Note: The Elasticsearch permissions are appended to this policy later, because the cluster in Account B must be created first.

3.    Save the policy.

4.    Choose Create role.

5.    Add the newly created policy to your Kinesis Data Firehose role.
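If you prefer the AWS CLI instead of the console steps above, the following is a minimal sketch. It assumes that the policy document shown above is saved as firehose-policy.json, and that a file named trust-policy.json contains a trust policy that allows the firehose.amazonaws.com service principal to assume the role. The policy name, role name, and file names are example values.

# Run with credentials for Account A.
# trust-policy.json (example) must contain a statement similar to:
# {"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"firehose.amazonaws.com"},"Action":"sts:AssumeRole"}]}

aws iam create-policy \
    --policy-name firehose-cross-account-policy \
    --policy-document file://firehose-policy.json

aws iam create-role \
    --role-name firehose-cross-account-role \
    --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy \
    --role-name firehose-cross-account-role \
    --policy-arn arn:aws:iam::<account-id>:policy/firehose-cross-account-policy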

Create an Elasticsearch cluster in Account B and apply a policy that allows data from the Data Firehose Role in Account A

In Account B, create an Elasticsearch cluster that has the Publicly Accessible setting enabled. Also, make sure that the Node-to-node encryption setting is disabled. To disable Node-to-node encryption, the Enable fine-grained access control setting must be deselected.

Important: You must configure the security settings to allow your role to stream through Data Firehose.

In Access policy, select Custom access policy. Then select IAM and Allow, and enter the ARN of the Data Firehose role that you created in Account A.

After the cluster is created, the Amazon ES Domain ARN appears. To change the policy of the cluster, choose Actions, and then modify the access policy as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<Firehose Role ARN in Account A>"
      },
      "Action": [
        "es:ESHttpPost",
        "es:ESHttpPut"
      ],
      "Resource": [
        "<ES Domain ARN in Account B>",
        "<ES Domain ARN in Account B>/*"
      ]
    },
    {    
      "Effect": "Allow",
      "Principal": {
        "AWS": "<Firehose Role ARN in Account A>"
      },
      "Action": "es:ESHttpGet",
      "Resource": [
        "<ES Domain ARN in Account B>/_all/_settings",
        "<ES Domain ARN in Account B>/_cluster/stats",
        "<ES Domain ARN in Account B>/index-name*/_mapping/type-name",
        "<ES Domain ARN in Account B>/roletest*/_mapping/roletest",
        "<ES Domain ARN in Account B>/_nodes",
        "<ES Domain ARN in Account B>/_nodes/stats",
        "<ES Domain ARN in Account B>/_nodes/*/stats",
        "<ES Domain ARN in Account B>/_stats",
        "<ES Domain ARN in Account B>/index-name*/_stats",
        "<ES Domain ARN in Account B>/roletest*/_stats"
      ]
    }
  ]
}
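If you prefer the AWS CLI, you can apply the access policy from Account B with the update-elasticsearch-domain-config command. This sketch assumes that the policy shown above is saved as es-access-policy.json; the domain name is an example value.

# Run with credentials for Account B. The domain name and file name are examples.
aws es update-elasticsearch-domain-config \
    --domain-name my-es-domain \
    --access-policies file://es-access-policy.json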

For more information about permissions within the Elasticsearch policy, see Cross-Account Delivery to an Amazon ES Destination.

Amend the policy in the Data Firehose role in Account A so that it can send data to the Elasticsearch Cluster in Account B

Amend the Data Firehose policy so that it is able to send data to the Elasticsearch cluster:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "<Bucket ARN>",
                "<Bucket ARN>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:<region>:<account-id>:log-group:<log-group-name>:log-stream:<log-stream-name>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpPost",
                "es:ESHttpPut",
                "es:DescribeElasticsearchDomain",
                "es:DescribeElasticsearchDomains",
                "es:DescribeElasticsearchDomainConfig"
            ],
            "Resource": [
                "<ES Domain ARN in Account B>",
                "<ES Domain ARN in Account B>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet"
            ],
            "Resource": [
                "<ES Domain ARN in Account B>/_all/_settings",
                "<ES Domain ARN in Account B>/_cluster/stats",
                "<ES Domain ARN in Account B>/index-name*/_mapping/superstore",
                "<ES Domain ARN in Account B>/_nodes",
                "<ES Domain ARN in Account B>/_nodes/stats",
                "<ES Domain ARN in Account B>/_nodes/*/stats",
                "<ES Domain ARN in Account B>/_stats",
                "<ES Domain ARN in Account B>/index-name*/_stats"
            ]
        }
    ]
}
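To apply the amended policy from the AWS CLI, one option is to publish a new default version of the managed policy that is attached to the Data Firehose role. This sketch assumes that the amended document shown above is saved as firehose-policy.json; the policy ARN is an example value.

# Run with credentials for Account A. The policy ARN and file name are examples.
# A managed policy can have at most five versions; delete an old version first if necessary.
aws iam create-policy-version \
    --policy-arn arn:aws:iam::<account-id>:policy/firehose-cross-account-policy \
    --policy-document file://firehose-policy.json \
    --set-as-default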

For more information about delivering Data Firehose data to your Elasticsearch cluster, see Grant Kinesis Data Firehose Access to an Amazon ES Destination.

Create the Data Firehose in Account A

To create a Data Firehose delivery stream with a cross-account Elasticsearch destination, you must use the AWS Command Line Interface (AWS CLI).

Make sure that the AWS CLI is up to date by running the following command:

aws --version

After the AWS CLI is updated, create a file called input.json with the following content:

{
    "DeliveryStreamName": "<Firehose name>",
    "DeliveryStreamType": "DirectPut",
    "ElasticsearchDestinationConfiguration": {
        "RoleARN": "<Firehose Role ARN of Account A>",
        "ClusterEndpoint": "<ES Domain cluster Endpoint of Account B>",
        "IndexName": "local",
        "TypeName": "TypeName",
        "IndexRotationPeriod": "OneDay",
        "BufferingHints": {
            "IntervalInSeconds": 60,
            "SizeInMBs": 50
        },
        "RetryOptions": {
            "DurationInSeconds": 60
        },
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "<Firehose Role ARN of Account A>",
            "BucketARN": "<S3 Bucket ARN of Account A>",
            "Prefix": "",
            "BufferingHints": {
                "SizeInMBs": 128,
                "IntervalInSeconds": 128
            },
            "CompressionFormat": "UNCOMPRESSED"
        },
        "CloudWatchLoggingOptions": {
            "Enabled": true,
            "LogGroupName": "<Log group name>",
            "LogStreamName": "<Log stream name>"
        }
    }
}

Make sure that the endpoint value is correctly entered in the ClusterEndpoint attribute field.

Note: Types are deprecated in Elasticsearch version 7.x. For Elasticsearch version 7.x, remove the TypeName attribute from input.json.

Then, run the following AWS CLI command in the same directory as the location of the input.json file:

aws firehose create-delivery-stream --cli-input-json file://input.json

This command creates a Data Firehose delivery stream in Account A that sends data to the Elasticsearch cluster in Account B.
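To confirm that the delivery stream finished provisioning, describe it and check that DeliveryStreamStatus is ACTIVE:

aws firehose describe-delivery-stream --delivery-stream-name <Firehose name>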

Test the cross-account streams

Use the Kinesis Data Generator to stream records into the Data Firehose in Account A.

The Kinesis Data Generator can generate many records per second, which gives Amazon ES enough data points to determine the correct mapping of the record structure.

Here is the template structure used in the Kinesis Data Generator:

{
    "device_id": {{random.number(5)}},
    "device_owner": "{{name.firstName}}  {{name.lastName}}",
    "temperature": {{random.number(
        {
            "min":10,
            "max":150
        }
    )}},
    "timestamp": "{{date.now("DD/MMM/YYYY:HH:mm:ss Z")}}"
}
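If you don't want to use the Kinesis Data Generator, you can send a single test record with the AWS CLI instead. The following sketch assumes AWS CLI version 2 and uses a sample payload that loosely follows the template above.

# Run with credentials for Account A. The payload is a sample record.
# --cli-binary-format raw-in-base64-out lets AWS CLI v2 accept the raw JSON string.
aws firehose put-record \
    --delivery-stream-name <Firehose name> \
    --cli-binary-format raw-in-base64-out \
    --record '{"Data":"{\"device_id\": 1, \"device_owner\": \"Jane Doe\", \"temperature\": 42, \"timestamp\": \"15/Apr/2020:12:00:00 +0000\"}"}'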

To verify that cross-account streaming was successful, check whether an index named "local" appears under the Indices tab of the Elasticsearch cluster.

Note: It can take a few minutes for Amazon ES to determine the correct mapping.