How do I set up cross-account streaming from Kinesis Data Firehose to Amazon OpenSearch Service?

Last updated: 2021-07-23

I want to set up an Amazon Kinesis Data Firehose stream that sends data to an Amazon OpenSearch Service cluster in another account. How do I stream my data resources across different accounts?

Short description

You can set up Kinesis Data Firehose and its dependencies, like Amazon Simple Storage Service (Amazon S3) and Amazon CloudWatch, to stream across different accounts. Streaming data delivery works for publicly accessible OpenSearch Service clusters whether or not fine-grained access control (FGAC) is enabled. This article covers both use cases.

Note: Amazon OpenSearch Service is the successor to Amazon Elasticsearch Service.

To set up a Kinesis Data Firehose stream so that it sends data to an OpenSearch Service cluster, perform the following steps:

1.    Create an Amazon S3 bucket in Account A.

2.    Create a CloudWatch log group and log stream in Account A.

3.    Create a Kinesis Data Firehose role and policy in Account A.

4.    Create a publicly accessible OpenSearch Service cluster in Account B to which the Kinesis Data Firehose role in Account A will stream data.

5.    (Optional) If fine-grained access control (FGAC) is enabled, log in to OpenSearch Dashboards and add a role mapping.

Note: OpenSearch Dashboards is the successor to Kibana.

6.    Update the AWS Identity and Access Management (IAM) role policy for your Kinesis Data Firehose role in Account A to send data to Account B.

7.    Create the Kinesis Data Firehose stream in Account A.

8.    Test cross-account streaming to the OpenSearch Service cluster.

Resolution

Create an Amazon S3 bucket in Account A

Create an S3 bucket in Account A. Creating the bucket generates an Amazon Resource Name (ARN).

Note: The complete ARN is used later to grant Kinesis Data Firehose access to save and retrieve records from the Amazon S3 bucket.
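Because S3 bucket ARNs follow a fixed format, you can also derive the ARN from the bucket name alone. A minimal sketch with a placeholder bucket name:

```python
# S3 bucket ARNs have no region or account ID component, so the
# ARN can be built from the bucket name ("my-firehose-backup" is
# a placeholder for your own bucket name).
bucket_name = "my-firehose-backup"
bucket_arn = f"arn:aws:s3:::{bucket_name}"
print(bucket_arn)  # arn:aws:s3:::my-firehose-backup
```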

Create a CloudWatch log group and log stream in Account A

To create a CloudWatch log group, perform the following steps:

1.    Open the CloudWatch console.

2.    In the navigation pane, choose Log groups.

3.    Choose Create log group.

4.    Enter a log group name.

5.    Choose Create log group to save your new log group.

6.    Search for your newly created log group, and then select it. You can now create a log stream inside it.

To create an Amazon CloudWatch log stream, perform the following steps:

1.    Choose Create log stream.

2.    Enter a log stream name.

3.    Choose Create log stream to save your new log stream.

Important: The CloudWatch log group and CloudWatch log stream names are required when creating Kinesis Data Firehose role policies.
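The CloudWatch Logs resource ARN that appears in the role policies below can be assembled from the region, account ID, and log group name. A sketch with placeholder values:

```python
# Build the CloudWatch Logs resource ARN that the Firehose role
# policy references (region, account ID, and Firehose name are
# placeholder values).
region = "us-east-1"
account_id = "111122223333"
firehose_name = "my-firehose"
log_events_arn = (
    f"arn:aws:logs:{region}:{account_id}:log-group:"
    f"/aws/kinesisfirehose/{firehose_name}:log-stream:*"
)
print(log_events_arn)
```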

Create a Kinesis Data Firehose role and a policy in Account A

1.    Navigate to the AWS Identity and Access Management (IAM) console.

2.    Create an IAM policy that allows Kinesis Data Firehose to do the following:
Save stream logs to CloudWatch
Save records to Amazon S3
Stream data to the OpenSearch Service cluster

For example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "<Bucket ARN>",
                "<Bucket ARN>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:<region>:<account-id>:log-group:/aws/kinesisfirehose/<Firehose Name>:log-stream:*"
            ]
        }
    ]
}

Note: Later, you'll append permissions for streaming to the OpenSearch Service cluster to this policy. However, you must first create the cluster in Account B.

3.    Save the policy.

4.    Choose Create role.

5.    Add the newly created policy to your Kinesis Data Firehose role.
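The role itself must trust the Kinesis Data Firehose service principal so that Firehose can assume it. The trust relationship on the role looks like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "firehose.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```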

Create a publicly accessible OpenSearch Service cluster in Account B to which the Kinesis Data Firehose role in Account A will stream data

1.    Create your publicly accessible OpenSearch Service cluster in Account B.

2.    Record the OpenSearch Service domain ARN. You'll need the ARN for a later step.

3.    Configure your security settings for your cluster.

Important: You must configure your OpenSearch Service security settings to allow the Kinesis Data Firehose role in Account A to stream to your OpenSearch Service cluster.

To configure your security settings, perform the following steps:

1.    In OpenSearch Service, navigate to Access policy.

2.    Select the JSON defined access policy. Your policy must have the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "<ES Domain ARN in Account B>/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "<Your IP Address for OpenSearch Dashboards access>"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<Firehose Role ARN in Account A>"
      },
      "Action": [
        "es:ESHttpPost",
        "es:ESHttpPut"
      ],
      "Resource": [
        "<ES Domain ARN in Account B>",
        "<ES Domain ARN in Account B>/*"
      ]
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<Firehose Role ARN in Account A>"
      },
      "Action": "es:ESHttpGet",
      "Resource": [
        "<ES Domain ARN in Account B>/_all/_settings",
        "<ES Domain ARN in Account B>/_cluster/stats",
        "<ES Domain ARN in Account B>/index-name*/_mapping/type-name",
        "<ES Domain ARN in Account B>/roletest*/_mapping/roletest",
        "<ES Domain ARN in Account B>/_nodes",
        "<ES Domain ARN in Account B>/_nodes/stats",
        "<ES Domain ARN in Account B>/_nodes/*/stats",
        "<ES Domain ARN in Account B>/_stats",
        "<ES Domain ARN in Account B>/index-name*/_stats",
        "<ES Domain ARN in Account B>/roletest*/_stats"
      ]
    }
  ]
}

For more information about permissions within the OpenSearch Service policy, see Cross-account delivery to an OpenSearch Service destination.

3.    (Optional) If fine-grained access control (FGAC) is enabled on your cluster, log in to OpenSearch Dashboards and add a role mapping. The role mapping allows the Kinesis Data Firehose role to send requests to OpenSearch Service.

(Optional) If fine-grained access control (FGAC) is enabled, log in to OpenSearch Dashboards and add a role mapping

If the OpenSearch Service cluster has fine-grained access control enabled, you must log in to OpenSearch Dashboards and add a role mapping for the Kinesis Data Firehose role. The role mapping gives the Kinesis Data Firehose role access to stream to the OpenSearch Service cluster.

To log in to OpenSearch Dashboards and add a role mapping, perform the following steps:

1.    Open Dashboards.

2.    Choose the Security tab.

3.    Choose Roles.

4.    Choose the all_access role.

5.    Choose the Mapped users tab.

6.    Choose Manage mapping.

7.    In the Backend roles section, enter the ARN of the Kinesis Data Firehose role in Account A.

8.    Choose Map.
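If you prefer the API to the Dashboards UI, the same mapping can be added with the security plugin's REST API. A sketch (the path shown is for Open Distro-based domains and may differ by version; the role ARN is a placeholder):

```
PUT _opendistro/_security/api/rolesmapping/all_access
{
  "backend_roles": [
    "arn:aws:iam::<account-id>:role/<firehose-role-name>"
  ]
}
```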

Update the IAM role policy for your Kinesis Data Firehose role in Account A to send data to Account B

To send data from your Kinesis Data Firehose role in Account A to your OpenSearch Service cluster in Account B, update the Kinesis Data Firehose policy like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "<Bucket ARN>",
                "<Bucket ARN>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:<region>:<account-id>:log-group:/aws/kinesisfirehose/<Firehose Name>:log-stream:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpPost",
                "es:ESHttpPut",
                "es:DescribeDomain",
                "es:DescribeDomains",
                "es:DescribeDomainConfig"
            ],
            "Resource": [
                "<Domain ARN in Account B>",
                "<Domain ARN in Account B>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet"
            ],
            "Resource": [
                "<Domain ARN in Account B>/_all/_settings",
                "<Domain ARN in Account B>/_cluster/stats",
                "<Domain ARN in Account B>/index-name*/_mapping/superstore",
                "<Domain ARN in Account B>/_nodes",
                "<Domain ARN in Account B>/_nodes/stats",
                "<Domain ARN in Account B>/_nodes/*/stats",
                "<Domain ARN in Account B>/_stats",
                "<Domain ARN in Account B>/index-name*/_stats"
            ]
        }
    ]
}

For more information about sending Kinesis Data Firehose data to your OpenSearch Service cluster, see Grant Kinesis Data Firehose access to an Amazon OpenSearch Service destination.

Create the Kinesis Data Firehose stream in Account A

To create a Kinesis Data Firehose stream with cross-account access to an OpenSearch Service cluster, use the AWS Command Line Interface (AWS CLI).

First, check that your AWS CLI version is up to date:

aws --version

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent AWS CLI version.

After the AWS CLI is updated, create a file called input.json with the following content:

{
    "DeliveryStreamName": "<Firehose Name>",
    "DeliveryStreamType": "DirectPut",
    "ElasticsearchDestinationConfiguration": {
        "RoleARN": "",
        "ClusterEndpoint": "",
        "IndexName": "local",
        "TypeName": "TypeName",
        "IndexRotationPeriod": "OneDay",
        "BufferingHints": {
            "IntervalInSeconds": 60,
            "SizeInMBs": 50
        },
        "RetryOptions": {
            "DurationInSeconds": 60
        },
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "",
            "BucketARN": "",
            "Prefix": "",
            "BufferingHints": {
                "SizeInMBs": 128,
                "IntervalInSeconds": 128
            },
            "CompressionFormat": "UNCOMPRESSED",
            "CloudWatchLoggingOptions": {
                "Enabled": true,
                "LogGroupName": "/aws/kinesisfirehose/<Firehose Name>",
                "LogStreamName": "S3Delivery"
            }
        },
        "CloudWatchLoggingOptions": {
            "Enabled": true,
            "LogGroupName": "/aws/kinesisfirehose/<Firehose Name>",
            "LogStreamName": "ElasticsearchDelivery"
        }
    }
}

Make sure that you enter your OpenSearch Service domain endpoint in the ClusterEndpoint attribute, and fill in the empty RoleARN and BucketARN attributes with the ARNs that you created earlier.

Note: Types are deprecated in Elasticsearch version 7.x. For Elasticsearch versions 7.x, make sure to remove the TypeName attribute from the input.json file.
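As a quick local sanity check before creating the stream, you can confirm that the required fields are filled in and that TypeName has been removed. A sketch with placeholder values, using a trimmed stand-in for input.json:

```python
import json

# Trimmed stand-in for input.json (placeholder values).
config = {
    "DeliveryStreamName": "my-firehose",
    "ElasticsearchDestinationConfiguration": {
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-role",
        "ClusterEndpoint": "https://search-mydomain.us-east-1.es.amazonaws.com",
        "IndexName": "local",
        "TypeName": "TypeName",
    },
}

es_config = config["ElasticsearchDestinationConfiguration"]

# Types are deprecated in Elasticsearch 7.x, so drop the attribute.
es_config.pop("TypeName", None)

# Fail fast if a required field was left blank in the template.
for field in ("RoleARN", "ClusterEndpoint"):
    assert es_config[field], f"{field} must be filled in"

print(json.dumps(config, indent=4))
```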

Then, run the following CLI command in the same directory as the location of the input.json file:

aws firehose create-delivery-stream --cli-input-json file://input.json

This command syntax creates a Kinesis Data Firehose stream in Account A with a destination to an OpenSearch Service cluster in Account B.

Test cross-account streaming to the OpenSearch Service cluster

Use the Kinesis Data Generator to stream records into the Kinesis Data Firehose stream in Account A.

The Kinesis Data Generator (KDG) generates many records per second. This throughput gives OpenSearch Service enough data points to determine the correct mapping of the record structure.

Here is the template structure used in the Kinesis Data Generator:

{
    "device_id": {{random.number(5)}},
    "device_owner": "{{name.firstName}} {{name.lastName}}",
    "temperature": {{random.number(
        {
            "min":10,
            "max":150
        }
    )}},
    "timestamp": "{{date.now("DD/MMM/YYYY:HH:mm:ss Z")}}"
}
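To see what records shaped like this template look like before streaming them, you can approximate the template locally. A sketch using only the standard library (the name lists are made up, and KDG's random.number(5) is assumed to yield a small integer in the 0-5 range):

```python
import random
from datetime import datetime, timezone

def make_record():
    """Generate one record shaped like the KDG template above."""
    first_names = ["Alice", "Bob", "Carol"]  # placeholder values
    last_names = ["Smith", "Jones", "Lee"]
    return {
        "device_id": random.randint(0, 5),            # random.number(5)
        "device_owner": f"{random.choice(first_names)} "
                        f"{random.choice(last_names)}",
        "temperature": random.randint(10, 150),       # min 10, max 150
        # DD/MMM/YYYY:HH:mm:ss Z, for example 17/Jun/2024:12:00:00 +0000
        "timestamp": datetime.now(timezone.utc).strftime(
            "%d/%b/%Y:%H:%M:%S %z"
        ),
    }

record = make_record()
print(record)
```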

To verify whether cross-account streaming was successful, review the index entries under the Indices tab of your cluster. Check for an index name that uses the "local" prefix with the current date. You can also check whether the records are present in OpenSearch Dashboards.

Note: It can take a few minutes for OpenSearch Service to determine the correct mapping.