AWS for Industries

FSI Service Spotlight: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

In this edition of the Financial Services Industry (FSI) Services Spotlight monthly blog series, we feature Amazon MSK, a fully managed, highly available Apache Kafka service that ingests and processes streaming data in real time. We highlight five key considerations: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. Each of the five areas includes specific guidance, suggested reference architectures, and technical code that can help streamline service approval of Amazon MSK. These may need to be adapted to your business, compliance, and security requirements.

Amazon MSK is an AWS streaming data service that manages Apache Kafka infrastructure and operations, making it easy for developers and DevOps managers to run Apache Kafka applications and Kafka Connect connectors on AWS without becoming experts in operating Apache Kafka. Amazon MSK operates, maintains, and scales Apache Kafka clusters and provides enterprise-grade security features. It offers built-in integrations with other AWS services that accelerate the development of streaming data applications. With Amazon MSK, there are no data transfer charges for in-cluster traffic, and there are no commitments or upfront payments required. You pay only for the resources you use.

Many financial services institutions worldwide are leveraging Amazon MSK to build real-time streaming applications. For example, Goldman Sachs has used Amazon MSK to build its Global Transaction Banking Platform. This helped them give their clients and partners a nimble and secure way to connect to the platform.

Coinbase ingests billions of events daily from user, application, and crypto sources across their products. Coinbase leverages Amazon MSK for ultra-low latency, seamless service-to-service communication, data ETLs, and database Change Data Capture (CDC). With Amazon MSK, Coinbase has mitigated the day-to-day Kafka operational overhead of broker maintenance and recovery, allowing them to concentrate engineering time on core business demands.

Achieving Compliance with Amazon MSK

Security is a shared responsibility between AWS and the customer. AWS is responsible for protecting the infrastructure that runs all of the services offered in the AWS Cloud. At the same time, a customer is responsible for security in the cloud. This responsibility changes depending on the particulars of the cloud service that the customer is using. When using Amazon MSK, the customer’s responsibility is determined by the sensitivity of the data, their organization’s compliance and security objectives, and applicable laws and regulations.

Amazon MSK falls under the scope of the following compliance programs concerning AWS’s side of the shared responsibility model. The related documents are available on demand under an AWS non-disclosure agreement (NDA) through AWS Artifact.

SOC 1,2,3
PCI
ISMAP
ISO/IEC 27001:2013, 27017:2015, 27018:2019, 27701:2019, ISO/IEC 9001:2015 and CSA STAR CCM v3.0.1
HIPAA BAA
IRAP
MTCS (Specific regions)
C5
K-ISMS
ENS High
OSPAR
HITRUST CSF
FINMA
GSMA (Specific regions)
PiTuKri

In the following sections, we will cover topics on the customer side of the shared responsibility model.

Data Protection with Amazon MSK

Encryption At-Rest

Amazon MSK provides encryption features for data at rest and for data in transit. For data at rest, an Amazon MSK cluster uses Amazon EBS server-side encryption and AWS Key Management Service (AWS KMS) keys to encrypt its storage volumes. When you create a new cluster, you can specify a customer-managed CMK to encrypt your data at rest. If you don’t specify a CMK, Amazon MSK still encrypts the data at rest, but under an AWS-managed CMK.

Encryption In-Transit

We highly recommend enabling in-transit encryption even though it can add CPU overhead and a few milliseconds of latency. MSK in-transit encryption consists of two parts.

  1. Communication between the client and broker

The encryption settings for communication between the client and broker depend on the chosen access control method. Various access control methods currently supported are:

    1. Unauthenticated access (i.e., all actions are allowed)
    2. IAM role-based access (this also supports authorization of Kafka actions)
    3. SASL/SCRAM (additionally requires Apache Kafka ACLs for authorization of Kafka actions)
    4. Mutual TLS (additionally requires Apache Kafka ACLs for authorization of Kafka actions)

Depending on the access control method used, two options are available for encryption between the client and the broker.

    • TLS Encryption
      • Required for IAM, SASL/SCRAM, and TLS access control methods
      • Enabled by default for all access control methods
    • Plaintext
      • Plaintext traffic is not possible with TLS, SASL/SCRAM, and IAM access control methods
      • Disabled by default for all access control methods
  2. Inter-broker communication (cannot be changed after creating the cluster)
    • TLS Encryption
      • Enabled by default for all access control methods
      • Cannot be disabled when any of IAM, TLS, or SASL/SCRAM access control methods are enabled. Conversely, it can be disabled when unauthenticated access is the only option enabled.
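The rules in the two lists above can be written down as a small check. The following is a minimal sketch that encodes only what this post states. The method labels (`IAM`, `SASL_SCRAM`, `TLS`, `UNAUTHENTICATED`) and both function names are hypothetical and chosen for illustration. The setting values `TLS`, `TLS_PLAINTEXT`, and `PLAINTEXT` are assumed to match the client-broker options exposed by the Amazon MSK API, so verify them against the current documentation before relying on this.

```python
from typing import Iterable, Set

# Hypothetical labels for the access control methods listed in this post.
AUTHENTICATED_METHODS = {"IAM", "SASL_SCRAM", "TLS"}
ALL_METHODS = AUTHENTICATED_METHODS | {"UNAUTHENTICATED"}


def _validated(methods: Iterable[str]) -> Set[str]:
    """Return the enabled methods as a set, rejecting empty or unknown input."""
    enabled = set(methods)
    if not enabled:
        raise ValueError("at least one access control method must be enabled")
    unknown = enabled - ALL_METHODS
    if unknown:
        raise ValueError(f"unknown access control methods: {sorted(unknown)}")
    return enabled


def allowed_client_broker_settings(methods: Iterable[str]) -> Set[str]:
    """Client-broker encryption settings permitted for the enabled methods."""
    enabled = _validated(methods)
    if enabled & AUTHENTICATED_METHODS:
        # IAM, SASL/SCRAM, and mutual TLS all require TLS; plaintext is not possible.
        return {"TLS"}
    # Unauthenticated access only: plaintext may be enabled explicitly.
    return {"TLS", "TLS_PLAINTEXT", "PLAINTEXT"}


def in_cluster_tls_can_be_disabled(methods: Iterable[str]) -> bool:
    """Inter-broker TLS can be disabled only when unauthenticated access is the sole method."""
    return _validated(methods) == {"UNAUTHENTICATED"}
```

A check like this could run in a deployment pipeline before a cluster definition is submitted, so that a disallowed combination is rejected early rather than at cluster creation time.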

Alternatively, if you are using the AWS CLI to create the cluster, you can follow this document, which provides examples of specifying the different encryption settings in JSON format.
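As an illustration of what such a JSON document might contain, the sketch below builds an encryption settings structure in Python and prints it as JSON. It assumes the `EncryptionInfo` shape used by the Amazon MSK `CreateCluster` API (`EncryptionAtRest.DataVolumeKMSKeyId`, `EncryptionInTransit.ClientBroker`, and `EncryptionInTransit.InCluster`). The function name and the KMS key ARN are hypothetical placeholders.

```python
import json
from typing import Any, Dict, Optional

CLIENT_BROKER_VALUES = {"TLS", "TLS_PLAINTEXT", "PLAINTEXT"}


def build_encryption_info(
    kms_key_arn: Optional[str] = None,
    client_broker: str = "TLS",
    in_cluster: bool = True,
) -> Dict[str, Any]:
    """Build an encryption settings document for cluster creation.

    When kms_key_arn is omitted, no at-rest block is emitted and the service
    falls back to an AWS-managed key, as described in the text.
    """
    if client_broker not in CLIENT_BROKER_VALUES:
        raise ValueError(f"unsupported client-broker setting: {client_broker}")
    info: Dict[str, Any] = {
        "EncryptionInTransit": {
            "ClientBroker": client_broker,
            "InCluster": in_cluster,
        }
    }
    if kms_key_arn:
        info["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return info


if __name__ == "__main__":
    # Hypothetical key ARN, for illustration only.
    example = build_encryption_info(
        kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
    )
    print(json.dumps(example, indent=4))
```

The printed JSON could be saved to a file and supplied to the `--encryption-info` option of `aws kafka create-cluster`.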

Isolation of compute environments of an Amazon MSK cluster

  1. ZooKeeper node access: Kafka uses ZooKeeper to manage the cluster, including controller election (i.e., leader/follower relationship for all partitions), cluster membership (list of all functioning brokers in the cluster), topic configuration (list of topics, partitions, location of replicas, etc.), and ACL maintenance (who or what is allowed to write to the topic).

Access to Apache ZooKeeper nodes that are part of your Amazon MSK cluster can be limited by assigning a separate security group to the ZooKeeper nodes. You can also enable TLS security for encryption in transit between the clients and the Apache ZooKeeper nodes by following the instructions in that link.

  2. Broker access: When you create an MSK cluster, brokers are accessible only from inside the cluster’s VPC by default, and public access is disabled. When creating the cluster, you can specify the VPC, Availability Zones, and subnets where you want the Amazon MSK service to deploy the brokers. Unless you change the defaults, access to your brokers remains limited to resources within that VPC.

2.1. Access from within the cluster’s VPC: To connect to your MSK cluster from a client in the same VPC as the cluster, make sure the cluster’s security group has an inbound rule that accepts traffic from the client’s security group.
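The inbound rule described in 2.1 can be sketched as a request payload. The example below builds the parameters for an EC2 `AuthorizeSecurityGroupIngress` call that admits traffic from the client’s security group on a single broker port. The port numbers are an assumption based on the commonly documented in-VPC MSK broker ports (9092 plaintext, 9094 TLS, 9096 SASL/SCRAM, 9098 IAM) and should be checked against your cluster. The function name and security group IDs are hypothetical.

```python
from typing import Any, Dict

# Assumed in-VPC broker ports per access method; verify for your cluster.
MSK_BROKER_PORTS = {
    "PLAINTEXT": 9092,
    "TLS": 9094,
    "SASL_SCRAM": 9096,
    "IAM": 9098,
}


def build_broker_ingress_rule(
    cluster_sg_id: str, client_sg_id: str, access_method: str
) -> Dict[str, Any]:
    """Build ingress parameters that let the client security group reach the brokers."""
    if access_method not in MSK_BROKER_PORTS:
        raise ValueError(f"unknown access method: {access_method}")
    port = MSK_BROKER_PORTS[access_method]
    return {
        "GroupId": cluster_sg_id,  # the rule is attached to the cluster's security group
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # Reference the client's security group rather than a CIDR range.
                "UserIdGroupPairs": [
                    {
                        "GroupId": client_sg_id,
                        "Description": f"Kafka clients ({access_method})",
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical security group IDs, for illustration only.
    print(build_broker_ingress_rule("sg-0cluster000example", "sg-0client0000example", "IAM"))
```

The returned dictionary matches the keyword arguments accepted by the boto3 EC2 client’s `authorize_security_group_ingress` method. Referencing a security group instead of an IP range keeps the rule valid as client instances are replaced.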

2.2. Access from outside the cluster’s VPC: Clients can connect to an Amazon MSK cluster from outside the cluster’s VPC but within the AWS environment, or you can make the MSK cluster publicly accessible.

2.2.1 Public access: This option is very rarely turned on, if at all, in any FSI organization. However, you can turn on public access to the brokers of MSK clusters running Apache Kafka 2.6.0 or later versions. To turn on public access to a cluster, ensure that the cluster meets all of the conditions. Public access can be enabled either from the console or using the AWS CLI as described in that link.

Public access can be blocked with an IAM policy that denies the kafka:UpdateConnectivity action:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Deny",
            "Action": "kafka:UpdateConnectivity",
            "Resource": "*"
        }
    ]
}

Security settings

Similarly, denying the kafka:UpdateSecurity action prevents changes to the cluster’s security settings, which restricts updates to both encryption at rest and encryption in transit:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Deny",
            "Action": "kafka:UpdateSecurity",
            "Resource": "*"
        }
    ]
}
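To confirm during an audit that a policy document contains deny statements like the two above, a simple check can be scripted. The following is a simplified sketch and not a reimplementation of IAM policy evaluation: it looks only for unconditional `Deny` statements on all resources and matches action wildcards case-insensitively. The function names are hypothetical.

```python
import fnmatch
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM policy elements may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def explicitly_denies(policy: Dict[str, Any], action: str) -> bool:
    """Return True if the policy unconditionally denies the action on all resources.

    Simplification: statements with a Condition, NotAction, or a Resource other
    than "*" are ignored, so a False result does not prove the action is allowed.
    """
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Deny":
            continue
        if "Condition" in statement or "Action" not in statement:
            continue
        if "*" not in _as_list(statement.get("Resource", [])):
            continue
        for pattern in _as_list(statement["Action"]):
            # IAM action names are case-insensitive and may contain wildcards.
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


BLOCK_PUBLIC_ACCESS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Deny",
            "Action": "kafka:UpdateConnectivity",
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    for name in ("kafka:UpdateConnectivity", "kafka:UpdateSecurity"):
        print(name, explicitly_denies(BLOCK_PUBLIC_ACCESS_POLICY, name))
```

Run against the first policy, the check reports that kafka:UpdateConnectivity is denied and kafka:UpdateSecurity is not, which shows that the two controls need separate statements or a combined action list.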

2.2.2 Access from within AWS but outside the cluster’s VPC: Kafka clients trying to access an Amazon MSK cluster from inside AWS but outside the cluster’s VPC have several access options:

        1. Amazon VPC peering
        2. AWS Direct Connect
        3. AWS Transit Gateway
        4. VPN Connections
        5. REST Proxies
        6. Multiple Region Multi-VPC Connectivity

These options allow resources in either VPC to communicate securely with each other.

Automating audits with APIs

Financial institutions are often required to audit their AWS services for usage, user activities, and any resource changes as part of their standard IT security and compliance policies. You can use AWS CloudTrail to log all API calls to the MSK service and Amazon CloudWatch to log all broker logs.

Third-party auditors assess the security and compliance of Amazon Managed Streaming for Apache Kafka as part of AWS compliance programs. These include PCI and HIPAA BAA. AWS Security Hub provides a comprehensive view of your security state within AWS that helps to check your compliance with security industry standards and best practices.

Amazon MSK service logs

Amazon MSK integrates with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon MSK. CloudTrail captures API calls as events, including calls from the Amazon MSK console and code calls to Amazon MSK API operations, and records them in CloudTrail log files. Using the information collected by CloudTrail, you can determine what the request was, the IP address from which the request was made, who made the request, and when it was made.
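As a rough illustration of answering "what, who, when, and from where" from a CloudTrail log file, the sketch below filters records by event source and extracts the standard CloudTrail fields `eventName`, `userIdentity.arn`, `eventTime`, and `sourceIPAddress`. It assumes Amazon MSK API calls are recorded with the event source `kafka.amazonaws.com`. The sample log content, role ARN, and IP address are fabricated placeholders for illustration, not real log output.

```python
import json
from typing import Dict, Iterator

# Assumed event source for Amazon MSK API calls in CloudTrail records.
MSK_EVENT_SOURCE = "kafka.amazonaws.com"


def summarize_msk_events(log_file_text: str) -> Iterator[Dict[str, str]]:
    """Yield a what/who/when/where summary for each Amazon MSK event in a log file."""
    records = json.loads(log_file_text).get("Records", [])
    for record in records:
        if record.get("eventSource") != MSK_EVENT_SOURCE:
            continue
        yield {
            "what": record.get("eventName", "unknown"),
            "who": record.get("userIdentity", {}).get("arn", "unknown"),
            "when": record.get("eventTime", "unknown"),
            "source_ip": record.get("sourceIPAddress", "unknown"),
        }


# Hypothetical log content, for illustration only.
SAMPLE_LOG = json.dumps(
    {
        "Records": [
            {
                "eventSource": "kafka.amazonaws.com",
                "eventName": "DescribeCluster",
                "eventTime": "2023-01-01T12:00:00Z",
                "sourceIPAddress": "192.0.2.10",
                "userIdentity": {"arn": "arn:aws:iam::111122223333:role/ExampleRole"},
            },
            {
                "eventSource": "s3.amazonaws.com",
                "eventName": "GetObject",
                "eventTime": "2023-01-01T12:00:05Z",
                "sourceIPAddress": "192.0.2.11",
                "userIdentity": {"arn": "arn:aws:iam::111122223333:role/OtherRole"},
            },
        ]
    }
)

if __name__ == "__main__":
    for summary in summarize_msk_events(SAMPLE_LOG):
        print(summary)
```

In practice, the log text would come from the gzip-compressed files CloudTrail delivers to Amazon S3, and the summaries could feed a periodic audit report.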

AWS CloudTrail also captures Apache Kafka actions, such as creating and altering topics and groups. The actions that are logged to CloudTrail are:

  • kafka-cluster:DescribeClusterDynamicConfiguration
  • kafka-cluster:AlterClusterDynamicConfiguration
  • kafka-cluster:CreateTopic
  • kafka-cluster:DescribeTopicDynamicConfiguration
  • kafka-cluster:AlterTopic
  • kafka-cluster:AlterTopicDynamicConfiguration
  • kafka-cluster:DeleteTopic

Apache Kafka broker logs

Apache Kafka broker logs can be delivered to one or more of the following destinations: 1/ Amazon CloudWatch Logs, 2/ Amazon S3, or 3/ Amazon Kinesis Data Firehose. Broker logs can help you troubleshoot client applications and analyze their communications with the MSK cluster.
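The three delivery destinations can be expressed as a logging configuration. The sketch below builds a structure following the `LoggingInfo.BrokerLogs` shape of the Amazon MSK `CreateCluster` API, with `CloudWatchLogs`, `S3`, and `Firehose` entries. The function name and the log group, bucket, prefix, and delivery stream names are hypothetical placeholders.

```python
import json
from typing import Any, Dict, Optional


def build_logging_info(
    log_group: Optional[str] = None,
    s3_bucket: Optional[str] = None,
    s3_prefix: str = "",
    firehose_stream: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a broker log delivery configuration; a destination is enabled when named."""
    if not (log_group or s3_bucket or firehose_stream):
        raise ValueError("specify at least one broker log destination")

    cloudwatch: Dict[str, Any] = {"Enabled": bool(log_group)}
    if log_group:
        cloudwatch["LogGroup"] = log_group

    s3: Dict[str, Any] = {"Enabled": bool(s3_bucket)}
    if s3_bucket:
        s3["Bucket"] = s3_bucket
        if s3_prefix:
            s3["Prefix"] = s3_prefix

    firehose: Dict[str, Any] = {"Enabled": bool(firehose_stream)}
    if firehose_stream:
        firehose["DeliveryStream"] = firehose_stream

    return {"BrokerLogs": {"CloudWatchLogs": cloudwatch, "S3": s3, "Firehose": firehose}}


if __name__ == "__main__":
    # Hypothetical destination names, for illustration only.
    config = build_logging_info(
        log_group="/msk/example-cluster/broker",
        s3_bucket="example-msk-broker-logs",
        s3_prefix="broker-logs/",
    )
    print(json.dumps(config, indent=4))
```

Sending logs to both CloudWatch Logs and Amazon S3, as in the usage example, is one possible split between short-term troubleshooting and longer-term retention; the right choice depends on your audit requirements.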

Operational access and security

Two aspects of an Amazon MSK cluster need managing: 1/ authentication and authorization for Amazon MSK APIs, and 2/ authentication and authorization for Apache Kafka APIs.

Authentication and Authorization for Amazon MSK APIs

AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely control access to the Amazon MSK service. With IAM identity-based policies, you can specify allowed or denied actions and resources, and the conditions under which actions are allowed or denied. Amazon MSK supports specific actions, resources, and condition keys. Administrators can use AWS JSON policies to specify who has access to what, that is, which principal can perform actions on what resources and under what conditions.

The Action element of a JSON policy describes the actions that the policy allows or denies. Policy actions in Amazon MSK use the prefix “kafka:” before the action. For example, to grant someone permission to describe an MSK cluster with the Amazon MSK DescribeCluster API operation, include the kafka:DescribeCluster action in their policy. You can also match multiple actions using wildcards (*) or list multiple actions separated by commas.

The Resource JSON policy element specifies the object or objects to which the action applies. Statements must include either a Resource or a NotResource element. As a best practice, specify a resource using its Amazon Resource Name (ARN). Some Amazon MSK actions, such as those for creating resources, cannot be performed on a specific resource. In those cases, you must use a wildcard (*) as the resource.

The Condition element lets you specify the conditions in which a statement is in effect. The condition element is optional.

The example policy below allows the user to describe the cluster, get the bootstrap brokers, list broker nodes, and update and delete the cluster. However, permission is granted only if the cluster tag “Owner” has the value of the user name.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AccessClusterIfOwner",
      "Effect": "Allow",
      "Action": [
        "kafka:Describe*",
        "kafka:Get*",
        "kafka:List*",
        "kafka:Update*",
        "kafka:Delete*"
      ],
      "Resource": "arn:aws:kafka:us-east-1:012345678012:cluster/*",
      "Condition": {
        "StringEquals": {
          "kafka:ResourceTag/Owner": "${aws:username}"
        }
      }
    }
  ]
}

Authentication and Authorization for Apache Kafka APIs

You can use IAM to authenticate clients and allow or deny Apache Kafka actions. Alternatively, you can use TLS or SASL / SCRAM to authenticate clients and Apache Kafka ACLs to allow or deny actions. Since IAM access control for Amazon MSK and Apache Kafka actions enables you to handle both authentication and authorization, it eliminates the need to use one mechanism for authentication and another for authorization.

To use IAM access control for Amazon MSK, perform the following steps:

  1. Create a cluster that uses IAM access control
  2. Configure clients for IAM access control
  3. Create authorization policies
  4. Get the bootstrap brokers for IAM access control
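For step 2, Kafka clients are typically configured through a properties file. The sketch below renders the client settings used with the open-source aws-msk-iam-auth library, which is assumed to be on the client’s classpath; confirm the property values against that library’s documentation for the version you use. The function name and the output file name mentioned afterward are hypothetical.

```python
from typing import Dict

# Client settings for IAM access control, assuming the aws-msk-iam-auth
# library is available on the Kafka client's classpath.
IAM_CLIENT_PROPERTIES: Dict[str, str] = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "AWS_MSK_IAM",
    "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
    "sasl.client.callback.handler.class": (
        "software.amazon.msk.auth.iam.IAMClientCallbackHandler"
    ),
}


def render_properties(properties: Dict[str, str]) -> str:
    """Render key/value pairs in Java properties file format, one per line."""
    return "\n".join(f"{key}={value}" for key, value in properties.items()) + "\n"


if __name__ == "__main__":
    print(render_properties(IAM_CLIENT_PROPERTIES), end="")
```

The rendered text could be written to a file such as `client.properties` and passed to the Kafka command line tools or loaded by a client application, together with the bootstrap broker string obtained in step 4.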

The following is an example authorization policy for a cluster named MyTestCluster.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeCluster"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:*Topic*",
                "kafka-cluster:WriteData",
                "kafka-cluster:ReadData"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:AlterGroup",
                "kafka-cluster:DescribeGroup"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:group/MyTestCluster/*"
            ]
        }
    ]
}
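Policies of this shape can be generated per cluster rather than written by hand. The sketch below derives the topic and group resource ARNs from a cluster ARN and assembles the same three statements as the example above. It assumes the ARN layout shown in that example (`cluster/<name>/<uuid>`, with `topic/<name>/*` and `group/<name>/*` as resource patterns). The function name is hypothetical.

```python
import json
from typing import Any, Dict


def build_client_policy(cluster_arn: str) -> Dict[str, Any]:
    """Build an IAM access control policy for all topics and groups of one cluster."""
    # Split "arn:aws:kafka:<region>:<account>" from "cluster/<name>/<uuid>".
    prefix, _, resource = cluster_arn.rpartition(":")
    parts = resource.split("/")
    if not prefix.startswith("arn:") or len(parts) != 3 or parts[0] != "cluster":
        raise ValueError(f"not an MSK cluster ARN: {cluster_arn}")
    cluster_name = parts[1]

    def statement(actions, resource_arn):
        return {"Effect": "Allow", "Action": actions, "Resource": [resource_arn]}

    return {
        "Version": "2012-10-17",
        "Statement": [
            statement(
                [
                    "kafka-cluster:Connect",
                    "kafka-cluster:AlterCluster",
                    "kafka-cluster:DescribeCluster",
                ],
                cluster_arn,
            ),
            statement(
                [
                    "kafka-cluster:*Topic*",
                    "kafka-cluster:WriteData",
                    "kafka-cluster:ReadData",
                ],
                f"{prefix}:topic/{cluster_name}/*",
            ),
            statement(
                ["kafka-cluster:AlterGroup", "kafka-cluster:DescribeGroup"],
                f"{prefix}:group/{cluster_name}/*",
            ),
        ],
    }


if __name__ == "__main__":
    arn = (
        "arn:aws:kafka:us-east-1:0123456789012:"
        "cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
    )
    print(json.dumps(build_client_policy(arn), indent=4))
```

Generating the policy from the cluster ARN keeps the three resource patterns consistent. For least privilege, the wildcard topic and group patterns would usually be narrowed to the names a given client actually needs.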

Using SCPs to manage IAM permissions within your AWS organization

Service control policies (SCPs) are a type of organization policy that can be used to manage permissions in your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization. SCPs help you to ensure your accounts stay within your organization’s access control guidelines.

Sample policy to deny WriteDataIdempotently for an MSK cluster named “test”:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "Statement1",
			"Effect": "Deny",
			"Action": [
				"kafka-cluster:WriteDataIdempotently"
			],
			"Resource": [
				"arn:aws:kafka:us-east-1:457374075027:cluster/test/a11f1e5e-6d59-4c9e-b8c9-b2d2833f6b6d-11"
			]
		}
	]
}

Conclusion

In this post, we reviewed Amazon MSK and highlighted key information that can help FSI customers accelerate the approval of the service within these five categories: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. We encourage customers to use the controls highlighted in this blog as appropriate in accordance with their business needs and AWS environment. While not a one-size-fits-all approach, the guidance provides a consolidated list of key areas for Amazon MSK and can be adapted to meet your organization’s security and compliance requirements.

In the meantime, visit our AWS Financial Services Industry blog channel and stay tuned for more financial services news and best practices.

Sudhir Kalidindi

Sudhir Kalidindi is an AWS Principal Solutions Architect in Financial Services with 22+ years of experience in software architecture and the development of solutions involving business-critical workloads. He helps payments customers to innovate on the AWS Cloud by providing solutions using AWS products and services.

Anu Jayanthi

Anu Jayanthi is an AWS Solutions Architect. She works with Startup customers, providing advocacy and strategic technical guidance to help plan and build solutions using AWS best practices.

Pradeep Dhananjaya

Pradeep Dhananjaya is a Banking Specialist Solutions Architect in the Worldwide Financial Services industry group at AWS. He spends much of his time working with fintechs and traditional banks solving for their business problems with technology. Prior to joining AWS, Pradeep spent more than a decade building technology solutions at JP Morgan Chase and Morgan Stanley.