Amazon Managed Streaming for Apache Kafka Documentation

Amazon Managed Streaming for Apache Kafka (Amazon MSK) offers managed Apache Kafka. This means Amazon MSK provisions your servers, helps to configure your Apache Kafka clusters, replaces servers when they fail, orchestrates server patches and upgrades, architects clusters for availability, helps ensure data is durably stored and secured, sets up monitoring and alarms, and runs scaling to support load changes. With a managed service, you can spend your time developing and running streaming event applications. Amazon MSK provides open-source Apache Kafka clusters distributed across multiple Availability Zones (AZs), giving you streaming storage designed for security and availability. Amazon MSK is configurable, observable, and scalable, allowing for flexibility and control. Application development is simpler with Amazon MSK because of integrations with other AWS services. Amazon MSK integrates with AWS Identity and Access Management (IAM) and AWS Certificate Manager for security, AWS Glue Schema Registry for schema governance, Amazon Managed Service for Apache Flink and AWS Lambda for stream processing, and more. Amazon MSK provides the integration backbone for modern messaging and event-driven applications at the center of data ingest and processing services, as well as microservice application architectures.

No servers to manage

Fully managed - With a few clicks in the console, you can create a managed Apache Kafka cluster that is designed to follow Apache Kafka’s deployment best practices, or you can create your own cluster using your own custom configuration. Once you create your desired configuration, Amazon MSK is designed to provision, configure, and manage the operations of your Apache Kafka cluster and Apache ZooKeeper nodes.

Apache ZooKeeper included - Apache ZooKeeper is required to run Apache Kafka, coordinate cluster tasks, and maintain state for resources interacting with the cluster. Amazon MSK is designed to manage the Apache ZooKeeper nodes for you. Each Amazon MSK cluster is designed to include the appropriate number of Apache ZooKeeper nodes for your Apache Kafka cluster.

Amazon MSK Serverless - MSK Serverless is a cluster type for Amazon MSK that supports running Apache Kafka clusters without having to manage compute and storage capacity. MSK Serverless provisions and scales resources while also managing Apache Kafka partitions.

Availability

Availability - The service is designed so that all clusters are provisioned across multiple Availability Zones (three by default) and are supported by systems that seek to detect and respond to issues within cluster infrastructure and Apache Kafka software. If a component fails, Amazon MSK is designed to replace it without downtime to your applications. Amazon MSK assists in managing the availability of your Apache ZooKeeper nodes so you don’t need to start, stop, or directly access the nodes yourself. Amazon MSK is designed to deploy software patches as needed to help you keep your cluster up to date and running smoothly.

Data replication - Amazon MSK is designed to use multi-AZ replication for high availability.

Security

Private connectivity - Your Apache Kafka clusters are designed to run in an Amazon VPC managed by Amazon MSK. Your clusters are designed to be available to your own Amazon VPCs, subnets, and security groups based on the configuration you specify. The service is designed so that you can control your network configuration and IP addresses from your VPCs that are attached to your Amazon MSK resources through elastic network interfaces (ENIs).

Granular access control - With IAM Access Control, you no longer need to build and run one-off access management systems to control client authentication and authorization for Apache Kafka, and you can secure your clusters using least-privilege permissions. You can also use SASL/SCRAM or mutual TLS authentication with Apache Kafka access control lists (ACLs).
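
As a minimal sketch of what IAM Access Control looks like from the client side, the Python below renders a Kafka client properties file. The property values are those used by the open-source aws-msk-iam-auth library for Java clients, which is assumed to be on the client's classpath; the helper names iam_client_properties and render_properties are hypothetical and exist only for this illustration.

```python
def iam_client_properties() -> dict:
    """Kafka client settings for IAM authentication.

    Assumption: the aws-msk-iam-auth JAR is available to the Java client
    that will load these properties.
    """
    return {
        # IAM authentication runs over SASL on a TLS-encrypted connection.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "AWS_MSK_IAM",
        "sasl.jaas.config":
            "software.amazon.msk.auth.iam.IAMLoginModule required;",
        "sasl.client.callback.handler.class":
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
    }


def render_properties(props: dict) -> str:
    """Render a dict as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(render_properties(iam_client_properties()), end="")
```

The rendered text can be saved as a client.properties file and passed to a Kafka client or command-line tool. The permissions that the client actually receives still come from the IAM policy attached to its identity.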

Encryption - Amazon MSK is designed to encrypt your data at rest without special configuration or third-party tools. Data can be encrypted at rest using an AWS Key Management Service (AWS KMS) Customer Master Key (CMK) managed by AWS, or your own CMK. Amazon MSK is also designed to encrypt data in transit via TLS between brokers and between clients and brokers on your cluster.
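
The encryption choices above can be sketched as a request fragment. The Python below builds a dictionary shaped like the EncryptionInfo structure of the Amazon MSK CreateCluster API; the helper name encryption_info and the key ARN in the usage lines are hypothetical placeholders, and the sketch does not call AWS.

```python
from typing import Optional


def encryption_info(kms_key_arn: Optional[str] = None) -> dict:
    """Build an EncryptionInfo-style fragment for a cluster request.

    If kms_key_arn is None, the at-rest section is omitted so that the
    service can fall back to its default key.
    """
    info = {
        "EncryptionInTransit": {
            "ClientBroker": "TLS",  # allow only TLS between clients and brokers
            "InCluster": True,      # encrypt broker-to-broker traffic
        }
    }
    if kms_key_arn is not None:
        # Use your own key for encryption at rest.
        info["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return info


if __name__ == "__main__":
    # Hypothetical key ARN, for illustration only.
    example_key = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
    print(encryption_info())
    print(encryption_info(example_key))
```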

Connectivity over the internet - Amazon MSK offers an option to connect over the internet to the brokers of Amazon MSK clusters running Apache Kafka version 2.6.0 or later. By enabling Public Access, authorized clients external to a private Amazon Virtual Private Cloud (VPC) can stream encrypted data in and out of specific Amazon MSK clusters.

Cross-Account Access Control - Use a cluster policy for your Amazon MSK cluster to define which cross-account IAM principals have permissions to set up cross-account private connectivity to your Amazon MSK cluster. When used with IAM client authentication, you can also use the cluster policy to granularly define Kafka data plane permissions for connecting clients.

Graviton

AWS Graviton3 processors are the latest generation of custom-designed AWS Graviton processors built on the AWS Nitro System. Graviton3-based M7g instances deliver higher storage throughput and increased network throughput than similarly sized M5 instances, at a lower cost.

Open Source

Run with native Apache Kafka - Amazon MSK deploys native versions of Apache Kafka so applications and tools built for Apache Kafka are designed to work with Amazon MSK with no application code changes.

Version availability - Amazon MSK typically makes newer versions of Apache Kafka available after public availability.

Version upgrades - You can upgrade Apache Kafka versions on provisioned clusters in a few clicks, allowing you to decide when to take advantage of features or bug fixes present in new Apache Kafka versions. Amazon MSK is designed to deploy version upgrades on running clusters to maintain client I/O availability for customers following best practices. For serverless clusters, Apache Kafka versions are designed to be upgraded automatically by Amazon MSK.

Tiered storage

With tiered storage, you can store virtually unlimited data in MSK without the need to provision and manage storage capacity. You can enable tiered storage with a few clicks for new or existing clusters and pay for what you use. You can first store data in a performance-optimized primary storage tier and let MSK automatically tier data into the new low-cost tier for longer retention. The feature is supported in all AWS Regions where MSK is present. To learn how to get started with tiered storage, visit our Amazon MSK Developer Guide.
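
As a rough illustration of the two tiers, the Python below computes topic-level retention settings: how long data stays in the primary tier and how long it is retained overall. The keys remote.storage.enable, local.retention.ms, and retention.ms are Apache Kafka topic configurations for tiered storage; the helper name tiered_topic_config and the example durations are assumptions, and the Amazon MSK Developer Guide remains the authority on supported values.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def tiered_topic_config(local_days: int, total_days: int) -> dict:
    """Topic settings that keep recent data local and older data in the low-cost tier.

    local_days: time a record stays in the primary (broker) storage tier.
    total_days: overall retention across both tiers.
    """
    if local_days <= 0 or total_days <= 0:
        raise ValueError("retention periods must be positive")
    if local_days > total_days:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",
        "local.retention.ms": str(local_days * MS_PER_DAY),
        "retention.ms": str(total_days * MS_PER_DAY),
    }


if __name__ == "__main__":
    # Example: keep 1 day on the brokers and 30 days in total.
    for key, value in tiered_topic_config(1, 30).items():
        print(f"{key}={value}")
```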

Scalable

Broker scaling (provisioned clusters only) - You can scale your Amazon MSK clusters by changing the size or family of your Apache Kafka brokers. Changing the size or family of your brokers is a popular way to scale Amazon MSK clusters because it gives you the flexibility to adjust your MSK cluster’s compute capacity for changes in your workloads. This method can be preferred because it does not require partition reassignment, which can impact Apache Kafka availability.

Cluster scaling (serverless clusters only) - Amazon MSK is designed to scale compute and storage resources of your clusters in response to your application’s throughput needs.

Partition management - Amazon MSK integrates with Cruise Control, a popular open source tool for Apache Kafka that manages partition assignment on your behalf.

Storage scaling (provisioned clusters only) - You can scale up the amount of storage provisioned per broker to match changes in storage requirements using the AWS Management Console or the AWS CLI, or you can create an auto scaling policy to expand your storage to meet your streaming requirements.
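
An auto scaling policy for broker storage can be sketched as two Application Auto Scaling requests: one that registers the cluster's storage as a scalable target, and one that attaches a target-tracking policy. The Python below only builds the request parameters and does not call AWS. The helper name storage_autoscaling_requests, the policy name, and the ARN and numbers in the usage lines are hypothetical.

```python
def storage_autoscaling_requests(cluster_arn: str, max_gib: int, target_pct: float):
    """Return (scalable_target, scaling_policy) parameter dicts.

    Assumption: these would be passed to the Application Auto Scaling
    register_scalable_target and put_scaling_policy operations.
    """
    dimension = "kafka:broker-storage:VolumeSize"
    scalable_target = {
        "ServiceNamespace": "kafka",
        "ResourceId": cluster_arn,
        "ScalableDimension": dimension,
        "MinCapacity": 1,
        "MaxCapacity": max_gib,  # upper bound on storage per broker, in GiB
    }
    scaling_policy = {
        "PolicyName": "msk-storage-target-tracking",  # hypothetical name
        "ServiceNamespace": "kafka",
        "ResourceId": cluster_arn,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_pct),  # storage utilization target, percent
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "KafkaBrokerStorageUtilization"
            },
            # Storage is only expanded, never reduced.
            "DisableScaleIn": True,
        },
    }
    return scalable_target, scaling_policy


if __name__ == "__main__":
    # Hypothetical cluster ARN, for illustration only.
    arn = "arn:aws:kafka:us-east-1:111122223333:cluster/example/abcd1234"
    target, policy = storage_autoscaling_requests(arn, max_gib=1000, target_pct=60)
    print(target)
    print(policy)
```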

Configurable

Amazon MSK is designed to deploy a best-practice cluster configuration for Apache Kafka, and gives customers the ability to tune more than 30 different cluster configurations while supporting all dynamic and topic-level configurations.

Visible

CloudWatch metrics by default - You can visualize and monitor important cluster, broker, topic, consumer, and partition-level metrics using Amazon CloudWatch.
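
As one way to read a broker-level metric, the Python below builds the parameters for a CloudWatch GetMetricStatistics request against the AWS/Kafka namespace. It assumes the BytesInPerSec metric with the Cluster Name and Broker ID dimensions; the helper name bytes_in_query and the cluster name in the usage lines are hypothetical, and the sketch does not call AWS.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def bytes_in_query(cluster_name: str, broker_id: int, minutes: int = 60,
                   now: Optional[datetime] = None) -> dict:
    """Parameters for the average inbound byte rate of one broker.

    Assumption: the result is passed as keyword arguments to a CloudWatch
    client's get_metric_statistics call.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Kafka",
        "MetricName": "BytesInPerSec",
        "Dimensions": [
            {"Name": "Cluster Name", "Value": cluster_name},
            {"Name": "Broker ID", "Value": str(broker_id)},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # one data point per 5 minutes
        "Statistics": ["Average"],
    }


if __name__ == "__main__":
    # Hypothetical cluster name, for illustration only.
    print(bytes_in_query("example-cluster", broker_id=1))
```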

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.