Amazon Managed Streaming for Apache Kafka (MSK) offers fully managed Apache Kafka. This means Amazon MSK provisions your servers, configures your Apache Kafka clusters, replaces servers when they fail, orchestrates server patches and upgrades, architects clusters for high availability, ensures data is durably stored and secured, sets up monitoring and alarms, and handles scaling to support load changes. With a managed service, you can spend your time developing and running streaming event applications.
Amazon MSK provides open-source, highly secure Apache Kafka clusters distributed across multiple Availability Zones (AZs), giving you resilient, highly available streaming storage. Amazon MSK is highly configurable, observable, and scalable, allowing for the flexibility and control needed for various use cases.
Application development is simpler with Amazon MSK because of tight integrations with other AWS services. Amazon MSK integrates with AWS Identity and Access Management (IAM) and AWS Certificate Manager for security, AWS Glue Schema Registry for schema governance, Amazon Managed Service for Apache Flink and AWS Lambda for stream processing, and more. Amazon MSK provides the integration backbone for modern messaging and event-driven applications at the center of data ingest and processing services, as well as microservice application architectures.
No servers to manage
With a few clicks in the console, you can create a fully managed Apache Kafka cluster that follows Apache Kafka’s deployment best practices, or create your own cluster using a custom configuration. Once you create your desired configuration, Amazon MSK automatically provisions, configures, and manages your Apache Kafka cluster operations and Apache ZooKeeper nodes.
Apache ZooKeeper included
Apache ZooKeeper is required to run Apache Kafka, coordinate cluster tasks, and maintain state for resources interacting with the cluster. Amazon MSK manages the Apache ZooKeeper nodes for you. Each Amazon MSK cluster includes the appropriate number of Apache ZooKeeper nodes for your Apache Kafka cluster at no additional cost.
Amazon MSK Serverless
MSK Serverless is a cluster type for Amazon MSK that makes it easy for you to run Apache Kafka clusters without having to manage compute and storage capacity. MSK Serverless automatically provisions and scales resources while also managing Apache Kafka partitions, so you can stream data without having to worry about right-sizing or scaling clusters.
High availability is default
All clusters are distributed across multiple AZs (three is the default), are supported by Amazon MSK’s service-level agreement, and are supported by automated systems that detect and respond to issues within cluster infrastructure and Apache Kafka software. If a component fails, Amazon MSK automatically replaces it without downtime to your applications. Amazon MSK manages the availability of your Apache ZooKeeper nodes so you don’t need to start, stop, or directly access the nodes yourself. It also automatically deploys software patches as needed to keep your cluster up to date and running smoothly.
Reliable and automated data replication across MSK clusters
MSK Replicator is a feature of Amazon MSK that enables you to reliably replicate data across Amazon MSK clusters in just a few clicks, without needing expertise to set up open-source tools, write code, or manage infrastructure. MSK Replicator automatically provisions and scales underlying resources, so you can replicate data on demand and pay only for what you use. With MSK Replicator, you can build highly available and fault-tolerant multi-Region applications for increased resiliency. You can also use MSK Replicator to provide lower-latency data access in different geographic regions or to distribute data to your partners.
Your Apache Kafka cluster runs in an Amazon Virtual Private Cloud (VPC) managed by Amazon MSK. Kafka clients in your own Amazon VPC can access the cluster privately through a cross-account elastic network interface that Amazon MSK deploys in your VPC. If your Kafka clients are spread across multiple VPCs or AWS accounts, you can still connect privately to your cluster by using the multi-VPC private connectivity feature. This feature eliminates the operational overhead of self-managing a PrivateLink solution and scales as the Amazon MSK cluster scales, so you can maintain private connectivity to the cluster without making additional configuration changes. Because multi-VPC private connectivity allows overlapping IP ranges across connecting VPCs, it also avoids the challenges of maintaining non-overlapping IPs, complex peering, and routing tables that come with other VPC connectivity solutions.
Cross-account access control
Use a cluster policy for your Amazon MSK cluster to define which IAM principals have cross-account permissions to set up private connectivity to your Amazon MSK cluster. When used with IAM client authentication, you can also use the cluster policy to granularly define Kafka data plane permissions for the connecting clients.
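As a rough illustration of what such a cluster policy might look like, the sketch below builds a resource policy document in Python. The action names follow the cluster policy example in the Amazon MSK Developer Guide. The helper name `build_cluster_policy`, the ARN, and the account ID are hypothetical placeholders, not values from this page.

```python
import json


def build_cluster_policy(cluster_arn: str, principal_account_ids: list[str]) -> dict:
    """Build a resource policy that lets other accounts set up private connectivity.

    Hypothetical helper for illustration; adjust actions to your own needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Accounts (or IAM principals) allowed to connect cross-account.
                "Principal": {"AWS": list(principal_account_ids)},
                "Action": [
                    "kafka:CreateVpcConnection",
                    "kafka:GetBootstrapBrokers",
                    "kafka:DescribeCluster",
                    "kafka:DescribeClusterV2",
                ],
                "Resource": cluster_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN and account ID, for illustration only.
    policy = build_cluster_policy(
        "arn:aws:kafka:us-east-1:111122223333:cluster/example-cluster/abcd1234",
        ["444455556666"],
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON is the kind of document you would attach to the cluster as its policy, for example through the console or the PutClusterPolicy API.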
Granular access control
IAM Access Control is a no-cost security option that simplifies cluster authentication and Apache Kafka API authorization by using IAM roles or user policies to control access. With IAM Access Control, you no longer need to build and run one-off access management systems to control client authentication and authorization for Apache Kafka. Your clusters are secured using least-privilege permissions by default. For provisioned clusters, you can also use Simple Authentication and Security Layer (SASL)/Salted Challenge Response Authentication Mechanism (SCRAM) or mutual Transport Layer Security (TLS) authentication with Apache Kafka access control lists (ACLs) to control client access.
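To make the client side of IAM Access Control more concrete, here is a minimal sketch that renders the Kafka client properties typically used with the open-source aws-msk-iam-auth library for Java clients. The property values come from that library's documentation, not from this page. The helper names and the bootstrap address are hypothetical.

```python
def iam_client_properties(bootstrap_servers: str) -> dict[str, str]:
    """Return Kafka client settings for IAM authentication (Java client, aws-msk-iam-auth)."""
    return {
        "bootstrap.servers": bootstrap_servers,
        # IAM authentication runs over TLS using a SASL mechanism.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "AWS_MSK_IAM",
        "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
        "sasl.client.callback.handler.class": (
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler"
        ),
    }


def render_properties(props: dict[str, str]) -> str:
    """Render a dict as the text of a client.properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Placeholder broker address; IAM-authenticated listeners commonly use port 9098.
    print(render_properties(iam_client_properties("b-1.example.amazonaws.com:9098")))
```

What the client is then allowed to do (connect, read, write, describe topics) is governed by the IAM policy attached to the role or user the client runs as.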
Encryption at rest and in transit
Amazon MSK encrypts your data at rest without special configuration or third-party tools. For provisioned clusters, all data at rest is encrypted with an AWS Key Management Service (AWS KMS) key: an AWS managed key by default, or your own key. You can also encrypt data in transit via TLS between brokers and between clients and brokers on your cluster. For serverless clusters, all data at rest is encrypted by default using service-managed keys, and all data in transit is encrypted by default via TLS.
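For provisioned clusters, these encryption choices map onto the `EncryptionInfo` structure of the MSK CreateCluster API. The sketch below assembles that structure under the assumption that you are calling the API yourself (for example through an SDK). The helper name and the key ARN are hypothetical.

```python
from typing import Optional

# Values the API accepts for client-to-broker encryption.
CLIENT_BROKER_MODES = {"TLS", "TLS_PLAINTEXT", "PLAINTEXT"}


def build_encryption_info(
    kms_key_arn: Optional[str] = None,
    client_broker: str = "TLS",
    in_cluster: bool = True,
) -> dict:
    """Build an EncryptionInfo dict for a provisioned cluster request."""
    if client_broker not in CLIENT_BROKER_MODES:
        raise ValueError(f"unsupported client_broker mode: {client_broker}")
    info: dict = {
        "EncryptionInTransit": {
            "ClientBroker": client_broker,  # client <-> broker traffic
            "InCluster": in_cluster,        # broker <-> broker traffic
        }
    }
    # Omitting EncryptionAtRest leaves the default key in place;
    # supplying an ARN selects your own KMS key instead.
    if kms_key_arn is not None:
        info["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return info


if __name__ == "__main__":
    print(build_encryption_info())
    # Placeholder key ARN, for illustration only.
    print(build_encryption_info("arn:aws:kms:us-east-1:111122223333:key/example"))
```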
Connectivity over the internet
Amazon MSK offers an option to securely connect to the brokers of Amazon MSK clusters running Apache Kafka 2.6.0 or later versions over the internet. By enabling Public Access, authorized clients external to a private Amazon Virtual Private Cloud (VPC) can stream encrypted data in and out of specific Amazon MSK clusters.
No other provider offers the breadth and depth of AWS integrations in Amazon MSK. These integrations include:
- AWS IAM for Apache Kafka and service-level API access control
- Amazon Managed Service for Apache Flink for running fully managed Apache Flink applications to process streaming data within Apache Kafka
- Amazon Managed Service for Apache Flink Studio to run interactive Streaming SQL and long-running SQL jobs using Apache FlinkSQL
- AWS Glue Schema Registry to centrally control and evolve schemas
- AWS IoT Core for IoT event streaming into MSK
- AWS Database Migration Service (AWS DMS) for change data capture and analytics
- Amazon Virtual Private Cloud (Amazon VPC) for private client connectivity and network isolation
- AWS Key Management Service (AWS KMS) for at-rest encryption
- AWS Certificate Manager Private Certificate Authority for mutual TLS client authentication
- AWS Secrets Manager for secure storage and management of SASL/SCRAM secrets
- AWS CloudFormation to deploy Amazon MSK in code
- Amazon CloudWatch for cluster-, broker-, topic-, consumer-, and partition-level metrics
Run with native Apache Kafka
Amazon MSK deploys native versions of Apache Kafka so applications and tools built for Apache Kafka just work with Amazon MSK out of the box, with no application code changes.
Streamlined version availability
Amazon MSK typically makes newer versions of Apache Kafka available within seven days of public availability.
Seamless version upgrades
You can upgrade Apache Kafka versions on provisioned clusters in just a few clicks, allowing you to decide when to take advantage of features and bug fixes present in new Apache Kafka versions. Amazon MSK automates the deployment of version upgrades on running clusters to maintain client I/O availability for customers following best practices. For serverless clusters, Apache Kafka versions are upgraded automatically by Amazon MSK.
Amazon MSK lets you get started for less than $2.00 per day. Customers typically pay between $0.05 and $0.07 per GB ingested, all-in, which can be as low as 1/13th the cost of other managed providers. Visit the Amazon MSK Pricing page to learn more about pricing.
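As a back-of-the-envelope illustration of the per-GB range quoted above, the following sketch multiplies an ingest volume by the low and high ends of that range. It uses only the indicative figures from this page and is not a substitute for the pricing page. The function name and the example volume are hypothetical.

```python
# Indicative all-in range quoted above, in USD per GB ingested.
LOW_RATE_PER_GB = 0.05
HIGH_RATE_PER_GB = 0.07


def estimate_ingest_cost(gb_ingested: float) -> tuple[float, float]:
    """Return a (low, high) cost estimate in USD for a given ingest volume."""
    if gb_ingested < 0:
        raise ValueError("gb_ingested must be non-negative")
    low = round(gb_ingested * LOW_RATE_PER_GB, 2)
    high = round(gb_ingested * HIGH_RATE_PER_GB, 2)
    return low, high


if __name__ == "__main__":
    low, high = estimate_ingest_cost(1000)  # 1,000 GB ingested
    print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

Actual costs depend on broker type, storage, data transfer, and Region, so treat the result as a rough range only.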
With tiered storage, you can store virtually unlimited data in MSK without provisioning or managing storage capacity. You can enable tiered storage with a few clicks for new or existing clusters and pay for what you use. Data is first written to a performance-optimized primary storage tier, and MSK automatically moves it into the low-cost tier for longer retention. The feature is supported in all AWS Regions where MSK is available. To learn how to get started with tiered storage, visit the Amazon MSK Developer Guide.
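At the topic level, tiering is controlled by standard Apache Kafka tiered-storage settings. The sketch below computes those settings for a topic that keeps recent data in primary storage and older data in the low-cost tier. The configuration keys are Apache Kafka topic configs. The helper name and the retention values are hypothetical.

```python
MS_PER_HOUR = 60 * 60 * 1000
MS_PER_DAY = 24 * MS_PER_HOUR


def tiered_topic_config(local_retention_hours: int, total_retention_days: int) -> dict[str, str]:
    """Return topic-level settings that enable tiered storage.

    local.retention.ms controls how long data stays in primary storage;
    retention.ms controls total retention across both tiers.
    """
    local_ms = local_retention_hours * MS_PER_HOUR
    total_ms = total_retention_days * MS_PER_DAY
    if local_ms > total_ms:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",
        "local.retention.ms": str(local_ms),
        "retention.ms": str(total_ms),
    }


if __name__ == "__main__":
    # Keep 12 hours in primary storage and 30 days in total.
    for key, value in tiered_topic_config(12, 30).items():
        print(f"{key}={value}")
```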
Broker scaling (provisioned clusters only)
You can scale your Amazon MSK clusters by changing the size or family of your Apache Kafka brokers in minutes with no downtime. Changing the size or family is a popular way to scale Amazon MSK clusters because it gives you the flexibility to adjust cluster compute capacity for changes in your workloads. This method can be preferred because it doesn’t require partition reassignment, which can impact Apache Kafka availability.
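If you script this change rather than using the console, it corresponds to the MSK UpdateBrokerType API. The sketch below only assembles the request parameters, assuming you would pass them to an SDK call such as boto3's `update_broker_type`. The helper name, ARN, version string, and instance types are hypothetical placeholders.

```python
def build_update_broker_type_request(
    cluster_arn: str,
    current_version: str,
    current_instance_type: str,
    target_instance_type: str,
) -> dict[str, str]:
    """Build parameters for an UpdateBrokerType call.

    current_version is the cluster's current version string, which the API
    uses to guard against concurrent updates.
    """
    if target_instance_type == current_instance_type:
        raise ValueError("target instance type must differ from the current one")
    return {
        "ClusterArn": cluster_arn,
        "CurrentVersion": current_version,
        "TargetInstanceType": target_instance_type,
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    request = build_update_broker_type_request(
        "arn:aws:kafka:us-east-1:111122223333:cluster/example-cluster/abcd1234",
        "K3AEGXETSR30VB",
        "kafka.m5.large",
        "kafka.m5.xlarge",
    )
    print(request)
```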
Cluster scaling (serverless clusters only)
Amazon MSK automatically scales compute and storage resources of your clusters in response to your application’s throughput needs.
Automatic partition management
Amazon MSK integrates with Cruise Control, a popular open-source tool for Apache Kafka that automatically manages partition assignment on your behalf. For serverless clusters, Amazon MSK automatically manages partition assignments for you.
Automatic storage scaling (provisioned clusters only)
You can seamlessly scale up the amount of storage provisioned per broker to match storage requirement changes using the AWS Management Console or AWS Command Line Interface (AWS CLI). You can also create an auto scaling policy to automatically expand your storage to meet growing streaming requirements.
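A storage auto scaling policy of this kind is defined through Application Auto Scaling. The sketch below builds the two request payloads involved (registering the cluster's storage as a scalable target, then attaching a target-tracking policy), assuming you would send them with an SDK or the AWS CLI. The namespace, dimension, and metric names follow the Amazon MSK Developer Guide. The helper name, policy name, and numeric values are hypothetical.

```python
SERVICE_NAMESPACE = "kafka"
SCALABLE_DIMENSION = "kafka:broker-storage:VolumeSize"


def build_storage_autoscaling_requests(
    cluster_arn: str,
    max_storage_gib: int,
    target_utilization_percent: float = 60.0,
) -> tuple[dict, dict]:
    """Return (register_scalable_target, put_scaling_policy) parameter dicts."""
    if not 0 < target_utilization_percent < 100:
        raise ValueError("target utilization must be between 0 and 100")
    common = {
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": cluster_arn,
        "ScalableDimension": SCALABLE_DIMENSION,
    }
    register = {**common, "MinCapacity": 1, "MaxCapacity": max_storage_gib}
    policy = {
        **common,
        "PolicyName": "example-storage-scaling",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "KafkaBrokerStorageUtilization"
            },
            # Broker storage can only grow, so scale-in is disabled.
            "DisableScaleIn": True,
        },
    }
    return register, policy


if __name__ == "__main__":
    register, policy = build_storage_autoscaling_requests(
        "arn:aws:kafka:us-east-1:111122223333:cluster/example-cluster/abcd1234",
        max_storage_gib=2000,
    )
    print(register)
    print(policy)
```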
Amazon MSK deploys a best-practice cluster configuration for Apache Kafka by default. For provisioned clusters, you can tune more than 30 different cluster configuration properties, and all dynamic and topic-level configurations are supported. For more information, see Custom MSK Configurations in the documentation.
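A custom configuration is ultimately a set of Apache Kafka broker properties. As a minimal sketch, the code below renders a few common overrides into the properties text that a custom MSK configuration carries. The property names are standard Apache Kafka broker settings. The helper name and the chosen values are hypothetical, and whether a given property is tunable should be checked against the documentation.

```python
def render_server_properties(overrides: dict[str, object]) -> bytes:
    """Render broker property overrides as UTF-8 properties text."""
    lines = []
    for key in sorted(overrides):
        value = overrides[key]
        # Kafka expects lowercase booleans in properties files.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    # Example overrides, for illustration only.
    example = {
        "auto.create.topics.enable": False,
        "default.replication.factor": 3,
        "min.insync.replicas": 2,
        "num.partitions": 6,
    }
    print(render_server_properties(example).decode("utf-8"))
```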
Easy observability of streaming performance with CloudWatch metrics by default
You can visualize and monitor important metrics using Amazon CloudWatch to understand and maintain streaming application performance.
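To pull one of these metrics programmatically instead of viewing it in the console, you can query CloudWatch directly. The sketch below builds the parameters for a GetMetricStatistics request for a broker's inbound throughput, assuming the `AWS/Kafka` namespace and the `BytesInPerSec` metric described in the MSK monitoring documentation. The helper name, cluster name, and broker ID are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_bytes_in_request(
    cluster_name: str,
    broker_id: int,
    end_time: datetime,
    window_minutes: int = 60,
    period_seconds: int = 300,
) -> dict:
    """Build GetMetricStatistics parameters for a broker's BytesInPerSec metric."""
    return {
        "Namespace": "AWS/Kafka",
        "MetricName": "BytesInPerSec",
        "Dimensions": [
            {"Name": "Cluster Name", "Value": cluster_name},
            {"Name": "Broker ID", "Value": str(broker_id)},
        ],
        "StartTime": end_time - timedelta(minutes=window_minutes),
        "EndTime": end_time,
        "Period": period_seconds,  # seconds per datapoint
        "Statistics": ["Average"],
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Placeholder cluster name and broker ID, for illustration only.
    print(build_bytes_in_request("example-cluster", 1, now))
```

The resulting dict could be passed to an SDK call such as boto3's CloudWatch `get_metric_statistics` using keyword expansion.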