General

Q: What is Amazon MSK?
Amazon Managed Streaming for Kafka (Amazon MSK) is a new AWS streaming data service that manages Apache Kafka infrastructure and operations, making it easy for developers and DevOps managers to run Apache Kafka applications on AWS without the need to become experts in operating Apache Kafka clusters. Amazon MSK is an ideal place to run existing or new Apache Kafka applications in AWS. Amazon MSK operates and maintains Apache Kafka clusters, provides enterprise-grade security features out of the box, and has built-in AWS integrations that accelerate development of streaming data applications. To get started, you can migrate existing Apache Kafka workloads into Amazon MSK, or with a few clicks, you can build new ones from scratch in minutes. There are no data transfer charges for in-cluster traffic, and no commitments or upfront payments required. You only pay for the resources that you use.
 
Q: What is Apache Kafka?
Apache Kafka is an open-source, high-performance, fault-tolerant, and scalable platform for building real-time streaming data pipelines and applications. Apache Kafka is a streaming data store that decouples applications that produce streaming data (producers) from applications that consume it (consumers): producers write into the data store, and consumers read from it. Organizations use Apache Kafka as a data source for applications that continuously analyze and react to streaming data.
 
Q: What is streaming data?
Streaming data is a continuous stream of small records (a record is typically a few kilobytes) generated by thousands of machines, devices, websites, and applications. Streaming data includes a wide variety of data such as log files generated by customers using your mobile or web applications, ecommerce purchases, in-game player activity, information from social networks, financial trading floors, geospatial services, and telemetry from connected devices or instrumentation in data centers. Streaming data services like Amazon Managed Streaming for Kafka and Amazon Kinesis Data Streams make it easy for you to continuously collect, process, and deliver streaming data.
 
Q: What are Apache Kafka’s primary capabilities?
Apache Kafka has three key capabilities:
  • Apache Kafka stores streaming data in a fault-tolerant way as a continuous series of records and preserves the order in which the records were produced.
  • Apache Kafka acts as a buffer between data producers and data consumers. Apache Kafka allows many data producers (e.g. websites, IoT devices, Amazon EC2 instances) to continuously publish streaming data and categorize this data using Apache Kafka topics. Multiple data consumers (e.g. machine learning applications, Lambda functions) read from these topics at their own rate, similar to a message queue or enterprise messaging system.
  • Data consumers process data from Apache Kafka topics on a first-in-first-out basis, preserving the order in which the data was produced.
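
The buffering and ordering behavior in the list above can be illustrated with a toy in-memory model. This is only a sketch and not how Apache Kafka is implemented: the TopicLog class and its method names are hypothetical, and a real topic is partitioned, replicated, and persisted on brokers.

```python
from collections import defaultdict


class TopicLog:
    """Toy stand-in for a single-partition topic: an append-only list of records."""

    def __init__(self):
        self._records = []                # records in the order they were produced
        self._offsets = defaultdict(int)  # next position to read, per consumer

    def produce(self, record):
        """Append a record to the end of the log and return its position."""
        self._records.append(record)
        return len(self._records) - 1

    def consume(self, consumer_id, max_records=1):
        """Return up to max_records unread records for this consumer, oldest first."""
        start = self._offsets[consumer_id]
        batch = self._records[start:start + max_records]
        self._offsets[consumer_id] = start + len(batch)
        return batch


log = TopicLog()
for event in ["click-1", "click-2", "click-3"]:
    log.produce(event)

fast = log.consume("analytics", max_records=3)  # reads everything at once
slow = log.consume("archiver", max_records=1)   # reads at its own, slower rate
print(fast, slow)
```

Each consumer keeps its own read position, so a slow consumer does not hold back a fast one, and both see records in the order they were produced.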
 
Q: What are the key concepts of Apache Kafka?
Apache Kafka stores records in topics. Data producers write records to topics and consumers read records from topics. Each record in Apache Kafka consists of a key, a value, and a timestamp. Apache Kafka partitions and replicates topics across multiple nodes called brokers. Apache Kafka runs as a cluster on one or more brokers, and brokers can be located in multiple AWS availability zones to create a highly available cluster. Apache Kafka relies on Apache ZooKeeper to coordinate cluster tasks and can maintain state for resources interacting with an Apache Kafka cluster.
 
Q: When should I use Apache Kafka?
Apache Kafka is used to support real-time applications that transform, deliver, and react to streaming data, and for building real-time streaming data pipelines that reliably get data between multiple systems or applications.
 
Q: What does Amazon Managed Streaming for Kafka do?
Amazon Managed Streaming for Kafka (Amazon MSK) makes it easy to get started and run open-source versions of Apache Kafka in AWS with high availability and security while providing integration with AWS services without the operational overhead of running an Apache Kafka cluster. Amazon MSK allows you to use and configure open-source versions of Apache Kafka while the service manages the setup, provisioning, AWS integrations, and on-going maintenance of Apache Kafka clusters.
 
With a few clicks in the console, you can provision an Amazon MSK cluster. From there, Amazon MSK replaces unhealthy brokers, automatically replicates data for high availability, manages Apache ZooKeeper nodes, automatically deploys hardware patches as needed, manages the integrations with AWS services, makes important metrics visible through the console, and will support Apache Kafka version upgrades when more than one version is supported so you can take advantage of improvements to the open-source version of Apache Kafka.
 
Q: What Apache Kafka versions does Amazon MSK support?
Amazon MSK currently supports Apache Kafka version 1.1.1 and version 2.1.0.
 
Q: Are Apache Kafka APIs compatible with Amazon MSK?
Yes, all data plane and admin APIs are natively supported by Amazon MSK.
 
Q: Is the Apache Kafka AdminClient supported by Amazon MSK?
Yes.

Data production and consumption

Q: Can I use Apache Kafka APIs to get data in and out of Apache Kafka?
Yes, Amazon MSK supports the native Apache Kafka producer and consumer APIs. Your application code does not need to change when clients begin to work with clusters within Amazon MSK.
 
 
Q: Can I use Apache Kafka Connect, Apache Kafka Streams, or any other ecosystem component of Apache Kafka with Amazon MSK?
Yes, you can use any component that leverages the Apache Kafka producer and consumer APIs, and the Apache Kafka AdminClient. Tools that upload .jar files into Apache Kafka clusters are currently not compatible with Amazon MSK, including Confluent Control Center, Confluent Auto Data Balancer, Uber uReplicator, and LinkedIn Cruise Control.

Migrating to Amazon MSK

Q: Can I migrate data within my existing Apache Kafka cluster to Amazon MSK?
Yes, you can use third-party tools or open source tools like MirrorMaker that come with open source Apache Kafka to replicate data from clusters into an Amazon MSK cluster.

Version upgrades

Q: How will Amazon MSK allow me to deploy version upgrades - minor or major - to Apache Kafka clusters when the service supports more than one version?
UpdateClusterSoftware is not supported during the Amazon MSK preview period.
 
Q: How will the upgrade process work under the hood?
When you deploy a new version, Amazon MSK uses a rolling process that upgrades one broker or Apache ZooKeeper node at a time before moving on to the next resource. Throughout the upgrade process your cluster will be in an ‘Updating’ state and will transition to an ‘Active’ state when finished. Note that if you chose not to replicate data to multiple brokers within a cluster that is being upgraded, your cluster will experience downtime.
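
The downtime caveat above can be checked with a small simulation. This is a sketch under simplifying assumptions, not the service's upgrade logic: the function name and the partition-to-broker assignments are hypothetical, and a partition is counted as offline only while every broker holding one of its replicas is restarting.

```python
def partitions_offline_during_rolling_upgrade(assignments, brokers):
    """Restart brokers one at a time; report partitions that lose every replica.

    assignments maps a partition name to the set of broker IDs holding a replica.
    Returns a dict of broker ID -> sorted list of partitions with no replica
    online while that broker is restarting.
    """
    outages = {}
    for restarting in brokers:
        online = set(brokers) - {restarting}
        lost = sorted(
            partition
            for partition, replicas in assignments.items()
            if not (set(replicas) & online)
        )
        if lost:
            outages[restarting] = lost
    return outages


brokers = [1, 2, 3]
replicated = {"orders-0": {1, 2, 3}, "orders-1": {1, 2, 3}}  # replicas on every broker
unreplicated = {"logs-0": {1}, "logs-1": {2}}                # a single replica each

print(partitions_offline_during_rolling_upgrade(replicated, brokers))
print(partitions_offline_during_rolling_upgrade(unreplicated, brokers))
```

With replicas on several brokers, no partition goes offline during the roll. With a single replica, each partition is unavailable while its only broker restarts.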

Clusters

Q: How do I create my first Amazon MSK cluster?
You can create your first cluster with a few clicks in the AWS Management Console or using the AWS SDKs. First, open the Amazon MSK console and pick an AWS region in which to create the cluster. Choose a name for your cluster, the VPC you want to run the cluster with, a data replication strategy for the cluster (three AZ is default for high durability), and the subnets for each AZ. Next, pick a broker instance type and quantity of brokers per AZ, and click create.

Q: What resources are within a cluster?
Each cluster contains broker instances, provisioned storage, and Apache ZooKeeper nodes.

Q: What types of broker instances can I provision within an Amazon MSK cluster?
You can choose instances within the EC2 M5 instance family.

Q: Do I need to provision and pay for broker boot volumes?
No, each broker you provision includes boot volume storage managed by the Amazon MSK service.

Q: When I create an Apache Kafka cluster, do the underlying resources (e.g. Amazon EC2 instances) show up in my EC2 console?
Some resources, like elastic network interfaces (ENIs), will show up in your Amazon EC2 account. Other Amazon MSK resources will not show up in your EC2 account as these are managed by the Amazon MSK service.

Q: What do I need to provision within an Amazon MSK cluster?
You need to provision broker instances and broker storage with every cluster you create. You do not provision Apache ZooKeeper nodes as these resources are included at no additional charge with each cluster you create.

Q: What is the default broker configuration for a cluster?
Unless otherwise specified, Amazon MSK uses the same defaults specified by the open-source version of Apache Kafka. The following are the defaults used by the service, and this configuration cannot be changed while the service is in preview.

  • Broker replication strategy: 3-AZ
  • Min.Insync.Replicas: 2
  • Broker.ID: Set by the service
  • Default.Replication.Factor: 3 for 3-AZ
  • security.inter.broker.protocol: Plaintext
  • Server-side encryption: AWS KMS enabled via AWS service key

Q: Can I provision brokers such that they are imbalanced across AZs (e.g. 3 in us-east-1a, 2 in us-east-1b, 1 in us-east-1c)?
No, Amazon MSK enforces the best practice of balancing broker quantities across AZs within a cluster.

Q: How does data replication work in Amazon MSK?
Amazon MSK uses Apache Kafka’s leader-follower replication to replicate data between brokers. Amazon MSK makes it easy to deploy popular replication strategies and gives you the option to use a custom replication strategy. By default with each of the replication options, leader and follower brokers will be deployed and isolated using the replication strategy specified. For example, if you select a 3-AZ broker replication strategy with one broker per AZ, Amazon MSK will create a cluster of three brokers (one broker in each of three AZs in a region), and by default (unless you choose to override the topic replication factor) the topic replication factor will also be three. The replication strategy you choose also determines the minimum number of Apache ZooKeeper nodes assigned to your cluster behind the scenes.
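
The arithmetic in the example above can be sketched as follows. The describe_layout helper is hypothetical. Defaulting the replication factor to the number of AZs is an assumption that mirrors the "3 for 3-AZ" default, and the min.insync.replicas value of 2 is the default listed in this FAQ. The tolerance figure applies to producers that request acknowledgement from all in-sync replicas (acks=all).

```python
def describe_layout(num_azs, brokers_per_az, replication_factor=None, min_insync_replicas=2):
    """Summarize cluster size and write availability for a balanced layout."""
    total_brokers = num_azs * brokers_per_az
    if replication_factor is None:
        # Assumption: follow the "3 for 3-AZ" default by using one replica per AZ.
        replication_factor = num_azs
    if replication_factor > total_brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        "total_brokers": total_brokers,
        "replication_factor": replication_factor,
        # Replicas that can be offline while acks=all writes still succeed.
        "replica_failures_tolerated": max(replication_factor - min_insync_replicas, 0),
    }


layout = describe_layout(num_azs=3, brokers_per_az=1)
print(layout)
```

For the 3-AZ, one-broker-per-AZ case this gives three brokers, a replication factor of three, and room for one replica to be offline before acks=all writes are rejected.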

Topics

Q: How do I create topics?
Once your Apache Kafka cluster has been created, you can create topics using the Apache Kafka APIs. All topic and partition level actions and configurations are performed using Apache Kafka APIs.
 
 
Q: What is the default configuration of a new topic?
Amazon MSK uses Apache Kafka’s default configuration unless otherwise specified here:  

  • Replication factor: Cluster default
  • Min.Insync.Replicas: 2

Networking

Q: Does Amazon MSK run in an Amazon VPC?
Yes, Amazon MSK always runs within an Amazon VPC managed by the Amazon MSK service. Amazon MSK resources will be available in your own Amazon VPC through the subnets and security group you select when the cluster is set up. IP addresses from your VPC are attached to your Amazon MSK resources through elastic network interfaces (ENIs), and all network traffic stays within the AWS network and is not accessible to the Internet.
 
Q: Is the connection between my clients and an Amazon MSK cluster always private?
Yes, the only way data can be produced and consumed from an Amazon MSK cluster is over a private connection between your clients in your VPC and the Amazon MSK cluster. Amazon MSK does not support public endpoints.

Q: How will the brokers in my Amazon MSK cluster be made accessible to clients within my VPC?
The brokers in your cluster will be made accessible to clients in your VPC through elastic network interfaces (ENIs) which will appear in your account. The Security Groups on the ENIs will dictate the source and type of ingress and egress traffic allowed on your brokers.

Q: How can I give clients running in different AWS accounts access to my cluster?
You can use VPC peering to give clients running in different AWS accounts access to your cluster.

Encryption

Q: Can I encrypt data in my Amazon MSK cluster?
Yes, Amazon MSK uses Amazon EBS server-side encryption and AWS KMS keys to encrypt storage volumes.

Q: Is data encrypted over the wire as it moves between brokers in an Amazon MSK cluster?
No, not at this time.
 
Q: Is data encrypted over the wire as it moves between brokers and Apache ZooKeeper nodes in an Amazon MSK cluster?
No, not at this time.
 
Q: Can I encrypt data over the wire between my Apache Kafka clients and the Amazon MSK service?
No, not at this time.

Monitoring, Metrics, Logging

Q: How do I monitor the performance of my clusters or topics?
You can monitor the performance of your clusters using standard metrics, and you can monitor the performance of your topics using enhanced metrics within the Amazon CloudWatch console.
 
Q: How do I monitor the health and performance of clients?
You can use any client-side monitoring supported by the Apache Kafka version you are using.

Apache ZooKeeper

Q: What is Apache ZooKeeper?
From https://zookeeper.apache.org/: “Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications,” including Apache Kafka.

Q: Does Amazon MSK use Apache ZooKeeper?
Yes, Amazon MSK uses Apache ZooKeeper and manages Apache ZooKeeper within each cluster as a part of the Amazon MSK service. Apache ZooKeeper nodes are included with each cluster at no additional cost.
 
Q: How do my clients interact with Apache ZooKeeper?
Your clients can interact with Apache ZooKeeper through an Apache ZooKeeper endpoint provided by the service. This endpoint is available in the AWS Management Console or through the DescribeCluster API.
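
As a rough illustration of reading the endpoint programmatically, the sketch below parses a DescribeCluster response. It assumes the AWS SDK for Python (boto3) with its "kafka" client, and assumes the response carries the endpoint under ClusterInfo and ZookeeperConnectString; verify both against the SDK documentation for your version. The helper names and the sample addresses are hypothetical.

```python
def zookeeper_hosts(describe_cluster_response):
    """Split the ZooKeeper connection string from a DescribeCluster response into hosts."""
    connect = describe_cluster_response["ClusterInfo"]["ZookeeperConnectString"]
    return [host.strip() for host in connect.split(",") if host.strip()]


def fetch_zookeeper_hosts(cluster_arn, region_name):
    """Call DescribeCluster through boto3 (assumed installed and configured with credentials)."""
    import boto3  # imported lazily so the parsing helper works without boto3

    client = boto3.client("kafka", region_name=region_name)
    return zookeeper_hosts(client.describe_cluster(ClusterArn=cluster_arn))


# Hypothetical response fragment, for illustration only.
sample = {"ClusterInfo": {"ZookeeperConnectString": "10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181"}}
print(zookeeper_hosts(sample))
```

Because the cluster is reachable only over a private connection, a client using these addresses would need to run inside a VPC that has access to the cluster.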

Integrations

Q: What AWS services does Amazon MSK integrate with?
Amazon MSK integrates with:

Scaling

Q: How can I scale up my cluster?
Scaling of an existing cluster is not supported during the Amazon MSK preview period.

Pricing and Availability

Q: How does Amazon MSK pricing work?
Pricing is per Apache Kafka broker-hour and per provisioned storage-hour. AWS data transfer rates apply for data transfer in and out of Amazon MSK. For more information, visit our pricing page.

Q: Do I pay for data transfer as a result of data replication?
No, all in-cluster data transfer is included with the service at no additional charge.
 
Q: What regions offer Amazon MSK?
Amazon MSK is available in three AWS regions: N. Virginia (us-east-1), Ohio (us-east-2) and Ireland (eu-west-1), while in public preview.

Q: How does data transfer pricing work?
You will pay standard AWS data transfer charges for data transferred in and out of an Amazon MSK cluster. You will not be charged for data transfer within the cluster in a region, including data transfer between brokers and data transfer between brokers and Apache ZooKeeper nodes.
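
The pricing model described in this section can be turned into a rough estimate. This is a sketch only: the function name is hypothetical, the rates in the example are placeholders rather than actual Amazon MSK prices, and data transfer in and out of the cluster is left out. In-cluster transfer is not modeled because it carries no charge.

```python
from decimal import Decimal


def estimate_cluster_cost(brokers, broker_rate_per_hour, storage_gb,
                          storage_rate_per_gb_hour, hours):
    """Estimate broker and storage charges for a period of `hours`.

    Rates are passed as strings so Decimal keeps them exact.
    """
    broker_cost = Decimal(brokers) * Decimal(broker_rate_per_hour) * Decimal(hours)
    storage_cost = Decimal(storage_gb) * Decimal(storage_rate_per_gb_hour) * Decimal(hours)
    return {"brokers": broker_cost, "storage": storage_cost, "total": broker_cost + storage_cost}


# Placeholder rates for illustration; look up real rates on the pricing page.
estimate = estimate_cluster_cost(
    brokers=3,
    broker_rate_per_hour="0.25",
    storage_gb=1000,
    storage_rate_per_gb_hour="0.0001",
    hours=100,
)
print(estimate["total"])
```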

Get started with Amazon MSK

Calculate your costs

Visit the Amazon MSK pricing page.

Review the getting-started guide

Learn how to set up your Apache Kafka cluster on Amazon MSK in this step-by-step guide.

Run your Apache Kafka cluster

Start running your Apache Kafka cluster on Amazon MSK. Log in to the Amazon MSK console.