AWS Big Data Blog
Govern how your clients interact with Apache Kafka using API Gateway
At some point, you may ask yourself:
- How can I implement IAM authentication or authorization for Amazon Managed Streaming for Apache Kafka (Amazon MSK)?
- How can I protect my Apache Kafka cluster from traffic spikes based on specific scenarios without setting quotas on the cluster?
- How can I validate that requests adhere to a JSON Schema?
- How can I make sure parameters are included in the URI, query string, and headers?
- How can Amazon MSK ingest messages from lightweight clients without using an agent or the native Apache Kafka protocol?
These tasks are achievable using custom proxy servers or gateways, but those options can be difficult to implement and manage. Amazon API Gateway, on the other hand, provides these capabilities and is a fully managed AWS service.
In this post, we show you how Amazon API Gateway can answer these questions as a component between your Amazon MSK cluster and your clients.
Amazon MSK is a fully managed service for Apache Kafka that lets you create Kafka clusters with just a few clicks, without the need to provision servers, manage storage, or configure Apache ZooKeeper manually. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.
Some use cases include ingesting messages from lightweight IoT devices that don’t support the native Kafka protocol, and orchestrating your streaming services with other backend services, including third-party APIs.
This pattern, which places a Kafka REST Proxy in front of your cluster, also comes with the following trade-offs:
- Cost and complexity due to another service to run and maintain.
- Performance overhead, because clients must construct and send HTTP requests, and the REST Proxy must parse requests and transform data between formats for both produce and consume operations.
When you implement this architecture in a production environment, weigh these trade-offs against your business use case and SLA requirements.
Solution overview
To implement the solution, complete the following steps:
- Create an MSK cluster, Kafka client, and Kafka REST Proxy
- Create a Kafka topic and configure the REST Proxy on a Kafka client machine
- Create an API with REST Proxy integration via API Gateway
- Test the end-to-end processes by producing and consuming messages to Amazon MSK
The following diagram illustrates the solution architecture.
Within this architecture, you create an MSK cluster and set up an Amazon EC2 instance with the REST Proxy and Kafka client. You then expose the REST Proxy through Amazon API Gateway and also test the solution by producing messages to Amazon MSK using Postman.
For a production implementation, make sure to set up the REST Proxy behind a load balancer, with an Auto Scaling group.
Prerequisites
Before you get started, you must have the following prerequisites:
- An AWS account that provides access to AWS services
- An IAM user with an access key and secret access key to configure the AWS CLI
- An Amazon EC2 key pair
Creating an MSK cluster, Kafka client, and REST Proxy
AWS CloudFormation provisions all the required resources, including VPC, subnets, security groups, Amazon MSK cluster, Kafka client, and Kafka REST Proxy. To create these resources, complete the following steps:
- Launch the AWS CloudFormation stack in the us-east-1 or us-west-2 Region. Stack creation takes approximately 15 to 20 minutes to complete.
- From the AWS CloudFormation console, choose AmzonMSKAPIBlog.
- Under Outputs, get the MSKClusterARN, KafkaClientEC2InstancePublicDNS, and MSKSecurityGroupID details.
- Get the ZookeeperConnectString and other information about your cluster by running the AWS CLI command sketched after this list (provide your Region, cluster ARN, and AWS named profile). One line of the command's output contains the ZookeeperConnectString.
- Get the BootstrapBrokerString by running the AWS CLI command sketched after this list (provide your Region, cluster ARN, and AWS named profile).
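The following sketch shows one way to retrieve the cluster details with the AWS CLI; the Region, cluster ARN, and profile values are placeholders.

```bash
# Describe the MSK cluster; the JSON response includes the ZookeeperConnectString.
# Replace the placeholder Region, ARN, and profile with your own values.
aws kafka describe-cluster \
    --region us-east-1 \
    --cluster-arn "<MSKClusterARN>" \
    --profile <YourAWSProfile>

# One line of the output looks similar to the following (placeholder hostnames):
#   "ZookeeperConnectString": "z-1.example.kafka.us-east-1.amazonaws.com:2181,z-2.example.kafka.us-east-1.amazonaws.com:2181,z-3.example.kafka.us-east-1.amazonaws.com:2181",
```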
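Similarly, a sketch of the bootstrap broker lookup, again with placeholder values:

```bash
# Retrieve the bootstrap broker string for the cluster.
aws kafka get-bootstrap-brokers \
    --region us-east-1 \
    --cluster-arn "<MSKClusterARN>" \
    --profile <YourAWSProfile>

# The output looks similar to the following (placeholder hostnames):
# {
#     "BootstrapBrokerString": "b-1.example.kafka.us-east-1.amazonaws.com:9092,b-2.example.kafka.us-east-1.amazonaws.com:9092,b-3.example.kafka.us-east-1.amazonaws.com:9092"
# }
```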
Creating a Kafka topic and configuring a Kafka REST Proxy
To create a Kafka topic and configure a Kafka REST Proxy on a Kafka client machine, complete the following steps:
- SSH into your Kafka client Amazon EC2 instance; an example command appears after this list.
- Go to the bin folder (kafka/kafka_2.12-2.2.1/bin/) of the Apache Kafka installation on the client machine.
- Create a topic by entering the command sketched after this list (provide the value you obtained for ZookeeperConnectString in the previous step). If the command is successful, you see the message Created topic amazonmskapigwblog.
- To connect the Kafka REST server to the Amazon MSK cluster, modify kafka-rest.properties in the directory /home/ec2-user/confluent-5.3.1/etc/kafka-rest/ so that it points to your Amazon MSK cluster's ZookeeperConnectString and BootstrapBrokerString values; an example configuration appears after this list. As an additional, optional step, you can set up SSL certificates to secure communication between REST clients and the REST Proxy (HTTPS). If SSL is not required, you can skip steps 5 and 6.
- Generate the server and client certificates. For more information, see Creating SSL Keys and Certificates on the Confluent website.
- Add the necessary SSL properties to the kafka-rest.properties configuration file; an example appears after this list. For more detailed instructions, see Encryption and Authentication with SSL on the Confluent website.
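The following sketch covers connecting to the client instance and creating the topic. The key pair name, public DNS name, installation path, replication factor, and partition count are example values; substitute your own.

```bash
# SSH into the Kafka client EC2 instance (placeholder key pair and DNS name).
ssh -i "<YourEC2KeyPair>.pem" ec2-user@<KafkaClientEC2InstancePublicDNS>

# From the Kafka bin directory, create the topic. Substitute the
# ZookeeperConnectString obtained earlier; the replication factor and
# partition count shown here are example values.
cd /home/ec2-user/kafka/kafka_2.12-2.2.1/bin/
./kafka-topics.sh --create \
    --zookeeper <ZookeeperConnectString> \
    --replication-factor 3 \
    --partitions 1 \
    --topic amazonmskapigwblog
```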
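To point the REST Proxy at the cluster, the two connection properties below are a minimal sketch; the connection strings are placeholders for the values obtained earlier.

```bash
# Append (or edit in place) the MSK connection settings in kafka-rest.properties.
# Use the ZookeeperConnectString and BootstrapBrokerString obtained earlier.
cat >> /home/ec2-user/confluent-5.3.1/etc/kafka-rest/kafka-rest.properties <<'EOF'
zookeeper.connect=<ZookeeperConnectString>
bootstrap.servers=<BootstrapBrokerString>
EOF
```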
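For the optional SSL step, the properties below sketch a typical HTTPS listener setup for the REST Proxy; the keystore and truststore paths and passwords are assumptions, and port 8085 matches the HTTPS endpoint used later in this post.

```bash
# Optional: expose the REST Proxy over HTTPS (assumed store paths and passwords).
cat >> /home/ec2-user/confluent-5.3.1/etc/kafka-rest/kafka-rest.properties <<'EOF'
listeners=https://0.0.0.0:8085
ssl.keystore.location=<path-to-keystore.jks>
ssl.keystore.password=<keystore-password>
ssl.key.password=<key-password>
ssl.truststore.location=<path-to-truststore.jks>
ssl.truststore.password=<truststore-password>
EOF
```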
You have now created a Kafka topic and configured Kafka REST Proxy to connect to your Amazon MSK cluster.
Creating an API with Kafka REST Proxy integration
To create an API with Kafka REST Proxy integration via API Gateway, complete the following steps:
- On the API Gateway console, choose Create API.
- For API type, choose REST API.
- Choose Build.
- Choose New API.
- For API Name, enter a name (for example, amazonmsk-restapi).
- As an optional step, for Description, enter a brief description.
- Choose Create API.
The next step is to create a child resource.
- Under Resources, choose a parent resource item.
- Under Actions, choose Create Resource.
The New Child Resource pane opens.
- Select Configure as proxy resource.
- For Resource Name, enter proxy.
- For Resource Path, enter /{proxy+}.
- Select Enable API Gateway CORS.
- Choose Create Resource.
After you create the resource, the Create Method window opens.
- For Integration type, select HTTP Proxy.
- For Endpoint URL, enter an HTTP backend resource URL (your Kafka client Amazon EC2 instance PublicDNS; for example, http://KafkaClientEC2InstancePublicDNS:8082/{proxy} or https://KafkaClientEC2InstancePublicDNS:8085/{proxy}).
- Use the default settings for the remaining fields.
- Choose Save.
- For SSL, for Endpoint URL, use the HTTPS endpoint.
In the API you just created, the API's proxy resource path of {proxy+} becomes the placeholder for any of the backend endpoints under http://YourKafkaClientPublicIP:8082/.
- Choose the API you just created.
- Under Actions, choose Deploy API.
- For Deployment stage, choose New Stage.
- For Stage name, enter the stage name (for example, dev, test, or prod).
- Choose Deploy.
- Record the Invoke URL after you have deployed the API.
Your external Kafka REST Proxy, which is exposed through API Gateway, now looks like https://YourAPIGWInvokeURL/dev/topics/amazonmskapigwblog. You use this URL in the next step.
Testing the end-to-end processes
To test the end-to-end process by producing and consuming messages to Amazon MSK, complete the following steps:
- SSH into the Kafka client Amazon EC2 instance (use the same command shown in the previous section).
- Go to the confluent-5.3.1/bin directory and start the kafka-rest service, as sketched after this list. If the service is already running, you can stop it first with the kafka-rest-stop command shown in the same sketch.
- Open another terminal window.
- In the kafka/kafka_2.12-2.2.1/bin directory, start the Kafka console consumer, as sketched after this list.
You can now produce messages using Postman, an HTTP client for testing web services.
Be sure to open the TCP ports on the Kafka client security group from the system on which you are running Postman.
- In Postman, create a POST request to the invoke URL you recorded earlier (for example, https://YourAPIGWInvokeURL/dev/topics/amazonmskapigwblog).
- Under Headers, choose the key Content-Type with value application/vnd.kafka.json.v2+json.
- Under Body, select raw.
- Choose JSON, and enter a message payload such as the example sketched after this list.
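A sketch of starting (and, if needed, stopping) the kafka-rest service, assuming the Confluent 5.3.1 layout under /home/ec2-user:

```bash
# Start the REST Proxy with the configuration edited earlier.
cd /home/ec2-user/confluent-5.3.1/bin
./kafka-rest-start ../etc/kafka-rest/kafka-rest.properties

# If the service is already running, stop it first:
./kafka-rest-stop
```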
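In the second terminal window, the console consumer can be started as follows (placeholder broker string, paths as in the earlier steps):

```bash
# Watch messages arrive on the topic while you produce through API Gateway.
cd /home/ec2-user/kafka/kafka_2.12-2.2.1/bin
./kafka-console-consumer.sh \
    --bootstrap-server <BootstrapBrokerString> \
    --topic amazonmskapigwblog \
    --from-beginning
```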
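The Postman request can also be expressed as a curl call, which makes the header and body explicit; the invoke URL and the message payload below are example values.

```bash
# Produce a JSON message through API Gateway to the Kafka REST Proxy.
# The body uses the REST Proxy v2 JSON envelope: a list of records, each with a value.
curl -X POST \
  "https://<YourAPIGWInvokeURL>/dev/topics/amazonmskapigwblog" \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records":[{"value":{"deviceid":"device-001","reading":42}}]}'
```

A successful call returns a JSON response that includes the partition and offset of the stored message.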
The Kafka console consumer then displays the messages coming from the API Gateway Kafka REST endpoint.
Conclusion
This post demonstrated how easy it is to set up REST API endpoints for Amazon MSK with API Gateway. This solution can help you produce and consume messages to Amazon MSK from any IoT device or programming language without depending on native Kafka protocol or clients.
If you have questions or suggestions, please leave your thoughts in the comments.
About the Authors
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS enterprise and strategic customers. His interests extend to various technologies, such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.
Francisco Oliveira is a senior big data solutions architect with AWS. He focuses on building big data solutions with open source technology and AWS. In his free time, he likes to try new sports, travel and explore national parks.