AWS Big Data Blog

Govern how your clients interact with Apache Kafka using API Gateway

At some point, you may ask yourself:

  • How can I implement IAM authentication or authorization to Amazon Managed Streaming for Apache Kafka (MSK)?
  • How can I protect my Apache Kafka cluster from traffic spikes based on specific scenarios without setting quotas on the cluster?
  • How can I validate requests adhere to a JSON Schema?
  • How can I make sure parameters are included in the URI, query string, and headers?
  • How can Amazon MSK ingest messages lightweight clients without using an agent or the native Apache Kafka protocol?

These tasks are achievable using custom proxy servers or gateways, but these options can be difficult to implement and manage. On the other hand, API Gateway has these features and is a fully managed AWS service.

In this blog post, we will show you how Amazon API Gateway can answer these questions as a component between your Amazon MSK cluster and your clients.

Amazon MSK is a fully managed service for Apache Kafka that makes it easy to provision Kafka clusters with just a few clicks without the need to provision servers, manage storage, or configure Apache Zookeeper manually. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.

Some use cases include ingesting messages from lightweight IoT devices that don’t have support for native Kafka protocol and orchestrating your streaming services with other backend services including third-party APIs.

This pattern also comes with the following trade-offs:

  • Cost and complexity due to another service to run and maintain.
  • Performance overhead because it adds extra processing to construct and make HTTP requests. Additionally, REST Proxy needs to parse requests, transform data between formats both for produce, and consume requests.

When you implement this architecture in a production environment, you should consider these points with your business use case and SLA needs.

Solution overview

To implement the solution, complete the following steps:

  1. Create an MSK cluster, Kafka client, and Kafka REST Proxy
  2. Create a Kafka topic and configure the REST Proxy on a Kafka client machine
  3. Create an API with REST Proxy integration via API Gateway
  4. Test the end-to-end processes by producing and consuming messages to Amazon MSK

The following diagram illustrates the solution architecture.

 

Within this architecture, you create an MSK cluster and set up an Amazon EC2 instance with the REST Proxy and Kafka client. You then expose the REST Proxy through Amazon API Gateway and also test the solution by producing messages to Amazon MSK using Postman.

For the production implementation, make sure to set up the REST Proxy behind load balancer with an Auto Scaling group.

Prerequisites

Before you get started, you must have the following prerequisites:

  • An AWS account that provides access to AWS services
  • An IAM user with an access key and secret access key to configure the AWS CLI
  • An Amazon EC2 keypair

Creating an MSK cluster, Kafka client, and REST Proxy

AWS CloudFormation provisions all the required resources, including VPC, subnets, security groups, Amazon MSK cluster, Kafka client, and Kafka REST Proxy. To create these resources, complete the following steps:

  1. Launch in the us-east-1 or us-west-2It takes approximately 15 to 20 minutes to complete.
  2. From the AWS CloudFormation console, choose AmzonMSKAPIBlog.
  3. Under Outputs, get the MSKClusterARN, KafkaClientEC2InstancePublicDNS, and MSKSecurityGroupID details.
  4. Get the ZooKeeperConnectionString and other information about your cluster by entering the following code (provide your Region, cluster ARN, and AWS named profile):
    $ aws kafka describe-cluster --region <Replace_With_us-east-1_or_us-west-2> --cluster-arn <Replace_With_Your_cluster-arn> --profile <Replace_With_Your_Profile>

    The following code example shows one of the lines in the output of this command:

    {
    ….
    ….
    "ZookeeperConnectString": "z-2.XXXXXX.us-east-1.amazonaws.com:2181,z-3.XXXXXX.us-east-1.amazonaws.com:2181,z-1.XXXXXX.us-east-1.amazonaws.com:2181"
    }
  5. Get the BootstrapBrokerString by entering the following code (provide your Region, cluster ARN, and AWS named profile):

    $ aws kafka get-bootstrap-brokers --region <Replace_With_us-east-1_or_us-west-2> --cluster-arn "<Replace_With_us-east-1_or_us-west-2>" --profile <Replace_With_Your_Profile>

    The following code example shows the output of this command:

    {
    "BootstrapBrokerString": "b-2.XXXXXXXXXXXX.us-east-1.amazonaws.com:9092,b-1.XXXXXXXXXXXX.amazonaws.com:9092,b-3.XXXXXXXXXXXX.us-east-1.amazonaws.com:9092"
    }

Creating a Kafka topic and configuring a Kafka REST Proxy

To create a Kafka topic and configure a Kafka REST Proxy on a Kafka client machine, complete the following steps:

  1. SSH into your Kafka client Amazon EC2 instance. See the following code:
    ssh -i <Replace_With_Your_pemfile> ec2-user@<Replace_With_Your_KafkaClientDNS>
  2. Go to the bin folder (kafka/kafka_2.12-2.2.1/bin/) of the Apache Kafka installation on the client machine.
  3. Create a topic by entering the following code (provide the value you obtained for ZookeeperConnectString in the previous step):
    ./kafka-topics.sh --create --zookeeper <Replace_With_Your_ZookeeperConnectString> --replication-factor 3 --partitions 1 --topic amazonmskapigwblog

    If the command is successful, you see the following message: Created topic amazonmskapigwblog.

  4. To connect the Kafka REST server to the Amazon MSK cluster, modify kafka-rest.properties in the directory (/home/ec2-user/confluent-5.3.1/etc/kafka-rest/) to point to your Amazon MSK’s ZookeeperConnectString and BootstrapserversConnectString information. See the following code:
    sudo vi /home/ec2-user/confluent-5.3.1/etc/kafka-rest/kafka-rest.properties
    
    	zookeeper.connect=<Replace_With_Your_ZookeeperConnectString>
    bootstrap.servers=<Replace_With_Your_BootstrapserversConnectString> 

    As an additional, optional step, you can create an SSL for securing communication between REST clients and the REST Proxy (HTTPS). If SSL is not required, you can skip steps 5 and 6.

  5. Generate the server and client certificates. For more information, see Creating SLL Keys and Certificates on the Confluent website.
  6. Add the necessary property configurations to the kafka-rest.properties configuration file. See the following code example:
    listeners=http://0.0.0.0:8082,https://0.0.0.0:8085
    ssl.truststore.location=<Replace_With_Your_tuststore.jks>
    ssl.truststore.password=<Replace_With_Your_tuststorepassword>
    ssl.keystore.location=<Replace_With_Your_keystore.jks>
    ssl.keystore.password=<Replace_With_Your_keystorepassword>
    ssl.key.password=<Replace_With_Your_sslkeypassword>

    For more detailed instructions, see Encryption and Authentication with SSL on the Confluent website.

You have now created a Kafka topic and configured Kafka REST Proxy to connect to your Amazon MSK cluster.

Creating an API with Kafka REST Proxy integration

To create an API with Kafka REST Proxy integration via API Gateway, complete the following steps:

  1. On the API Gateway console, choose Create API.
  2. For API type, choose REST API.
  3. Choose Build.
  4. Choose New API.
  5. For API Name, enter a name (for example, amazonmsk-restapi).
  6. As an optional step, for Description, enter a brief description.
  7. Choose Create API.The next step is to create a child resource.
  8. Under Resources, choose a parent resource item.
  9. Under Actions, choose Create Resource.The New Child Resource pane opens.
  10. Select Configure as proxy resource.
  11. For Resource Name, enter proxy.
  12. For Resource Path, enter /{proxy+}.
  13. Select Enable API Gateway CORS.
  14. Choose Create Resource.After you create the resource, the Create Method window opens.
  15. For Integration type, select HTTP Proxy.
  16. For Endpoint URL, enter an HTTP backend resource URL (your Kafka Clien Amazont EC2 instance PublicDNS; for example, http://KafkaClientEC2InstancePublicDNS:8082/{proxy} or https://KafkaClientEC2InstancePublicDNS:8085/{proxy}).
  17. Use the default settings for the remaining fields.
  18. Choose Save.
  19. For SSL, for Endpoint URL, use the HTTPS endpoint.In the API you just created, the API’s proxy resource path of {proxy+} becomes the placeholder of any of the backend endpoints under http://YourKafkaClientPublicIP:8082/.
  20. Choose the API you just created.
  21. Under Actions, choose Deploy API.
  22. For Deployment stage, choose New Stage.
  23. For Stage name, enter the stage name (for example, dev, test, or prod).
  24. Choose Deploy.
  25. Record the Invoke URL after you have deployed the API.

Your external Kafka REST Proxy, which was exposed through API Gateway, now looks like https://YourAPIGWInvoleURL/dev/topics/amazonmskapigwblog. You use this URL in the next step.

Testing the end-to-end processes

To test the end-to-end processes by producing and consuming messages to Amazon MSK. Complete the following steps:

  1. SSH into the Kafka Client Amazon EC2 instance. See the following code:
    ssh -i "xxxxx.pem" ec2-user@KafkaClientEC2InstancePublicDNS
  2. Go to the confluent-5.3.1/bin directory and start the kafka-rest service. See the following code:
    ./kafka-rest-start /home/ec2-user/confluent-5.3.1/etc/kafka-rest/kafka-rest.properties

    If the service already started, you can stop it with the following code:

    ./kafka-rest-stop /home/ec2-user/confluent-5.3.1/etc/kafka-rest/kafka-rest.properties
  3. Open another terminal window.
  4. In the kafka/kafka_2.12-2.2.1/bin directory, start the Kafka console consumer. See the following code:
    ./kafka-console-consumer.sh --bootstrap-server "BootstrapserversConnectString" --topic amazonmskapigwblog --from-beginning 

    You can now produce messages using Postman. Postman is an HTTP client for testing web services.

    Be sure to open TCP ports on the Kafka client security group from the system you are running Postman.

  5. Under Headers, choose the key Content-Type with value application/vnd.kafka.json.v2+json.
  6. Under Body, select raw.
  7. Choose JSON.This post enters the following code:
    {"records":[{"value":{"deviceid": "AppleWatch4","heartrate": "72","timestamp":"2019-10-07 12:46:13"}}]} 

    The following screen shot shows messages coming to the Kafka consumer from the API Gateway Kafka REST endpoint.

Conclusion

This post demonstrated how easy it is to set up REST API endpoints for Amazon MSK with API Gateway. This solution can help you produce and consume messages to Amazon MSK from any IoT device or programming language without depending on native Kafka protocol or clients.

If you have questions or suggestions, please leave your thoughts in the comments.

 


About the Author

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

Francisco Oliveira is a senior big data solutions architect with AWS. He focuses on building big data solutions with open source technology and AWS. In his free time, he likes to try new sports, travel and explore national parks.