AWS Partner Network (APN) Blog

Impetus’ Amazon MSK Solution for Scalable Data Processing

By: Hari Ramesh, Sr. Partner Solutions Architect – AWS
By: Manoj Kumar, Senior Engineering Manager – Impetus Technologies (India) Pvt Ltd


Impetus’ client, a renowned B2B digital sales company, used a customer relationship management (CRM) system to process online booking data. At the core of the CRM system and other downstream applications was a self-managed Apache Kafka cluster acting as a message broker. However, the client faced several challenges:

  • The CRM application typically generated around 500 MB of data on weekdays but was prone to spikes of up to 5 GB on weekends due to source system reconciliation, placing excessive pressure on resources. These data surges resulted in frequent processing outages, message loss, and delays in the daily customer reports, negatively impacting the client’s business. To mitigate the scalability issues, the client had to allocate two DevOps engineers to work on weekends.
  • The delays in the near-real-time data processing pipeline also affected service-level agreements (SLAs), such as the timely processing of new customer requests.

To address the scalability challenges and reduce operational overhead, Impetus Technologies advised the client to migrate the CRM application to Amazon Managed Streaming for Apache Kafka (Amazon MSK), thereby streamlining their processes and enhancing overall efficiency.

In this blog post, we will show you how to re-platform an on-premises Apache Kafka deployment to Amazon MSK to reduce operational overhead, and how to design highly available Apache Kafka and Kafka Connect clusters.

Solution Overview

A successful Kafka migration to Amazon MSK provides the following key benefits:

  1. Highly available and secure – Amazon MSK creates an Apache Kafka cluster and offers multi-AZ replication within an AWS Region. Amazon MSK continuously monitors cluster health and automatically replaces any failed components. Amazon MSK provides multiple levels of security for Apache Kafka clusters, including Amazon VPC network isolation, AWS Identity and Access Management (IAM) for control-plane API authorization, encryption at rest, TLS encryption in transit, TLS-based certificate authentication, and support for Apache Kafka access control lists (ACLs) for data-plane authorization.
  2. Fully managed – Amazon MSK lets customers focus on creating their streaming applications without the operational overhead of managing an Apache Kafka environment. Amazon MSK manages the provisioning, configuration, and maintenance of Apache Kafka clusters and Apache ZooKeeper nodes, and surfaces key Apache Kafka performance metrics in the AWS console.

To achieve a seamless migration, Impetus leveraged MirrorMaker 2.0 (MM2) to migrate Kafka clusters to Amazon MSK. This involved configuring MM2 on an Amazon Elastic Compute Cloud (Amazon EC2) instance to replicate Kafka topics within the target MSK environment. The following diagram provides a visual representation of the migration strategy and process to transition to Amazon MSK.


Figure 1: MM2 replication flow when the Apache Kafka cluster is migrated to Amazon MSK

The migration to Amazon MSK involved the following key steps:

  1. MirrorMaker 2 (MM2) uses an Apache Kafka consumer to consume messages from the source Kafka cluster and re-publish them to the target Amazon MSK cluster, ensuring uninterrupted service and eliminating downtime.
  2. Redirect consumers to subscribe to Amazon MSK topics instead of on-premises Kafka topics, enabling a seamless transition. Then re-point producers to publish to Amazon MSK topics instead of on-premises Kafka topics, completing the migration.

Solution

By transitioning from self-managed open-source Apache Kafka brokers to Amazon MSK, the client was able to significantly scale up storage capacity from 500 GB to 16 tebibytes (TiB) per broker, unlocking growth potential for a greater number of producer and consumer applications. Furthermore, Amazon MSK delivered a higher degree of availability through multi-AZ replication within an AWS Region, eliminating the risk of outages and downtime for the CRM application.

The following step-by-step guide outlines the migration process from Kafka to Amazon MSK, which involved multiple topics – including Order, Price, and Order-Bad (bad customer orders captured during booking), among others – to which various systems were pushing data and from which consumer applications were consuming data. This blog provides a clear roadmap for replicating this successful migration, enabling organizations to harness the full benefits of Amazon MSK.

Step 1: Estimate the Required Amazon MSK Environment Size

Understanding the current cluster state and workloads is critical. Capture the following inputs, then use them to size the target cluster (see the sizing sketch after this list):

  •  Ingestion rate: the rate at which data is generated and pushed to the Kafka cluster. Obtain it by:
          • Considering the broker-level metric
            kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
          • Aggregating the value across all brokers in the cluster.
  •  Consumption rate: the rate at which data is consumed from the Kafka cluster. Obtain it by:
          • Considering the broker-level metric
            kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
          • Aggregating the value across all brokers in the cluster.
  •  Data replication strategy: Use the default.replication.factor property and consider the highest replication factor configured, for example default.replication.factor=3.
  •  Data retention strategy and disk utilization percentage on the target cluster: Use the log.retention.ms property for the global retention value (it defaults to 7 days; if set to -1, no time limit is applied) and retention.ms for topic-level retention. For example:
     log.retention.ms=604800000 # 1 week
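To turn these measurements into a target cluster size, here is a minimal back-of-envelope sketch in shell. The ingestion rate, retention, and headroom values below are illustrative assumptions, not the client’s measured figures:

# Hypothetical inputs gathered in the steps above
INGEST_MB_PER_DAY=5120     # peak ingestion (e.g. 5 GB on reconciliation weekends)
RETENTION_DAYS=7           # from log.retention.ms
REPLICATION_FACTOR=3       # from default.replication.factor
HEADROOM_PCT=30            # keep ~30% of disk free on the target cluster

# Raw storage = ingestion x retention x replication, then add headroom
RAW_MB=$((INGEST_MB_PER_DAY * RETENTION_DAYS * REPLICATION_FACTOR))
TOTAL_MB=$((RAW_MB * 100 / (100 - HEADROOM_PCT)))
echo "Estimated cluster storage: ${TOTAL_MB} MB (~$((TOTAL_MB / 1024)) GB)"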

Step 2: Set Up Amazon MSK Cluster

Based on the estimate developed in Step 1, Impetus set up an Amazon MSK cluster with encryption at rest and in transit enabled, and applied AWS Identity and Access Management (IAM) based access control along with SASL authentication to maintain security.
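As an illustration, here is a minimal sketch of provisioning such a cluster with the AWS CLI. The cluster name, instance type, Kafka version, subnet IDs, volume size, and KMS key below are placeholder assumptions, not the client’s actual values:

# Create an MSK cluster with encryption at rest/in transit and SASL (SCRAM + IAM) auth
aws kafka create-cluster \
  --cluster-name crm-msk-cluster \
  --kafka-version "2.8.1" \
  --number-of-broker-nodes 3 \
  --broker-node-group-info '{
      "InstanceType": "kafka.m5.large",
      "ClientSubnets": ["subnet-aaaa", "subnet-bbbb", "subnet-cccc"],
      "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 1000}}
    }' \
  --encryption-info '{
      "EncryptionAtRest": {"DataVolumeKMSKeyId": "<kms-key-arn>"},
      "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": true}
    }' \
  --client-authentication '{
      "Sasl": {"Scram": {"Enabled": true}, "Iam": {"Enabled": true}}
    }'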

Step 3: Migrate Kafka Topics to Amazon MSK

Kafka’s mirroring feature enables the replication of an existing Apache Kafka cluster, facilitating the maintenance of a duplicate copy. The MirrorMaker 2 (MM2) tool plays a crucial role in this process, mirroring source Kafka cluster topics (including internal topics and application-specific ones, such as Order, Price, and Order-Bad) into a target Amazon MSK cluster. MM2 achieves this by using consumers to consume messages from the source cluster and re-publishing them to the target MSK cluster via an embedded Kafka producer.

MM2 boasts a range of connectors that facilitate complex data flows between multiple Apache Kafka clusters and across data centers, leveraging existing Kafka Connect clusters.
This enables organizations to manage intricate data pipelines with ease, ensuring seamless data replication and integration.

  • MirrorSourceConnector – replicates topics from the source cluster to the target cluster.
  • MirrorCheckpointConnector – emits the internal topic checkpoints.internal, containing offsets for each consumer group in the source cluster, and stores cluster-to-cluster offset mappings in the internal topic offset-syncs.internal.
  • MirrorHeartbeatConnector – emits heartbeats to confirm connectivity through the connectors.


Figure 2: MirrorMaker Kafka migration internals

The topic migration involves the following steps:

Setting up the MM2 cluster on an EC2 instance:
      • a. Configure MM2 on the EC2 instance.
      • b. Configure the source Kafka and target Amazon MSK cluster details in the MM2 configuration properties file. A sample properties file is shown below:

# Apache Kafka clusters: comma-separated list of cluster aliases
clusters = source, target

# Source and target bootstrap servers (kafka server name:9092)
source.bootstrap.servers = src.kafka.host.1:9092,src.kafka.host.2:9092,src.kafka.host.3:9092
target.bootstrap.servers = tgt.kafka.host.1:9092,tgt.kafka.host.2:9092,tgt.kafka.host.3:9092

# Enable and configure individual replication flows
source->target.enabled = true

# Internal topic settings
# Replication factor configuration for the source and target clusters
source.config.storage.replication.factor = 3
target.config.storage.replication.factor = 3
source.offset.storage.replication.factor = 1
target.offset.storage.replication.factor = 1

# MirrorMaker configuration
offset-syncs.topic.replication.factor = 1
heartbeats.topic.replication.factor = 1
checkpoints.topic.replication.factor = 1

# Regex that defines which topics get replicated, e.g. order-*
topics = price, order-*
groups = .*
tasks.max = 1
replication.factor = 1
refresh.topics.enabled = true
sync.topic.configs.enabled = true

# Enable heartbeats and checkpoints
source->target.emit.heartbeats.enabled = true
source->target.emit.checkpoints.enabled = true
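With this properties file in place (saved here as mm2.properties, an assumed file name), MM2 can be launched on the EC2 instance using the driver script that ships with Apache Kafka:

# Start MirrorMaker 2 with the replication flow defined above
bin/connect-mirror-maker.sh mm2.properties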

Replicating Kafka topics on the target environment:

Once the MM2 setup is complete, extract the existing topics from the source Kafka cluster into a create_topic_list file. A sample command is given below:

bin/kafka-topics --describe --zookeeper <ZooKeeper_Url> | grep ReplicationFactor | awk '{print "<Kafka home dir>/bin/kafka-topics --create --bootstrap-server <Broker_Url> --topic " $1 " --replication-factor " $2 " --partitions " $3 " --config " $4}' >> ./create_topic_list

Here, <Broker_Url> is the target cluster broker URL, and create_topic_list will contain all the topic creation commands ready for the target cluster. The awk fields used are:

$1: the topic name to extract

$2: the replication factor for the target Amazon MSK cluster

$3: the number of partitions to create in the target Amazon MSK cluster

$4: the topic-level configuration for the target Amazon MSK cluster

Example:

bin/kafka-topics --describe --zookeeper zkp-1.int.dev:2181,zkp-2.int.dev:2181,zkp-3.int.dev:2181/env/kafka | grep ReplicationFactor | awk '{print "<Kafka home directory>/bin/kafka-topics --create --bootstrap-server src.kafka.host.1:9092 --topic " $1 " --replication-factor " $2 " --partitions " $3 " --config " $4}' >> ./create_topic_list

Based on the above command, sample output is given below:

<Kafka home directory>/bin/kafka-topics --create --bootstrap-server src.kafka.host.1:9092 --topic order --replication-factor 3 --partitions 1 --config min.insync.replicas=1 follower.replication.throttled.replicas=*

<Kafka home directory>/bin/kafka-topics --create --bootstrap-server src.kafka.host.1:9092 --topic price --replication-factor 2 --partitions 1 --config min.insync.replicas=1 follower.replication.throttled.replicas=*

<Kafka home directory>/bin/kafka-topics --create --bootstrap-server src.kafka.host.1:9092 --topic order-bad --replication-factor 3 --partitions 1 --config min.insync.replicas=1

  • Creating topics in Amazon MSK: The create_topic_list file now contains the commands to create the topics (order, price, and order-bad) in Amazon MSK. Execute the commands against the target MSK environment, as shown in the sketch below.
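As a sketch, the generated commands can be executed in one pass and the result verified with the standard topic listing command (the broker host is the placeholder used throughout this post):

# Run every generated topic-creation command against the target cluster
bash ./create_topic_list

# Verify the topics exist on the target Amazon MSK cluster
bin/kafka-topics --list --bootstrap-server tgt.kafka.host.1:9092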

Step 4: Validate Target Amazon MSK Cluster

Once the topics are created, validate them on the target MSK cluster. First, push standard test messages into a topic (order in this example) with the console producer:

bin/kafka-console-producer --topic order --broker-list <broker-host:port>

Then make sure the messages arrive in sequence using the console consumer:

bin/kafka-console-consumer --topic order --bootstrap-server tgt.kafka.host.1:9092

Step 5: Re-point Consumers to Target Amazon MSK Cluster

Consumer services are stopped and re-pointed by updating the bootstrap server entries in their configuration files, as sketched below.
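As an illustration, the only change a typical consumer service needs is its bootstrap servers entry; a minimal sketch using the placeholder host names from this post:

# consumer.properties – before (on-premises Kafka)
#bootstrap.servers=src.kafka.host.1:9092,src.kafka.host.2:9092,src.kafka.host.3:9092

# consumer.properties – after (Amazon MSK)
bootstrap.servers=tgt.kafka.host.1:9092,tgt.kafka.host.2:9092,tgt.kafka.host.3:9092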

Step 6: Mirror Kafka Connect and Schema Registry

The next step is to migrate all Kafka Connect connectors and schema registries. Kafka Connect is deployed to facilitate the seamless streaming of data between Apache Kafka and other data systems, enabling efficient data integration and processing. Kafka Connect converters provide a mechanism for converting data from the internal data types used by Kafka Connect to data types represented by different schemas (including Avro, Protobuf, and JSON). The AvroConverter, ProtobufConverter, and JsonSchemaConverter automatically register schemas generated by source connectors with the Schema Registry, ensuring a streamlined and automated process. Impetus used the following sample consumer configuration, mirrormaker.consumer.properties, to mirror the Schema Registry and Kafka Connect topics, ensuring a robust and reliable data replication process.

Sample mirrormaker.consumer.properties:

# Source Kafka host names
bootstrap.servers=src.kafka.host.1:9092,src.kafka.host.2:9092,src.kafka.host.3:9092
exclude.internal.topics=true
client.id=consumer-group
group.id=consumer-group
auto.offset.reset=earliest

Sample mirrormaker.producer.properties:

# Target Amazon MSK host names
bootstrap.servers=tgt.kafka.host.1:9092,tgt.kafka.host.2:9092,tgt.kafka.host.3:9092
acks=all
max.request.size=400000
batch.size=50
max.in.flight.requests.per.connection=1

Run the command below to transfer the producers’ data to the target environment:

bin/kafka-mirror-maker --consumer.config mirrormaker.consumer.properties --num.streams 1 --producer.config mirrormaker.producer.properties --whitelist="*.support.metrics,__consumer_offsets,_schemas,price,order-*.*"

Here, *.support.metrics, __consumer_offsets, and _schemas are internal Kafka topics, while price and order-*.* are application-related topics.

Step 7: Relaunch MSK Services with Updated Configuration

After updating the configuration, launch new Kafka Connect services, the schema registry, and REST services for the target Amazon MSK cluster. Update the ZooKeeper and broker URLs in the connect-distributed.properties, schema-registry.properties, and kafka-rest.properties files to point at Amazon MSK.

        • To change the Kafka Connect service configuration, edit connect-distributed.properties and update the broker endpoints, for example bootstrap.servers=<BROKER-URLS for AWS MSK>. Then restart the service: bin/connect-distributed /opt/kafka/config/connect/connect-distributed.properties
        • To change the Kafka schema registry configuration, edit schema-registry.properties and update kafkastore.connection.url=<ZOO-KEEPER-PATH USED FOR AWS MSK KAFKA CLUSTER>. Then restart the service: bin/schema-registry-start /opt/kafka/config/registry/schema-registry.properties
        • To change the Kafka REST service configuration, edit kafka-rest.properties and update bootstrap.servers=<BROKER-URLS for AWS MSK>. Then restart the service: bin/kafka-rest-start /opt/kafka/config/rest/kafka-rest.properties
        • Edit the environment-specific defaults file to include the same MM2 replication properties shown in Step 3 (cluster aliases, bootstrap servers, replication flow, and internal topic settings).

Step 8: Re-point Producers to Target MSK Cluster


Figure 3: Producer and Consumer applications are re-pointed to Amazon MSK

Stop all producer services and re-point them to the new configuration. Keep MM2 running until replication has caught up with the producers’ ingestion rates, as sketched below.
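One way to confirm that replication has caught up is to compare end offsets on both clusters; a hedged sketch using the GetOffsetShell tool that ships with Apache Kafka (note that MM2’s default replication policy prefixes remote topics with the source cluster alias, e.g. source.order):

# End offsets on the source cluster
bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list src.kafka.host.1:9092 --topic order --time -1

# End offsets on the target Amazon MSK cluster
bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list tgt.kafka.host.1:9092 --topic source.order --time -1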

Final State Architecture


Figure 4: Final State Architecture after the migration

Architecture summary:

  • Producer services are deployed on different Amazon Elastic Kubernetes Service (Amazon EKS) nodes/pods and AWS Lambda. These services produce data and write to Amazon MSK topics.

  • With the help of the Kafka Connect S3 sink connector and the schema registry, data is transformed and stored in an Amazon Simple Storage Service (Amazon S3) bucket, with Apache Hive tables created on top of the Amazon S3 buckets (see the connector sketch after this list).

  • With the help of the Kafka Java Database Connectivity (JDBC) sink connector and the schema registry, data is transformed and stored in a MySQL database.

  • Analytical users submit Apache Spark jobs that fetch data from Hive to perform different analytics.
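For reference, here is a hedged sketch of registering such an S3 sink connector through the Kafka Connect REST API. The connector class and property names are from Confluent’s S3 sink connector; the connector name, bucket, region, and host names are placeholder assumptions:

# Register an S3 sink connector that writes Avro records to S3 via the schema registry
curl -X POST -H "Content-Type: application/json" http://connect-host:8083/connectors -d '{
  "name": "order-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "order,price",
    "s3.bucket.name": "<s3-bucket>",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
    "flush.size": "1000",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}'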

Conclusion

Impetus successfully migrated five of the client’s Kafka environments to Amazon MSK in less than two months. This migration allowed the client to achieve several key benefits. First, the client enhanced their system stability. Second, the client achieved a 40% reduction in operational overhead by optimizing resource allocation and minimizing manual processes.

The migration to Amazon MSK ensured high data availability, improved operational management and monitoring capabilities, and a substantial reduction in server and operational costs.

Furthermore, this migration enabled the client to better manage and control data surges, which had previously caused frequent processing outages, message loss, and significant business impact. To discover more about this migration and the comprehensive range of Amazon MSK services offered by Impetus, contact the Impetus team.



Impetus Technologies – AWS Partner Spotlight

Impetus is an AWS Advanced Technology Partner and AWS Competency Partner that provides enterprise services and solutions on the cloud. Impetus Technologies solves the data, AI, and cloud puzzle by combining unmatched expertise in cloud and data engineering.

Contact Impetus | Partner Overview | AWS Marketplace