What’s the difference between Kafka and Redis?
Redis is an in-memory key-value data store, while Apache Kafka is a stream processing engine. However, you can compare the two technologies because you can use both to create a publish-subscribe (pub/sub) messaging system. In modern cloud architecture, applications are decoupled into smaller, independent building blocks called services. Pub/sub messaging provides instant event notifications for these distributed systems. Kafka supports a pull-based system where publishers and subscribers share a common message queue from which subscribers pull messages as needed. Redis supports a push-based system where the publisher distributes messages to all subscribers when an event occurs.
How they work: Kafka vs. Redis pub/sub
Apache Kafka is an event streaming platform that allows multiple applications to stream data independently of each other. These applications, called producers and consumers, publish messages to and subscribe to messages from named categories of data called topics, which Kafka divides into partitions.
Meanwhile, Redis is designed as an in-memory database that supports low-latency data transfer between applications. It stores all the messages in RAM instead of on a hard disk to reduce data read and write time. As with Kafka, multiple consumers can subscribe to the same Redis pub/sub channel to receive messages.
Though you can use both for pub/sub messaging, Kafka and Redis work differently.
Apache Kafka connects producers and consumers through compute clusters. Each cluster consists of several Kafka brokers that reside on different servers.
Kafka creates topics and partitions for these purposes:
- Topics to group similar data belonging to a subject of interest, such as email, payment, users, and purchase
- Partitions across different brokers for data replication and fault tolerance
Producers publish messages to the broker. When the broker receives a message, it categorizes the data into a topic and stores the data in a partition. Consumers connect to the relevant topic and extract data from its partition.
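The topic and partition flow described above can be sketched with a small in-memory model. This is only an illustration, not Kafka's implementation: `ToyBroker`, `pick_partition`, and `NUM_PARTITIONS` are hypothetical names, and CRC32 stands in for whatever hash a real partitioner would use.

```python
import zlib
from collections import defaultdict
from typing import List, Tuple

NUM_PARTITIONS = 3  # hypothetical partition count for every topic


def pick_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyBroker:
    """In-memory stand-in for a broker: topic -> partition -> append-only list."""

    def __init__(self) -> None:
        self.logs = defaultdict(lambda: defaultdict(list))

    def publish(self, topic: str, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return (partition, offset of the new record)."""
        partition = pick_partition(key)
        log = self.logs[topic][partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, topic: str, partition: int, offset: int) -> List[Tuple[str, str]]:
        """Return every record in the partition from `offset` onward."""
        return self.logs[topic][partition][offset:]
```

Because the hash is stable, two messages with the same key land in the same partition and keep their relative order, which is the property consumers rely on when they read a partition sequentially.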
Redis runs with a client-server architecture as a NoSQL database system. Producers and consumers are loosely coupled and don't have to know each other when sending messages.
Redis uses keys and primary-secondary nodes for these purposes:
- Keys to group similar messages. For example, “email” is a key that points to the data store that holds only email messages.
- Primary-secondary nodes for message replication.
When a producer sends a message to a specific node, Redis delivers the message to all the connected subscribers by checking the message key. The consumer must always initiate and maintain an active connection with the Redis server to receive messages. This is known as connected delivery semantics.
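Connected delivery can be illustrated with a toy, in-process publisher. The sketch below is an assumption-laden model rather than Redis itself: `ToyPubSub` is a hypothetical class, callbacks stand in for client connections, and the return value of `publish` mirrors the way the Redis PUBLISH command reports how many clients received a message.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ToyPubSub:
    """In-process stand-in for a push-based pub/sub server."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        """Register a callback; this models a client holding an open connection."""
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Push the message to every connected subscriber; return how many got it."""
        receivers = list(self.subscribers.get(channel, []))
        for callback in receivers:
            callback(message)
        return len(receivers)


server = ToyPubSub()
inbox: List[str] = []
dropped = server.publish("email", "sent too early")  # nobody is connected yet
server.subscribe("email", inbox.append)
delivered = server.publish("email", "welcome")       # pushed to the subscriber
```

The first message reaches no one and is not kept anywhere, so the subscriber that connects afterward only ever sees the second one.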
Message handling: Kafka vs. Redis pub/sub
Apache Kafka provides developers with highly scalable distributed messaging systems. Meanwhile, Redis offers rich data structures that enable an application to push data to multiple nodes quickly. Both systems have several differences in their message queuing mechanisms.
Kafka and Redis work best when they send small-sized data packets between producers and consumers.
Redis, in particular, isn't designed to handle large data sizes without compromising throughput. It also can’t store large amounts of data, as RAM has a smaller capacity than disk storage.
Meanwhile, Kafka can support reasonably large messages despite not being specifically built to do so. Kafka can handle messages up to 1 GB if it compresses the message and you configure it for tiered storage. With tiered storage, Kafka moves completed log files to remote storage instead of keeping every message on local storage.
Kafka consumers pull the data from the message queue. Each Kafka consumer keeps track of the message it has read with an offset, which it updates to retrieve the subsequent message. Consumers can detect and track duplicate messages.
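A minimal sketch of offset tracking, assuming a partition can be modeled as a plain Python list. `ToyConsumer`, `poll`, and `seek` are illustrative names chosen for this example, not a real client API.

```python
from typing import List


class ToyConsumer:
    """Pulls records from a list-backed partition and remembers its own offset."""

    def __init__(self, partition_log: List[str]) -> None:
        self.log = partition_log
        self.offset = 0  # index of the next record to read

    def poll(self, max_records: int = 10) -> List[str]:
        """Fetch up to max_records starting at the current offset, then advance."""
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the offset so that earlier (or later) records are read next."""
        self.offset = offset


log = ["m0", "m1", "m2"]
consumer = ToyConsumer(log)
first = consumer.poll(max_records=2)  # reads m0 and m1
log.append("m3")                      # a producer appends in the meantime
rest = consumer.poll()                # reads m2 and m3
consumer.seek(1)                      # rewind to an earlier offset
replay = consumer.poll()              # reads m1, m2, and m3 again
```

Since the consumer owns its offset, it decides when to fetch and can notice when it is about to reprocess a position it has already handled.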
On the other hand, Redis automatically pushes the message to the connected subscribers. Redis subscribers passively wait for incoming messages directed to them from the server. As it is an at-most-once delivery setup, Redis subscribers are not capable of detecting duplicate messages.
Kafka retains messages after consumers read them. So, if a client application loses the retrieved data, it can request that data again from the partition it subscribes to. By setting the message retention policy, users can determine how long Kafka retains the data.
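Time-based retention can be pictured as a filter over timestamped records. The function below is a simplified sketch: `apply_retention` and the sample timestamps are hypothetical, and real Kafka removes whole log segments according to settings such as `retention.ms` rather than filtering individual records.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (timestamp in milliseconds, payload)


def apply_retention(log: List[Record], now_ms: int, retention_ms: int) -> List[Record]:
    """Keep only the records that are still inside the retention window."""
    cutoff = now_ms - retention_ms
    return [record for record in log if record[0] >= cutoff]


log = [(1_000, "old"), (5_000, "recent"), (9_000, "new")]
kept = apply_retention(log, now_ms=10_000, retention_ms=6_000)
```

Records inside the window stay readable no matter how many consumers have already read them, which is what lets a client request the same data again.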
Conversely, Redis doesn't store messages after they are delivered. If no subscribers are connected to the stream, Redis discards the messages. Discarded messages cannot be recovered even if the subscriber connects to Redis later on.
Both Kafka and Redis allow applications to mitigate unreliable message delivery, but they do so differently.
Error handling in Redis focuses on the interaction between the client application and the Redis services. With Redis, developers can address circumstances such as client timeouts, exceeded memory buffers, and maximum client limits. Because of its key-value pair database architecture, Redis cannot provide message error handling as robust as Kafka's.
Kafka developers can store erroneous events in a dead letter queue, retry them, or redirect them to allow consistent message delivery to client applications. Developers can also use the Kafka Connect API to automatically restart connector tasks after certain errors.
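The retry and dead letter queue pattern can be sketched in a few lines. This is a generic illustration under stated assumptions: `process_with_dlq` and `handler` are hypothetical names, and a plain list stands in for the dead letter queue, which in a real deployment would usually be a separate topic.

```python
from typing import Callable, List


def process_with_dlq(
    messages: List[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Try each message up to max_attempts times; park persistent failures."""
    dead_letters: List[str] = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break  # success: move on to the next message
            except Exception:
                if attempt == max_attempts:
                    dead_letters.append(message)  # give up and park it
    return dead_letters


def handler(message: str) -> None:
    """Hypothetical handler that cannot process one particular payload."""
    if message == "bad":
        raise ValueError("cannot parse message")


dead = process_with_dlq(["ok", "bad", "ok2"], handler)
```

Parking the failing message keeps it from blocking the messages behind it, and it can be inspected or replayed later.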
Performance differences: Kafka vs. Redis pub/sub
Overall, Apache Kafka outperforms Redis in pub/sub messaging because Kafka was designed specifically for data streaming. Redis, however, serves several other use cases for which Kafka isn't suited.
Parallelism is the ability of multiple consumers to receive the same message simultaneously.
Redis doesn't support parallelism.
On the other hand, Kafka allows the same message to be distributed to multiple consumers concurrently. Usually, the consumers in a Kafka consumer group take turns retrieving new messages from a partition, so each message goes to only one member of the group. If a consumer group contains only a single consumer, that consumer retrieves all the messages. By taking advantage of this setup and partition replication, you can create several consumer groups with one consumer each and assign each to a partition replica. This allows all consumers to retrieve the same sequence of messages.
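The consumer group behavior can be approximated with a round-robin assignment. This is a simplified sketch, not Kafka's actual assignor logic: `assign_partitions`, the group names, and the member names are all made up for illustration.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread the partitions round-robin over the consumers of ONE group."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
groups = {
    "billing": ["billing-1", "billing-2"],  # two members share the work
    "audit": ["audit-1"],                   # a single member reads everything
}
plan = {
    group: assign_partitions(partitions, members)
    for group, members in groups.items()
}
```

Each group covers every partition exactly once, so both groups see the full stream, while members inside the same group split it between them.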
Throughput measures the number of messages each system can process per second.
Kafka generally has higher throughput than Redis pub/sub. Kafka handles much larger data volumes because it doesn't have to wait for each subscriber to receive the message before moving to another. Instead, it keeps current messages in a memory cache as well as on disk, which optimizes read speed.
However, Kafka's performance may decrease if consumers are not retrieving the message fast enough, as unread messages on the cache are eventually removed. In this case, consumers must read from the disk, which is slower.
Meanwhile, Redis must wait for acknowledgment from each consumer, which significantly decreases its throughput as more nodes connect. A workaround is to send multiple requests at once with a process called pipelining, but this increases its messaging latency.
Both Kafka and Redis are suitable for low-latency data processing. Redis offers a lower delivery time, measured in milliseconds, while Kafka averages tens of milliseconds.
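A rough way to reason about pipelining is to count network round trips. The model below is deliberately crude and rests on one assumption: every batch of commands costs a single round trip and nothing else. `total_wait_ms` and the sample numbers are hypothetical.

```python
import math


def total_wait_ms(num_commands: int, rtt_ms: float, batch_size: int = 1) -> float:
    """Network wait when each batch of commands costs one round trip."""
    return math.ceil(num_commands / batch_size) * rtt_ms


unpipelined = total_wait_ms(1000, rtt_ms=0.5)                # 1000 round trips
pipelined = total_wait_ms(1000, rtt_ms=0.5, batch_size=100)  # 10 round trips
```

Under this model the total wait drops from 500 ms to 5 ms, which is the throughput gain. The cost is that the first command in a batch has to wait for the rest of the batch before it is sent.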
Considering that Redis reads and writes data primarily on RAM, it naturally edges Kafka in speed. However, Redis might not maintain ultra-low-latency data operations when it handles larger messages. Meanwhile, Kafka needs more time to replicate partitions on different physical drives for data persistence, which adds overhead to message delivery time.
Optimizing latency for Redis and Kafka is possible, but you must do so carefully. For example, you can compress Kafka messages to decrease network transfer time, but producers then need extra time to compress them and consumers need extra time to decompress them.
Latency in Redis might be caused by several factors, including the operating environment, network operations, slow commands, or forking. To reduce forking delays, Redis recommends running the pub/sub delivery system on modern EC2 instances based on a Hardware Virtual Machine (HVM).
Kafka writes all data to the leader broker's storage disk and replicates it across different servers. When a server fails, subscribers retrieve the data from the backup partitions.
Unlike Kafka, Redis does not back up data by default, and users must enable the feature manually. Redis uses an in-memory data store, which loses all data when powered down. To avoid that, developers turn on Redis Database (RDB) persistence to periodically capture snapshots of the data in RAM and store them on disk.
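Snapshot persistence can be illustrated with a toy key-value store that is dumped to disk. This sketch only mimics the idea: JSON stands in for the RDB format, and `save_snapshot`, `load_snapshot`, and the file name are hypothetical.

```python
import json
import os
import tempfile


def save_snapshot(data: dict, path: str) -> None:
    """Write the whole dataset to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    os.replace(tmp_path, path)  # replaces the previous snapshot in one step


def load_snapshot(path: str) -> dict:
    """Rebuild the in-memory dataset from the last snapshot on disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


store = {"session:1": "alice"}
snapshot_path = os.path.join(tempfile.mkdtemp(), "dump.json")
save_snapshot(store, snapshot_path)
store["session:2"] = "bob"               # written after the snapshot
restored = load_snapshot(snapshot_path)  # simulates a restart
```

The restored data contains only what existed at snapshot time. The later write is gone, which is the small data loss that periodic snapshots leave open.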
When to use: Kafka vs. Redis pub/sub
Apache Kafka is the better choice for building applications that stream large datasets and require high recoverability. It was initially developed as a single distributed data pipeline capable of handling trillions of messages that pass through. Kafka replicates partitions across different servers to prevent data loss when a node fails. Organizations use Kafka to support real-time communication between applications, mobile Internet of Things (IoT) devices, and microservices. It’s also the better choice for log aggregation, stream processing, and other cloud-based data integration tasks.
Meanwhile, Redis provides ultra-low-latency event distribution for applications that require instantaneous data transfer but tolerate small data loss. Redis is commonly used as a session cache to store frequently accessed data or deliver urgent messages. It’s also suitable for storing gaming, ecommerce, or social media data to allow for a smoother user experience.
Summary of differences: Kafka vs. Redis pub/sub
- Message size. Kafka: supports message sizes up to 1 GB with compression and tiered storage. Redis: supports smaller message sizes.
- Message delivery. Kafka: subscribers pull messages from the queue. Redis: the Redis server pushes messages to connected subscribers.
- Message retention. Kafka: retains messages after retrieval. Redis: does not retain messages.
- Error handling. Kafka: robust error handling at the messaging level, with a dead letter queue, event retry, and redirection. Redis: you must handle Redis exceptions at the application level with timeouts, client limits, and memory buffer capacity.
- Parallelism. Kafka: supports parallelism, so multiple consumers can retrieve the same message concurrently. Redis: does not support parallelism.
- Throughput. Kafka: higher throughput because of asynchronous read/write. Redis: lower throughput because the Redis server needs to wait for a reply before sending a message to another subscriber.
- Latency. Kafka: low latency, slightly slower than Redis because of data replication by default. Redis: ultra-low latency when distributing smaller-sized messages.
- Fault tolerance. Kafka: automatically backs up partitions to different brokers. Redis: doesn't back up by default. Users can enable Redis persistence manually. Risk of small data loss.
How can AWS support your Kafka and Redis requirements?
Amazon Web Services (AWS) provides scalable and managed infrastructure to support your publish-subscribe (pub/sub) messaging needs.
Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to easily ingest and process large volumes of data in real time. You can build a privately accessed data bus to provide high-availability streaming nodes at scale. You can also seamlessly connect with other AWS services like AWS IoT Core, Amazon Virtual Private Cloud (Amazon VPC), and Amazon Managed Service for Apache Flink.
Use Amazon MemoryDB for Redis to provide high-availability in-memory storage for your Redis workloads. You can run high-concurrency streaming data feeds to ingest user activity. And you can support millions of requests per day for media and entertainment applications.
Instead of Redis or Kafka, you can also use Amazon Simple Notification Service (Amazon SNS) to build a pub/sub messaging system. You can directly send messages from your applications to customers or other applications in a scalable, cost-efficient way. Amazon SNS offers several features, such as:
- High-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications.
- Message encryption and traffic privacy.
- Fanout capabilities across AWS categories. This includes analytics, compute, containers, databases, Internet of Things (IoT), machine learning (ML), security, and storage.
Get started with pub/sub, Redis, and Kafka on AWS by creating an account today.