AWS Database Blog

Optimize Redis Client Performance for Amazon ElastiCache and MemoryDB

Redis users typically access a Redis service, such as Amazon ElastiCache or Amazon MemoryDB for Redis, using their choice of language-specific open source client libraries. These libraries are built and maintained by independent teams, with contributions from others, including AWS. In this post, we share best practices for optimizing Redis client performance for popular Redis client libraries in Python, Java, C#, Node.js, and PHP. The benchmarks in this post were run on Amazon ElastiCache, but most of the performance enhancement principles apply to other Redis systems, including Amazon MemoryDB.

We first provide general definitions that are relevant to all client libraries, and then we dive into each client library. Where a language has more than one popular client, we compare them to help you decide which library best suits your needs. The sections on the different clients don't depend on one another, so feel free to jump to the ones you're interested in and read them in any order.

We also share guidelines to help you avoid some common pitfalls and easily avoidable performance issues we have seen customers face. We cover common scenarios and include our testing setup (you can review the code we used for testing on GitHub), as well as our benchmarking results.


General definitions

In this section, we provide general definitions relevant to all client libraries.

Synchronous vs. asynchronous API

The different APIs for accessing Redis can be divided into two broad categories: synchronous and asynchronous. A single client library may offer both a synchronous and an asynchronous API, and most of them do.

A synchronous API is blocking. This means that an application must receive a response to an API call before it can move on to other tasks. When using a synchronous API, the application spends a lot of time waiting for responses.

An asynchronous API is non-blocking. This means that an application is free to move on to other tasks before it receives a response to an API call. With an asynchronous API, instead of idly waiting for a response, the application can run other tasks before the response is received, and handle the response once it arrives.
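To make the difference concrete, here is a minimal Python sketch using asyncio. `fake_get` is a hypothetical stand-in for a Redis GET that simply waits one simulated round trip (exaggerated to 50 ms so the effect is visible); no real client library is involved. `synchronous_style` models the blocking pattern by awaiting each reply before sending the next request, while `asynchronous_style` issues every request first and handles the replies as they arrive.

```python
import asyncio
import time
from typing import Callable, List, Tuple

ROUND_TRIP = 0.05  # simulated round trip in seconds (an assumption, exaggerated)


async def fake_get(key: str) -> str:
    """Hypothetical stand-in for a Redis GET: waits one round trip, then replies."""
    await asyncio.sleep(ROUND_TRIP)
    return f"value-of-{key}"


async def synchronous_style(keys: List[str]) -> List[str]:
    # Blocking pattern: each request waits for its reply before the next is sent.
    return [await fake_get(k) for k in keys]


async def asynchronous_style(keys: List[str]) -> List[str]:
    # Non-blocking pattern: issue all requests, then handle replies as they arrive.
    return list(await asyncio.gather(*(fake_get(k) for k in keys)))


def timed(fn: Callable, keys: List[str]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    result = asyncio.run(fn(keys))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    keys = [f"k{i}" for i in range(10)]
    _, sync_seconds = timed(synchronous_style, keys)
    _, async_seconds = timed(asynchronous_style, keys)
    print(f"one at a time: {sync_seconds:.2f}s, concurrent: {async_seconds:.2f}s")
```

Both functions return the same replies; the one-at-a-time version takes roughly ten round trips, the concurrent version roughly one.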

Connection pool

A connection pool is a store of connections that can be reused when requests to a database are made. Why use a connection pool at all, instead of giving every thread or process its own dedicated connection? There are two reasons. First, Redis has a limit on the number of open connections it can handle, and using a connection pool reduces the risk of crossing that threshold. For more information, see Best practices: Redis clients and Amazon ElastiCache for Redis. Second, for short-lived interactions, a connection pool can boost performance by saving the overhead of establishing a new connection.
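Both reasons can be seen in a minimal, standard-library Python sketch. `BoundedPool` and its `factory` argument are hypothetical names for illustration; each client library's pool has its own API. The semaphore caps how many connections can be open at once (keeping you below the server's limit), and the idle queue lets threads reuse a connection instead of opening a new one.

```python
import queue
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class BoundedPool(Generic[T]):
    """Thread-safe pool that opens at most max_size connections and reuses them."""

    def __init__(self, factory: Callable[[], T], max_size: int) -> None:
        self._factory = factory  # opens a new connection
        self._idle: "queue.LifoQueue[T]" = queue.LifoQueue()  # idle connections
        self._slots = threading.BoundedSemaphore(max_size)  # caps open connections
        self._lock = threading.Lock()
        self.created = 0

    def acquire(self, timeout: Optional[float] = None) -> T:
        # Block while max_size connections are already checked out.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no connection available")
        try:
            return self._idle.get_nowait()  # reuse an idle connection if there is one
        except queue.Empty:
            pass
        try:
            conn = self._factory()  # otherwise open a new one
        except BaseException:
            self._slots.release()  # don't leak the slot if opening fails
            raise
        with self._lock:
            self.created += 1
        return conn

    def release(self, conn: T) -> None:
        self._idle.put(conn)  # make it available to other threads
        self._slots.release()
```

Handing out the most recently returned connection first (the LIFO queue) keeps a small set of connections busy; a FIFO queue would also work.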

Pipelining and batching

The Redis documentation defines pipelining as the ability to send multiple commands to the server without waiting for the replies at all, and finally reading the replies in a single step.
There are two major ways a client library can implement pipelines.

  1. Buffering a sequence of commands in memory and sending them to the Redis server as a single batch. We refer to this method as “batching”.
  2. Using an async API. We refer to this method as “pipelining”.
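The difference between the two lies in where the buffering happens. This toy Python model counts how many writes each approach makes to a connection; `RecordingTransport`, `pipeline`, and `batch` are hypothetical helpers with no real network, and commands use Redis's simple inline form for brevity (real clients send RESP). Neither approach waits for a reply per command, but batching turns N writes into one, which is where its savings in system calls and CPU come from.

```python
from typing import List, Sequence, Tuple

Command = Tuple[str, ...]


class RecordingTransport:
    """Fake connection that records every write instead of sending it."""

    def __init__(self) -> None:
        self.writes: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.writes.append(data)


def encode_inline(cmd: Command) -> bytes:
    # Redis accepts simple space-separated "inline" commands; real clients use RESP.
    return (" ".join(cmd) + "\r\n").encode()


def pipeline(transport: RecordingTransport, commands: Sequence[Command]) -> None:
    # Pipelining: write each command as soon as it is issued, without waiting
    # for earlier replies (the replies would be read afterwards).
    for cmd in commands:
        transport.send(encode_inline(cmd))


def batch(transport: RecordingTransport, commands: Sequence[Command]) -> None:
    # Batching: buffer the commands in memory, then write them all at once.
    buffered = b"".join(encode_inline(cmd) for cmd in commands)
    transport.send(buffered)


commands = [("SET", "a", "1"), ("GET", "a"), ("GET", "b")]
piped, batched = RecordingTransport(), RecordingTransport()
pipeline(piped, commands)
batch(batched, commands)
print(len(piped.writes), len(batched.writes))  # 3 1
```

The bytes on the wire are identical; only the number of writes differs.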

Transactions (MULTI/EXEC blocks)

A transaction is a sequence of commands that run atomically. Either all commands are run in order, with no other commands run in-between, or no commands are run at all. In Redis, this is achieved with the MULTI command. After receiving the MULTI command, Redis queues all commands it receives. Redis atomically runs the queued commands only after it receives the EXEC command. Because commands are queued on the server side after receiving the MULTI command, the client is free to send the sequence of commands either one by one or in batches.

For more information on Redis transactions, see Transactions.
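To see what a client actually sends for a transaction, here is a small Python sketch that encodes one in RESP, the Redis wire protocol. The helper names are hypothetical. The whole MULTI ... EXEC sequence can go out in a single write or one command at a time; either way, the server answers +QUEUED to each command after MULTI and returns all results together in its reply to EXEC.

```python
from typing import Iterable, Tuple


def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def encode_transaction(commands: Iterable[Tuple[str, ...]]) -> bytes:
    """Wrap commands in MULTI ... EXEC; the server queues them until EXEC."""
    body = b"".join(encode_command(*cmd) for cmd in commands)
    return encode_command("MULTI") + body + encode_command("EXEC")


payload = encode_transaction([("SET", "balance", "100"), ("INCRBY", "balance", "5")])
print(payload)
```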

Test setup and system specs

We use the following specs for our testing:

  • Commands – 80% get, 20% set
  • Size of values in bytes – Drawn from a normal distribution with mean = 1024 bytes and a 99.99% chance of falling between 4 and 2048 bytes
  • Duration – Approximately 10 minutes per test
  • Size of key space used for get – 3.75 million
  • Size of key space used for set – 3 million
  • Before each test – Flush the database from the previous test, and set every key in the set key space (this is to get an 80% hit rate for every test)
  • Client – Amazon Elastic Compute Cloud (Amazon EC2) c5.4xlarge instance running Amazon Linux 2
  • Amazon ElastiCache specs – We use the following parameters for Amazon ElastiCache:
    • version 6.0
    • Cluster mode disabled
    • TLS disabled
    • 1 shard, 3 nodes (1 primary, 2 replicas)
    • Node type r6g.2xlarge
  • Availability Zone – Client and server are in the same AZ
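The key-space and value-size settings can be checked with a little arithmetic. We don't know exactly how the benchmark generated its workload, so this Python sketch makes two labeled assumptions: gets pick keys uniformly from the get key space (which contains the set key space), and value sizes follow a normal distribution clamped to [4, 2048] bytes. Under those assumptions the hit rate is 3 / 3.75 = 80%, and a two-sided 99.99% interval spans about ±3.89 standard deviations, giving σ ≈ 262 bytes when solved from the narrower side (1024 - 4 = 1020 bytes). `value_size` is a hypothetical helper.

```python
import random
from statistics import NormalDist

GET_KEYSPACE = 3_750_000
SET_KEYSPACE = 3_000_000  # every one of these keys is written before each test

# Assumption: gets pick keys uniformly from the get key space, which contains
# the set key space, so the hit rate is the pre-populated fraction.
hit_rate = SET_KEYSPACE / GET_KEYSPACE

# Assumption: sizes are normal with mean 1024 and 99.99% inside [4, 2048].
# Solve for sigma from the narrower side of the interval (1024 - 4 = 1020).
MEAN = 1024
z = NormalDist().inv_cdf(1 - 0.0001 / 2)  # about 3.89 for a 99.99% interval
sigma = (MEAN - 4) / z


def value_size(rng: random.Random) -> int:
    """Draw one value size, clamping the rare outliers into [4, 2048]."""
    return min(2048, max(4, round(rng.gauss(MEAN, sigma))))


rng = random.Random(42)
sizes = [value_size(rng) for _ in range(100_000)]
print(f"hit rate {hit_rate:.0%}, sigma {sigma:.0f} bytes, "
      f"mean size {sum(sizes) / len(sizes):.0f} bytes")
```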

Performance evaluation

When measuring performance, we usually talk about requests per second (RPS) and latency. The higher the RPS, the better. Latency is a bit more complex to figure out because when we increase the load on the server, the queues get longer and the latency increases. This would force us to limit the load on the server to compare latencies between different clients and scenarios. In addition, when using batches, it’s less clear what we mean by latency. Do we want to measure the latency of a single request or the entire batch? Therefore, for the simplicity of this post, we use only the RPS metric as an indicator of performance.

CPU Utilization

Our recommendation is to monitor both client and server CPU utilization. High CPU utilization can indicate suboptimal performance. CPU utilization is measured in percent per core; for example, a CPU utilization of 250% is equivalent to two and a half fully used cores.

When sending commands in batches, we increase performance by reducing the number of system calls and CPU usage both on the client machine and on the server side.

Network bottlenecks

In this configuration, when using a single connection, the typical latency for one command is 0.2 milliseconds. To measure the latency, we ran redis-benchmark -t ping -c 1 -h <host name>. The following table shows the latency results in milliseconds.

Latency Summary
avg min p50 p95 p99
0.201 0.152 0.167 0.303 0.327

Therefore, when sending commands through a single connection with a synchronous API, the best possible throughput (using this configuration) is approximately 5,000 RPS (1 second / 0.2 milliseconds).

Both Redis and all of the client libraries can operate at much higher throughput than this; in this setup, the network round-trip time is what limits performance. Therefore, when issuing commands through a single connection with a synchronous API, all clients achieve a throughput of approximately 5,000 RPS.
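The 5,000 RPS ceiling is an application of Little's law: throughput is at most the number of requests in flight divided by the round-trip time. The tiny Python helper below (hypothetical name `max_rps`) makes the arithmetic explicit; it models only the network bound, not client or server CPU.

```python
def max_rps(round_trip_ms: float, in_flight: int = 1) -> float:
    """Network-bound ceiling on requests per second (Little's law).

    round_trip_ms: average time for one request/response round trip.
    in_flight: requests outstanding at once (1 for a synchronous API
    on a single connection).
    """
    return in_flight * 1000.0 / round_trip_ms


print(round(max_rps(0.2)))    # 5000, the estimate used in the text
print(round(max_rps(0.201)))  # 4975, using the measured average latency
```

With a synchronous API on one connection, `in_flight` is 1. Pipelining, batching, and multiple connections all raise it, which is why they are the techniques explored in the rest of this post.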

Reusing connections

Don’t open a new connection for every command (or every few commands). This is a common mistake. A connection should be reused throughout the program. Initializing a TCP connection is an extremely slow operation compared to sending a single command, and doing it frequently has drastic effects on performance.

We tested the performance of opening a new connection for every command. The following table summarizes our results.

Reused Connection New Connection for Every Command
5,000 RPS 250 RPS
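To see what reconnecting costs without needing a Redis server, this Python sketch starts a tiny local TCP server that answers the inline command PING with +PONG (as Redis does) and counts the connections it accepts. `PingHandler`, `reused_connection`, and `new_connection_per_command` are hypothetical names; the sketch counts TCP handshakes rather than claiming any particular timing.

```python
import socket
import socketserver
import threading


class PingHandler(socketserver.StreamRequestHandler):
    """Answers each inline PING with +PONG, the way a Redis server replies."""

    def handle(self) -> None:
        self.server.connections += 1  # handle() runs once per accepted connection
        for line in self.rfile:  # ends when the client closes the connection
            if line.strip().upper() == b"PING":
                self.wfile.write(b"+PONG\r\n")


def start_server() -> socketserver.ThreadingTCPServer:
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), PingHandler)
    server.connections = 0
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ping(conn_file) -> bytes:
    conn_file.write(b"PING\r\n")
    conn_file.flush()
    return conn_file.readline()


def reused_connection(addr, n: int):
    # Recommended: one connection, reused for every command.
    with socket.create_connection(addr) as sock, sock.makefile("rwb") as f:
        return [ping(f) for _ in range(n)]


def new_connection_per_command(addr, n: int):
    # Anti-pattern: a full TCP handshake and teardown for every command.
    replies = []
    for _ in range(n):
        with socket.create_connection(addr) as sock, sock.makefile("rwb") as f:
            replies.append(ping(f))
    return replies
```

Against `server = start_server()`, calling `reused_connection(server.server_address, 20)` leaves the counter at 1, while `new_connection_per_command(server.server_address, 20)` adds 20 more handshakes for the same 20 commands.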

Redis clients

In the following sections, we review best practices for performance in redis-py, StackExchange.Redis, node-redis, phpredis, predis, Jedis, and Lettuce.


Redis-py

Redis-py is a popular Redis client library for Python. This post works with redis-py version 3.5.3. The documentation and code for redis-py are available on GitHub.

Make sure you install the hiredis module in addition to the redis module:

pip install hiredis

From the section on parsers in the redis-py README:

“Hiredis is a C library maintained by the core Redis team. Pieter Noordhuis was kind enough to create Python bindings. Using Hiredis can provide up to a 10x speed improvement in parsing responses from the Redis server. The performance increase is most noticeable when retrieving many pieces of data, such as from LRANGE or SMEMBERS operations.”

The Hiredis parser increased performance by roughly 9–15% for pipelines of 10 or more commands when using the workload defined in the test setup section. The following table summarizes the effect on performance:

Pipeline Size With Hiredis (RPS) Without Hiredis (RPS)
3 about 10,500 about 10,500
10 24,844 22,858
100 61,627 53,924
1,000 74,277 64,871


Batching

Batching in redis-py is achieved using a Pipeline object. A Pipeline object in redis-py buffers commands on the client side and flushes them to the server only after the Pipeline.execute method is called.

By default, Pipeline.execute wraps commands in a MULTI/EXEC block. This hurts performance and can be disabled if atomicity isn't required. To disable it, set transaction=False when creating a Pipeline object as follows:

import redis

client = redis.Redis()  # connects to localhost:6379 by default
pipe = client.pipeline(transaction=False)  # batch without a MULTI/EXEC wrapper

For an example on how to use batching in redis-py, see Pipelines in the GitHub repo.

The following table summarizes the performance of pipelines with a single connection (in RPS).

Pipeline Size Default Behavior (transaction=True) transaction=False Percent Improvement
10 24,844 26,039 5%
100 61,627 65,973 7%
1,000 74,277 85,646 15%

Connection pools

Because of the global interpreter lock (GIL), which lets only one thread in a Python process execute Python code at a given time, Python is less than ideal for writing multithreaded code. Nevertheless, behind the scenes, redis-py manages a connection pool that is shared between threads. Even if only one thread is running at any given time, multithreaded code can still have a positive impact on performance. This is because a thread waiting for a response from Redis is blocked on network I/O, which releases the GIL, so a different thread can send an additional request in the meantime.

For more information, see Connection Pools in the GitHub repo.
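A quick way to see why threads still help despite the GIL is to time blocking waits. In this Python sketch, `time.sleep` is a hypothetical stand-in for waiting on a Redis reply over the network (both release the GIL while waiting); no redis-py code is involved, and the 50 ms round trip is an exaggerated assumption. Spreading the same number of waits over several threads finishes in a fraction of the time, because the waits overlap.

```python
import threading
import time

ROUND_TRIP = 0.05  # simulated wait for a reply, in seconds (exaggerated assumption)


def blocking_request() -> None:
    # Like a blocking socket read, time.sleep releases the GIL while it waits.
    time.sleep(ROUND_TRIP)


def run_requests(num_threads: int, requests_per_thread: int) -> float:
    """Run the requests on num_threads threads; return elapsed seconds."""

    def worker() -> None:
        for _ in range(requests_per_thread):
            blocking_request()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The same 20 requests: one thread versus four threads.
    print(f"1 thread:  {run_requests(1, 20):.2f}s")
    print(f"4 threads: {run_requests(4, 5):.2f}s")
```

Real redis-py threads also spend CPU time parsing replies under the GIL, which is one reason the measured gains level off after a few threads.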

By default, redis-py initializes an unbounded and thread-safe connection pool. By unbounded, we mean that by default there is no limit on the number of newly created connections, and therefore on the size of the connection pool. By thread-safe, we mean that the user doesn't need any locking mechanisms when accessing the connection pool. Connections are only closed when the entire program terminates. For every command (all commands issued using pipeline.execute are treated as one command), the running thread attempts to take a connection from the pool, issues the command, and upon receiving the response from the Redis server, returns the connection to the pool so that it can be used by other threads.

Initially, the pool is empty. If a client attempts to take a connection from an empty pool (all connections are in use or none have been created), the client creates a new connection. When the client is finished using the newly created connection, it places it in the pool exactly as it would have done if it had initially taken it from the pool. The following table summarizes the performance of the multithreaded approach using the default connection pool.

Threads RPS CPU Utilization
2 8,029 52%
3 10,875 95%
between 4 and 100 about 11,000 162%


StackExchange.Redis

StackExchange.Redis is a popular Redis client for .NET languages. This post works with StackExchange.Redis version 2.2.50. The documentation and code are available on GitHub.

StackExchange.Redis offers both a synchronous and asynchronous API. The asynchronous API offers better performance and uses the task-based asynchronous pattern (TAP).


Multithreading

Dividing the workload between several threads has the potential to increase performance.
For this, StackExchange.Redis offers ConnectionMultiplexer, a single thread-safe connection that StackExchange.Redis manages asynchronously. Different threads should share the same ConnectionMultiplexer. Because it’s thread-safe, no user-defined locking mechanisms are required when sharing it between threads.

Some Redis commands, such as BLPOP and BRPOP, block the connection from which they're sent until a certain condition is met. Because multiple threads using StackExchange.Redis all access the Redis server through a single ConnectionMultiplexer, connection-blocking commands could block the ConnectionMultiplexer indefinitely. Therefore, StackExchange.Redis does not support such connection-blocking commands. For more information about ConnectionMultiplexer, see Basic Usage.

The following table summarizes the performance we measured using the synchronous API in multithreaded scenarios.

Threads RPS CPU Utilization
1 4,000 130%
10 28,701 420%
20 35,247 420%
30 34,112 470%
40 34,314 480%
50 34,016 480%
60 33,564 540%

Multithreading can give a significant performance boost: 20 threads gave more than 8 times the RPS of a single thread. On the other hand, using more than 20 threads increases CPU utilization without improving performance. Although using multiple threads with the synchronous API gives a performance boost, we recommend using the asynchronous API, as described in the following sections.


Pipelining

Pipelining in StackExchange.Redis is done by sending a sequence of commands asynchronously and waiting for the corresponding tasks to complete. For example, a pipeline of size 2:

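IDatabase db = muxer.GetDatabase();  // muxer: the shared ConnectionMultiplexer
var task1 = db.StringGetAsync("a");  // sent right away; no waiting for the reply
var task2 = db.StringGetAsync("b");
var result1 = db.Wait(task1);        // wait for the replies only at the end
var result2 = db.Wait(task2);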

When pipelining in StackExchange.Redis, you must use the asynchronous API.

We tested the performance of a variable number of pipeline sizes using a single thread. The following table summarizes our results.

Pipeline Size RPS CPU Utilization
3 10,676 140%
10 30,175 165%
100 183,685 300%
1,000 406,297 400%


Batching

StackExchange.Redis also supports batching. This differs slightly from pipelining in that commands are first buffered in client memory and then sent to the Redis server in one batch by calling batch.Execute().

In StackExchange.Redis, batched commands are not wrapped in a MULTI/EXEC block and are therefore not guaranteed to run atomically.

For example, a batch of size 2:

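IBatch batch = db.CreateBatch();
var task1 = batch.StringGetAsync("a");  // buffered on the client, not sent yet
var task2 = batch.StringGetAsync("b");
batch.Execute();                        // send the buffered commands as one batch
var result1 = db.Wait(task1);
var result2 = db.Wait(task2);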

We tested the performance of a few different batch sizes. The following table summarizes our results.

Batch Size RPS CPU Utilization
3 10,673 130%
10 29,580 140%
100 184,033 200%
1,000 410,896 290%

We recommend batching over pipelining when possible. Batching makes fewer I/O calls and fewer context switches, giving similar RPS at drastically lower CPU utilization. As the preceding tables show, pipelines and batches of size 100 both reach about 180,000 RPS, but pipelining uses about 300% CPU whereas batching uses about 200%.


Node-redis

Node-redis is a popular Redis client for Node.js. This post works with node-redis version 3.1. The documentation and code are available on GitHub.

Because it’s written in Node.js, node-redis only offers an asynchronous API.

Unbounded concurrency

It can be difficult to bound the amount of concurrency when writing async code. Unbounded concurrency can lead to over-consumption of resources. For example, the following code may exhaust the heap, causing the program to crash:

// Anti-pattern: issues 3,000,000 commands with nothing bounding them
for (let i = 0; i < 3000000; i++) {
  client.set("key", "value", () => {});
}

Each call to client.set allocates memory on the heap, and that memory can only be freed after the command completes and its callback runs. Because the loop issues commands far faster than replies arrive, the allocated memory piles up until the heap runs out of memory and the program crashes.

For more information on memory leak using node-redis, see Node.js + Redis memory leak.

Bounding concurrency

One way of bounding the amount of concurrency is by limiting the initial amount of asynchronous function calls and only issuing subsequent commands via callbacks. For example, to cap the amount of concurrency of 3,000,000 requests at 1,000 concurrent requests:

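let numberOfRequests = 0;
function callback() {
  // issue the next command only when a previous one has completed
  if (numberOfRequests < 3000000) {
    numberOfRequests++;
    client.set("key", "value", callback);
  }
}
for (let i = 0; i < 1000; i++) { // start 1,000 concurrent chains
  numberOfRequests++;
  client.set("key", "value", callback);
}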

This limits the amount of incomplete commands to at most 1,000 at any given time. The following table summarizes performance with bounded concurrency.

Concurrency Bound RPS CPU Utilization
1 4,589 3%
2 8,932 5%
10 39,467 20%
100 181,587 75%
1,000 208,739 100%
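The same bounding pattern, a fixed number of in-flight chains where each chain issues its next command only after the previous one completes, can be sketched in Python with asyncio. This is a swapped-in Python analogue, not node-redis code: `fake_set` is a hypothetical stand-in for an async client call, and `run_bounded` reports the peak number of commands in flight so you can confirm the bound holds. Only `bound` coroutines ever exist, so memory stays flat regardless of `total`.

```python
import asyncio
from typing import Tuple


async def fake_set(key: str, value: str) -> None:
    """Hypothetical stand-in for an async Redis SET: one simulated round trip."""
    await asyncio.sleep(0.001)


async def run_bounded(total: int, bound: int) -> Tuple[int, int]:
    """Send `total` commands with at most `bound` in flight.

    Returns (commands issued, peak number in flight).
    """
    issued = 0
    in_flight = 0
    peak = 0

    async def chain() -> None:
        nonlocal issued, in_flight, peak
        # Like the node-redis callback chain: the next command in this chain
        # is sent only after the previous one has completed.
        while issued < total:
            issued += 1
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await fake_set(f"key{issued}", "value")
            finally:
                in_flight -= 1

    # Start `bound` chains, like the initial loop of 1,000 calls.
    await asyncio.gather(*(chain() for _ in range(bound)))
    return issued, peak


if __name__ == "__main__":
    print(asyncio.run(run_bounded(total=10_000, bound=100)))
```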


Batching

Node-redis (v3.1) also supports batching. Batches are not transactions in node-redis.

Batches are sent via a batch object, which buffers commands until the batch.exec method is called, after which it sends all of the buffered commands. For example, a batch of size 2:

let batch = client.batch();
batch.set(key, value, callback);
batch.get(key, callback);
batch.exec((err, replies) => {}); // sends the buffered commands to the server

Note: node-redis v4 doesn’t have the client.batch command anymore, but it still supports batching. Refer to How are commands batched? to learn more.

We tested the performance of batches, sending the next batch after all the callbacks from the previous batch completed running. The following table summarizes batching performance.

Batch Size RPS CPU Utilization
2 8,911 3%
3 12,696 3%
10 36,336 5%
100 147,783 23%
1,000 311,655 75%

We recommend batching when possible. Batching uses less CPU by making fewer I/O calls and fewer context switches. As the bounded-concurrency table shows, a concurrency bound of 1,000 requests gives about 200,000 RPS at 100% CPU (one full core, since Node.js runs JavaScript on a single thread), so the CPU is the bottleneck limiting performance. Batching, on the other hand, uses less CPU and doesn't reach the CPU bottleneck, so it reaches about 300,000 RPS.

Predis vs. phpredis

The Redis open-source community recommends two popular PHP clients: predis and phpredis. Predis is written in PHP and therefore slower than phpredis, which is an extension written in C. This post works with predis version 1.1.9 and phpredis version 5.3.4.

Both predis and phpredis offer synchronous APIs.


Batching

Both predis and phpredis support batching via a pipeline object, which buffers a sequence of commands and sends it to the server after all commands have been buffered. The syntax varies slightly between these two clients, but the idea is the same.

We tested the performance of these two clients using a range of pipeline sizes. The following table summarizes the performance comparison.

Pipeline Size Predis phpredis
1 Approximately 5,000 RPS, 16% CPU utilization Approximately 5,000 RPS, 10% CPU utilization
2 5,373 RPS, 16% CPU utilization 9,126 RPS, 10% CPU utilization
3 7,865 RPS, 20% CPU utilization 13,431 RPS, 13% CPU utilization
10 20,985 RPS, 25% CPU utilization 35,867 RPS, 20% CPU utilization
20 32,902 RPS, 50% CPU utilization 56,682 RPS, 28% CPU utilization
50 52,241 RPS, 71% CPU utilization 87,417 RPS, 40% CPU utilization
100 67,913 RPS, 91% CPU utilization 115,041 RPS, 48% CPU utilization
1,000 75,206 RPS, 100% CPU utilization 161,303 RPS, 75% CPU utilization


Lettuce

Lettuce is a popular Redis client for the Java programming language. This post works with Lettuce version 6.0.2. The documentation and code are available on GitHub.

Lettuce offers both synchronous and asynchronous APIs. In Lettuce, asynchronous methods return Lettuce futures, which are a handle on Lettuce asynchronous function calls. Among other things, you can use Lettuce futures to wait for asynchronous function calls to complete.


Pipelining

You can implement pipelines in Lettuce in several ways. One is to asynchronously send a sequence of commands and wait for the corresponding futures to complete only after the entire sequence has been sent. For example, a pipeline of size 3:

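RedisAsyncCommands<String, String> commands = connection.async();
List<RedisFuture<?>> futures = new ArrayList<>();
for (int i = 0; i < 3; i++) {
    futures.add(commands.set(key, value)); // sent immediately, without waiting
}
LettuceFutures.awaitAll(5, TimeUnit.SECONDS, futures.toArray(new RedisFuture[futures.size()]));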


Batching

Lettuce also supports batching. Batching is achieved by setting AutoFlushCommands to false, which causes commands to be buffered instead of being flushed immediately, and then calling flushCommands to empty the buffer and send the commands as a single batch. For example, a batch of size 3:

RedisAsyncCommands<String, String> commands = connection.async();
commands.setAutoFlushCommands(false); // buffer commands instead of sending them
List<RedisFuture<?>> futures = new ArrayList<>();
for (int i = 0; i < 3; i++) {
    futures.add(commands.set(key, value));
}
commands.flushCommands(); // send all buffered commands as a single batch
LettuceFutures.awaitAll(5, TimeUnit.SECONDS, futures.toArray(new RedisFuture[futures.size()]));

For more information about pipelining and batching in Lettuce, see Pipelining and command flushing.

The following table compares the performance of pipelines vs. batching with a single connection (in RPS).

Size Pipeline (RPS) Batching (RPS)
5 28,412 RPS, 37% CPU utilization 34,208 RPS, 36% CPU utilization
10 46,896 RPS, 53% CPU utilization 53,918 RPS, 46% CPU utilization
20 82,228 RPS, 81% CPU utilization 78,931 RPS, 45% CPU utilization
50 135,264 RPS, 105% CPU utilization 115,775 RPS, 50% CPU utilization
100 170,867 RPS, 135% CPU utilization 149,172 RPS, 60% CPU utilization
200 189,636 RPS, 145% CPU utilization 175,913 RPS, 76% CPU utilization
500 239,079 RPS, 160% CPU utilization 188,090 RPS, 81% CPU utilization
1,000 231,085 RPS, 160% CPU utilization 215,157 RPS, 93% CPU utilization

Batching makes fewer I/O calls and causes fewer context switches than pipelining, so it uses less CPU. We recommend batching over pipelining if your application can tolerate slightly higher latency. As the preceding table shows, at sizes of 20 and above pipelining achieves a higher RPS, but at the cost of almost doubling the CPU utilization. For example, a pipeline of size 1,000 has a CPU utilization of about 160%, whereas a batch of size 1,000 has a CPU utilization of about 93%. If your application has CPU resources to spare, consider pipelining.

Connection pools

Lettuce supports using multiple connections via a connection pool. Lettuce is built on top of the Netty framework, which is a multi-threaded, event-driven I/O framework (the connections are processed by several threads).

Lettuce uses the Apache Commons-pool2 GenericObjectPool, which we discuss in more detail later in this post.

To initialize a connection pool:

GenericObjectPool<StatefulRedisConnection<String, String>> pool = ConnectionPoolSupport
        .createGenericObjectPool(
        () -> redis.connect(),  // redis: an existing RedisClient
        new GenericObjectPoolConfig());

To borrow a connection from the pool:

try (StatefulRedisConnection<String, String> connection = pool.borrowObject()) {
    // do something useful with the connection; close() returns it to the pool
}

For more information on connection pools in Lettuce, see Connection Pooling.

We tested the performance of various connection pool sizes in Lettuce in multithreaded scenarios measured in RPS. The following table summarizes our results:

# of Threads Connection Pool MaxTotal RPS
1 10 5,163
1 200 5,684
10 10 46,976
10 200 46,196
20 10 45,181
20 200 85,789
50 10 45,543
50 200 180,143
100 10 46,062
100 200 244,032
1,000 10 45,691
1,000 200 141,035
1,000 1,000 223,163

We found that having more threads than connections in the pool has a negative impact on performance. Nevertheless, the Redis server has a limit on the number of open connections it can handle, and having too many open connections decreases performance on the server side. We explain how to best configure connection pools in the Using GenericObjectPool section.


Jedis

Jedis is a popular client library for the Java programming language. This post works with Jedis version 3.6.0. The documentation and code are available on GitHub.

Jedis only offers a synchronous API. To send several requests concurrently, we can either use batching or multithreading.


Batching

Jedis supports batching via a Pipeline object. The Pipeline object buffers commands on the client side and sends them as a single batch after the Pipeline.sync method is called. For example:

Pipeline p = jedis.pipelined();
p.set(KEY1, VALUE1);
p.set(KEY2, VALUE2);
p.sync(); // send the buffered commands as one batch and read all replies

The following table summarizes our batching performance results:

Batch Size RPS
5 39,004 RPS, 15% CPU utilization
10 67,725 RPS, 18% CPU utilization
20 101,747 RPS, 22% CPU utilization
50 157,015 RPS, 29% CPU utilization
100 222,373 RPS, 35% CPU utilization
200 285,389 RPS, 38% CPU utilization
500 328,210 RPS, 52% CPU utilization
1,000 380,912 RPS, 63% CPU utilization

For more information on batching in Jedis, see Pipelining in the GitHub repo.


Transactions

Jedis also supports batching for transactions. This is done through the Transaction object returned by jedis.multi(), which, like the Pipeline object, buffers commands on the client side. The buffered commands are sent to the Redis server as a single batch after the exec method is called. Batches sent via the Transaction object are wrapped in a MULTI/EXEC block and are therefore run atomically by the Redis server.

For example:

Transaction t = jedis.multi();
t.set(KEY1, VALUE1); t.set(KEY2, VALUE2); t.exec(); // sent as one MULTI/EXEC batch

The following table summarizes the performance of batched transactions.

Transaction Size RPS
5 25,296 RPS, 13% CPU utilization
10 36,739 RPS, 15% CPU utilization
20 57,763 RPS, 17% CPU utilization
50 94,972 RPS, 20% CPU utilization
100 140,527 RPS, 26% CPU utilization
200 174,067 RPS, 29% CPU utilization
500 200,817 RPS, 32% CPU utilization
1,000 226,517 RPS, 38% CPU utilization

Because atomicity takes a toll on performance, we recommend avoiding transactions when atomicity isn’t required.

For more information on transactions in Jedis, see Transactions.

Jedis vs. Lettuce

The following table compares Jedis and Lettuce performance.

Batch Size Jedis Batching (RPS) Lettuce Batching (RPS)
5 39,004 34,208
10 67,725 53,918
20 101,747 78,931
50 157,015 115,775
100 222,373 149,172
200 285,389 175,913
500 328,210 188,090
1,000 380,912 215,157

In our tests, Jedis batching achieved up to about 1.8 times the RPS of Lettuce batching (380,912 vs. 215,157 RPS at a batch size of 1,000).

Using GenericObjectPool

When working with GenericObjectPool, consider the following:

  • maxTotal – The maximum number of connections allowed in the pool (default is 8).
  • maxIdle – The maximum number of idle connections allowed in the pool (default is 8).

If your workload is consistent over time, we recommend setting maxTotal = maxIdle to prevent closing connections unnecessarily (we did so in our tests).

Although creating new connections is expensive and should be avoided, if you expect short and intense peaks in pool usage, we recommend setting maxIdle lower than maxTotal to reduce resource consumption on both the server and the client side. The trade-off is latency: connections returned beyond maxIdle are closed, so the next peak must open new ones. This configuration assumes that using more than maxIdle concurrent connections is uncommon.
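The interaction between the two settings can be modeled in a few lines of Python. `IdleCappedPool` and `Connection` are hypothetical, and the sketch leaves out everything else GenericObjectPool does (eviction, validation, fairness). It shows only two behaviors: borrowing blocks once maxTotal connections are checked out, and returned connections beyond maxIdle are closed rather than kept.

```python
import threading
from typing import Callable, List, Optional


class Connection:
    """Hypothetical connection that records whether it has been closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class IdleCappedPool:
    """Models maxTotal/maxIdle behaviour; not Commons Pool's actual implementation."""

    def __init__(self, factory: Callable[[], Connection], max_total: int, max_idle: int) -> None:
        self._factory = factory
        self._max_total = max_total  # cap on borrowed + idle connections
        self._max_idle = max_idle    # cap on connections kept open while unused
        self._idle: List[Connection] = []
        self._active = 0             # connections currently borrowed
        self._cond = threading.Condition()

    def borrow(self, timeout: Optional[float] = None) -> Connection:
        with self._cond:
            # Wait until fewer than max_total connections are borrowed.
            if not self._cond.wait_for(lambda: self._active < self._max_total, timeout):
                raise TimeoutError("pool exhausted")
            self._active += 1
            if self._idle:
                return self._idle.pop()  # reuse an idle connection
        try:
            return self._factory()  # none idle: open a new connection
        except BaseException:
            with self._cond:
                self._active -= 1
                self._cond.notify()
            raise

    def give_back(self, conn: Connection) -> None:
        with self._cond:
            self._active -= 1
            keep = len(self._idle) < self._max_idle
            if keep:
                self._idle.append(conn)
            self._cond.notify()
        if not keep:
            conn.close()  # already max_idle idle connections: close this one
```

With `max_total=4` and `max_idle=2`, returning four connections after a peak closes two of them, so the next peak has to open them again; that reconnection is the latency cost of setting maxIdle below maxTotal.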


Conclusion

In this blog post, we shared best practices for optimizing the performance of Redis clients. We explored the performance of synchronous and asynchronous APIs, and discussed different methods of pipelining and methods of sharing connections between threads.

For all of the Redis clients that we investigated, we found that batching (including batched transactions), pipelining, multithreading, and connection pooling can increase end-to-end performance (RPS) by a significant factor. In our benchmarks, throughput rose from about 5,000 RPS for a single connection with a synchronous API to well over 100,000 RPS in many configurations.

To learn more about best practices for configuring Redis clients in environments with many connections, see Best practices: Redis clients and Amazon ElastiCache for Redis.

Please ask any questions you may have, and let us know what performance you achieve in the comments.

About the Authors

Adi Emanuel Pinsky is a Software Development Engineer at Amazon ElastiCache, based in Tel Aviv, Israel. He holds a dual degree in Mathematics and Computer Science from the Technion – Israel Institute of Technology. He enjoys learning and spends way too much time watching video lectures. When he is not working, Adi enjoys sports, eating great food, and spending time with friends and family.

Barak Gilboa is an SDE at Amazon ElastiCache, based in Tel Aviv, Israel. He works to create a better user experience for customers. Outside of work, he loves spending time with his family, reading books, and long-distance running.

Asaf Porat Stoler is a Software Development Manager at Amazon ElastiCache, based in Tel Aviv, Israel. He has vast and diverse experience in storage systems, data reduction, and in-memory databases, and likes performance and resource optimizations. Outside of work, he enjoys sports, hiking, and spending time with his family.

Tzach Kaufmann is a Principal Product Manager for Amazon ElastiCache in the In-Memory Databases team at Amazon Web Services, based in Israel. When not in front of the computer, he loves to spend time with his family, hike, ride bicycles, and play sports.