AWS Startups Blog

Achieve Better Price to Performance for TiDB on Amazon EKS with Graviton2 Processors

Guest post by Ron Xing, Technical Support Engineer, PingCAP, and Yudho Ahmad Diponegoro, Startup Solutions Architect, AWS

Introduction

Many startups require a scalable database system to handle high-volume transactions and real-time analytics. TiDB is an open-source, cloud-native distributed SQL database that is ACID-compliant and strongly consistent. Distributed SQL databases like TiDB aim to combine the best features of both Relational Database Management Systems (RDBMSs) and NoSQL databases to create a truly cloud-native database. TiDB is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

TiDB is developed and maintained by PingCAP for its growing user communities, including startups and digital-native businesses facing rapid growth. They often deploy TiDB on Kubernetes. As a container orchestration platform, Kubernetes enables businesses to simplify deployment, streamline workload management, and automate operations and maintenance through its auto-scaling and auto-failover capabilities. Kubernetes provides an advanced framework for running distributed systems such as TiDB with high resilience for mission-critical business functions. By using the TiDB Operator, users can easily bring up and manage TiDB on Kubernetes.

In 2020, Amazon Elastic Kubernetes Service (Amazon EKS) on AWS Graviton2 became generally available. Amazon EKS provides the flexibility to start, run, and scale Kubernetes applications on AWS. Graviton2 is a custom processor built by AWS using 64-bit Arm Neoverse N1 cores. It powers Amazon EC2 T4g, M6g, C6g, and R6g instances, and their variants, which provide up to 40% better price performance than comparable current-generation x86-based instances. This launch means that users can now run Kubernetes on AWS at a lower price with equal or better performance.

For startups, being able to save infrastructure cost while improving database performance and automating data layer operations can be crucial. Startups can then shift the cost savings toward value innovation while improving their customer experience. While the value of running on Kubernetes is clear, some claim that it can be a costly affair for customers. In this post, we share findings from a benchmark we conducted to compare the price-performance ratio of running TiDB on EKS with Graviton2 (Arm) instances versus Intel Xeon Platinum 8000 series (x86) instances.

Benchmarking

Methodology

Our test used two industry-standard OLTP benchmarks: TPC-C and sysbench. TPC-C tests the OLTP system using a commodity sales model that involves five different transaction types. TPC-C generally works for any database that handles OLTP workloads. Sysbench is a well-established tool that runs synthetic benchmarks of MySQL and the hardware it runs on, and it includes an option to execute OLTP workloads on a MySQL database. Since TiDB is MySQL compatible, sysbench is a good reference as well.

Testing environment

In our benchmark, we deployed TiDB on Amazon EKS because many users choose EKS as their managed Kubernetes solution to reduce the maintenance overhead of the underlying infrastructure. The detailed topology and software information are listed below.

Topology

The diagram below shows the topology of the TiDB cluster.

Figure 1. Architecture Diagram of TiDB cluster on EKS

A TiDB cluster has three main components: the TiDB server, the TiKV server, and the PD server.

The TiDB server is the stateless SQL layer that is compatible with MySQL. It does not store data; it only handles computation and SQL analysis, forwarding actual data read requests to TiKV nodes. That's why we chose the compute-optimized c6g.2xlarge and c5.2xlarge EC2 instances.

The TiKV server is responsible for storing data. TiKV is a distributed transactional key-value storage engine. Data is distributed across all the containers using Amazon Elastic Block Store (Amazon EBS) as the backend provisioner in EKS. Since TiKV performs a large number of data processing operations (such as table scans) that need to cache data in memory, we selected the memory-optimized r6g.2xlarge and r5.2xlarge instances.

The PD (Placement Driver) server manages the cluster's metadata. It stores the metadata about the real-time data distribution of every TiKV node and the topology of the entire TiDB cluster. Data is stored in Amazon EBS. Because PD uses minimal computing resources, we chose the c6g.large and c5.large instances.

An admin node is required as part of the EKS deployment. To fully utilize its resources, the monitoring components are deployed on the admin node.
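As a rough illustration, TiDB Operator's TidbMonitor resource can deploy Prometheus and Grafana and, with a node selector, keep them on the admin node. The sketch below is only an example under assumptions: the cluster name, the dedicated=admin node label, and the image versions are placeholders, not the exact configuration used in this benchmark.

```yaml
# Hypothetical TidbMonitor manifest; cluster name, node label, and versions are assumptions.
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: tidb-monitor
spec:
  clusters:
    - name: tidb-cluster              # must match the TidbCluster name
  nodeSelector:
    dedicated: admin                  # assumes the admin node group carries this label
  prometheus:
    baseImage: prom/prometheus
    version: v2.18.1                  # example version
  grafana:
    baseImage: grafana/grafana
    version: 6.1.6                    # example version
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v4.0.9                   # example version
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1                   # example version
  imagePullPolicy: IfNotPresent
```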

In the topology above, all database configurations use default values. Depending on the workload and system, performance tuning can be done at different levels. You can learn more about two types of tuning here:

●      TiDB Memory Tuning

●      TiKV Performance Tuning with Massive Regions

In an actual production environment, multiple PD nodes are required for high availability. All components in the cluster can easily be scaled in and out by editing the TiDB cluster Custom Resource (CR) YAML file, as sketched below.
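The excerpt below is a minimal sketch of what such a CR might look like; the cluster name, TiDB version, storage sizes, and replica counts are assumptions for illustration rather than the exact manifest used in this benchmark. Scaling a component in or out is a matter of changing its replicas value and re-applying the file.

```yaml
# Minimal TidbCluster excerpt; name, version, storage sizes, and replica counts are assumptions.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: tidb-cluster
spec:
  version: v4.0.9                 # example TiDB version
  pd:
    baseImage: pingcap/pd
    replicas: 3                   # run 3 or more PD replicas for high availability
    requests:
      storage: 10Gi
  tidb:
    baseImage: pingcap/tidb
    replicas: 3                   # scale the stateless SQL layer by changing this value
  tikv:
    baseImage: pingcap/tikv
    replicas: 3                   # adding TiKV replicas scales out storage; PD rebalances data
    requests:
      storage: 1000Gi
```

Applying the edited file (for example, with kubectl apply -f) prompts TiDB Operator to add or remove the corresponding pods.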

The tests used two EKS clusters with the following processor types and configurations:

Table: TiDB cluster processes, service types, and instance configurations for the Graviton2 (Arm) and x86 clusters
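For readers who want a concrete starting point, the eksctl configuration below sketches how the Graviton2 (Arm) cluster could be declared with one node group per component. The node group names, node counts, and region are assumptions derived from the topology above; refer to the benchmark report for the exact setup.

```yaml
# Sketch of an eksctl cluster config for the Graviton2 (Arm) cluster.
# Node group names, node counts, and the region are assumptions.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: tidb-graviton2
  region: ap-southeast-1            # Asia Pacific (Singapore), matching the cost estimates below
nodeGroups:
  - name: admin
    instanceType: c6g.large
    desiredCapacity: 1
  - name: pd
    instanceType: c6g.large
    desiredCapacity: 1              # a production cluster would run 3 PD nodes
  - name: tidb
    instanceType: c6g.2xlarge
    desiredCapacity: 3              # assumed count
  - name: tikv
    instanceType: r6g.2xlarge
    desiredCapacity: 3              # assumed count
```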

Storage

The following storage configurations apply to both processors:

Table: Storage configuration per service (storage size, IOPS, throughput, and instance count)

Amazon EBS gp3 is a new type of SSD EBS volume that lets you provision performance independently of storage capacity, and it can offer a 20% lower price than the existing gp2 volume type.
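Assuming the Amazon EBS CSI driver is installed in the cluster, a gp3 StorageClass along the lines of the sketch below can expose provisioned IOPS and throughput to the TiDB pods; the class name and the IOPS/throughput values are placeholders rather than the exact figures from the table above.

```yaml
# Example gp3 StorageClass for the Amazon EBS CSI driver; name and performance values are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-tidb
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "4000"                     # provisioned independently of volume size
  throughput: "400"                # MiB/s
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

The TidbCluster CR can then reference this class through the storageClassName field of the TiKV (and PD) component specs.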

Software version

The software versions of the TiDB cluster and the benchmarking tools are listed below:

Table: Software versions of the TiDB cluster components and benchmarking tools

Cost

In our testing, we made the following assumptions about cost:

●      All cost calculations are based on the on-demand rate for instances in the Asia Pacific (Singapore) region in US dollars (USD) per month. If you are interested in the cost in the US region, please refer to the benchmark report. All monthly calculations are based on 730 hours of usage per month.

●      Storage cost includes a daily snapshot.

The table below summarizes the cost breakdown per component and in total:

Table: Cost breakdown per component, including compute, storage, and EKS costs

For detailed cost breakdown, please refer to the benchmark report.

Figure 2. Cost Breakdown for TiDB cluster on EKS

TPC-C benchmark

As described by the Transaction Processing Performance Council (TPC), TPC-C is a complex OLTP benchmark that involves a mix of five concurrent transaction types of different complexities, either executed online or queued for deferred execution. The database consists of nine types of tables with a wide range of record and population sizes. TPC-C performance is measured in transactions per minute (tpmC). The results from our benchmark are preliminary and should not be considered official TPC-C results.

For step-by-step test procedures, please refer to the benchmark report.

TPC-C workloads

TPC-C includes the following transaction workloads:

●      New Order: simulates submitting a new order through a single database transaction. This forms the backbone of the workload. It has high frequency and low latency requirements.

●      Payment: updates the customer’s balance.

●      Order Status: queries the status of the last order.

●      Delivery: processes a batch of new orders which are not yet delivered.

●      Stock Level: checks the stock level of recently sold items to identify items that need to be restocked.

The table below defines the "Large1" and "Large2" workloads used in this benchmark. Each represents a different number of simulated warehouses in the database, which affects the data size.

Table: Warehouse counts and data sizes for the Large1 and Large2 workloads

In the future, for large workloads over 1 TB, we will test with higher-performance storage to avoid any potential bottlenecks from disk I/O. Stay tuned for future benchmarks.

Benchmark results

We ran the TPC-C test with different numbers of threads (150, 300, 500, 800, and 1,000), which we configured through the --threads flag of the TPC-C command. ENI and network latencies were consistent between test runs. For detailed results at various concurrencies under different workloads, as well as how we modified the number of threads, please refer to the benchmark report.
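As one way to drive this kind of run from inside the EKS cluster, the sketch below wraps TiUP's TPC-C bench command in a Kubernetes Job. The container image, TiDB service address, database name, warehouse count, and duration are all assumptions for illustration; see the benchmark report for the exact commands we used.

```yaml
# Hypothetical Job driving TPC-C against the TiDB service; image, addresses, and sizes are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: tpcc-run
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tpcc
          image: your-registry/tiup:latest     # placeholder image with TiUP installed
          command:
            - tiup
            - bench
            - tpcc
            - --host=tidb-cluster-tidb         # TiDB service created by TiDB Operator
            - --port=4000
            - --db=test                        # placeholder database name
            - --warehouses=5000                # placeholder; Large1/Large2 use different counts
            - --threads=300                    # varied across 150-1,000 in our tests
            - --time=30m
            - run
```

A corresponding prepare step (the prepare subcommand with the same flags) loads the data set before the run.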

The following table summarizes the performance (tpmC), cost, and price-performance under different workloads for both Arm (Graviton2) and x86 processors.

Table: TPC-C performance (tpmC), cost, and price-performance for Graviton2 (Arm) and x86

*The value is derived from the average tpmC among 300, 500, and 800 threads.

**The total system cost reflects the estimated five-year hardware cost based on the AWS Asia Pacific (Singapore) rate.

***Price-performance compares x86 and Graviton2 Arm processors. A lower number is better: it indicates a lower cost per unit of performance.

Figure 3. TPC-C Price-performance Ratio

Comparing absolute tpmC performance under each workload, the Arm-based (Graviton2) system outperforms the x86-based system by an average of roughly 5% to 18%.

When the workload increases, tpmC increases significantly. However, the improvement depends heavily on the utilization of the compute and storage resources.

After factoring in the compute resource cost, the price-performance ratio for Graviton2 Arm is up to 25% lower (better) than for x86. These are unofficial TPC-C results, so they are not audited tpmC numbers; it is always good to benchmark your own representative workloads for an accurate understanding.

Sysbench benchmark

Sysbench is one of the most popular open-source benchmark tools for testing database systems. It provides statistics including workload, queries per second (QPS), transactions per second (TPS), and latency. We used oltp_read_write.lua to test performance for an OLTP workload, which typically consists of both read and write operations. The read/write ratio may vary across use cases; in this test, we used the default ratio of 75% read and 25% write, but you may adjust it to simulate your own workload. A sketch of launching such a run as a Kubernetes Job follows the workload summary below. For step-by-step test procedures, please refer to the benchmark report.

The read/write workload split information is listed below.

Workload

●      Read (75%) and write (25%)

●      Tables: 16

●      Table size: 10 M rows per table

●      Data size: ~100 GB
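The Job below is a minimal sketch of running this workload from inside the cluster; the container image, TiDB service address, database name, and run duration are assumptions, and the table and row counts mirror the workload listed above.

```yaml
# Hypothetical Job running sysbench oltp_read_write against TiDB; image, addresses, and duration are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: sysbench-oltp
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: sysbench
          image: severalnines/sysbench         # any image with sysbench 1.0+ installed
          command:
            - sysbench
            - oltp_read_write
            - --db-driver=mysql
            - --mysql-host=tidb-cluster-tidb   # TiDB service created by TiDB Operator
            - --mysql-port=4000
            - --mysql-user=root
            - --mysql-db=sbtest                # placeholder database name
            - --tables=16
            - --table-size=10000000            # 16 tables x 10 M rows, roughly 100 GB
            - --threads=300
            - --time=600                       # placeholder duration in seconds
            - --report-interval=10
            - run
```

Running the same command with prepare instead of run loads the 16 tables before the measurement phase.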

Benchmark results

For detailed benchmark results, including workload, latency, QPS, and TPS, please refer to the benchmark report.

The following table summarizes the performance (TPS), cost, and price-performance for both Arm (Graviton2) and x86 processors:

Table: Sysbench performance (TPS), cost, and price-performance for Graviton2 (Arm) and x86

*The value is derived from the average TPS among 300, 600, and 900 threads.

**The total system cost reflects the estimated five-year hardware cost based on the AWS Asia Pacific (Singapore) rate.

***Price-performance compares x86 and Graviton2 Arm processors. A lower number is better: it indicates a lower cost per unit of performance.

Figure 4. Sysbench Price-performance Ratio

Comparing absolute TPS performance under a 100 GB workload, the Arm-based (Graviton2) system outperforms the x86-based system by around 5% to 10%.

After factoring in the compute resource cost, the price-performance ratio for Arm (Graviton2) is 20.77% lower (better) than for x86.

Conclusion

Benchmarking results from both TPC-C and sysbench show that the Graviton2 processor outperforms the x86 processor for TiDB workloads by 5% to 18%, depending on the workload and concurrency. After factoring in the hardware cost, the Graviton2 processor has a better price-performance ratio than x86, by up to 25%.

One limitation is that the software images we used are not yet fully optimized for Graviton2 processors. We believe that if the binaries were compiled with Neoverse-specific switches, TiDB on Graviton2 would outperform x86 by an even larger margin in tpmC and TPS, further improving the price-performance ratio. We will update this blog post once we confirm this in future tests with more complex workloads.

Also, by using TiDB Operator, we were able to run database workloads on Kubernetes, which improves productivity by managing large and complex clusters, streamlining operations, and shortening go-to-market (GTM) time. If you need to deploy other applications in the same Graviton2-powered EKS cluster as TiDB, make sure that the application images are properly tested on Arm machines to achieve the desired performance.

Learn more about AWS Graviton2-based instances and explore the Graviton2 and containers workshop.