Improving performance of PHP for Arm64 and impact on AWS Graviton2 based EC2 instances
This post is contributed by Sebastian Pop, Senior Software Engineer, Amazon Web Services.
AWS recently launched the Amazon EC2 M6g, C6g, and R6g instances powered by the AWS Graviton2 processors. Compared to similar-sized M5 instances, Graviton2 based instances provide up to 40% better price performance on several open-source application stacks.
This blog discusses how AWS worked together with the PHP community to drive major improvements to the performance of the PHP software stack on Graviton2-based instances. On AWS Graviton2 based instances, the latest release, PHP-7.4, currently runs with up to 37% lower execution time than the previous release, PHP-7.3. This significantly lowers the cost of running PHP software such as WordPress on Amazon EC2 M6g instances. In this blog, we use M6g instances as the example for a general purpose workload such as WordPress, but the PHP optimizations discussed here apply equally to compute-optimized and memory-optimized workloads running on C6g and R6g instances.
A better Zend optimizer for Arm64
The Zend optimizer is a component of the PHP runtime system that improves performance by up to 30% on a range of Zend micro-benchmarks. Before PHP 7.4, the Zend optimizer was not enabled for Arm. AWS added Arm64-specific implementations to several functions of the PHP interpreter.
AWS evaluated the performance impact of the software changes to the PHP interpreter by running the standard micro-benchmarks distributed with the Zend optimizer: bench.php and micro_bench.php. To look at the performance of the PHP interpreter as used in a fully loaded server, the benchmarks are set to run as many copies in parallel as there are vCPUs in the system. That is, each vCPU runs one copy of the benchmark, so that the utilization of the system is above 95%. The experiment was run on a 16-vCPU Graviton2 M6g.4xl instance and a 16-vCPU Intel M5.4xl instance. Between PHP versions 7.3 and 7.4, the execution time on M6g instances improved by up to 37%. The following images demonstrate the benchmark results: the first image shows improved execution time of Zend/bench.php with newer versions of PHP, and the second image illustrates a faster execution time with newer versions of PHP on Zend/micro_bench.php.
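The "one copy per vCPU" methodology described above can be sketched in a few lines of Python. This is only an illustration, not the harness used for the measurements in this post: the helper name `run_parallel_copies` is hypothetical, and the sketch does not check that utilization stays above 95%.

```python
import os
import subprocess
import time


def run_parallel_copies(cmd, copies=None):
    """Start `copies` identical processes at once and wait for all of them.

    Returns (elapsed_seconds, return_codes). When `copies` is None, one
    process is started per CPU reported by the operating system.
    """
    if copies is None:
        copies = os.cpu_count() or 1  # one copy per vCPU
    start = time.perf_counter()
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(copies)
    ]
    # Wall-clock time is measured until the slowest copy has finished.
    return_codes = [proc.wait() for proc in procs]
    elapsed = time.perf_counter() - start
    return elapsed, return_codes
```

Assuming a `php` binary is on the PATH and the current directory is a php-src checkout, `run_parallel_copies(["php", "Zend/bench.php"])` would launch one copy of the benchmark per vCPU and report the total elapsed time.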
Performance evaluation of WordPress on Ubuntu 19.04
While the synthetic benchmarks proved M6g’s superior performance, I also looked at WordPress, a PHP-based real-world application. The following image shows the setup of the experiment:
The machine represented in the middle of the image is the system under test (SUT), which is either an M6g.4xl or an M5.4xl. To measure the performance of PHP independently of other components in the system, the MySQL database is set to run on a separate C5.4xl instance, shown in blue in the figure. The program WRK generates the HTTP requests for the main page of WordPress. WRK runs on a separate C5.4xl instance, shown in orange in the figure. To minimize network noise, the three machines are allocated in the same cluster placement group. The SUT runs the NGINX web server and the PHP interpreter. The NGINX web server is configured with the FastCGI interface PHP-FPM. All the machines run Ubuntu 19.04.
With the software changes in PHP-7.4 and PHP-8, WordPress can serve up to 17% more pages per second on M6g.4xl vs. M5.4xl. Combined with the 20% lower cost of M6g instances, running PHP-7.4 and WordPress provides up to 34% better price/performance on M6g instances vs. M5 instances. The following image illustrates the performance of PHP versions 7.3, 7.4, and 8 when running on M6g.4xl and M5.4xl instances:
Looking at the scalability of the WordPress-NGINX benchmark on M6g instances with the better-performing PHP-7.4 release, the number of served pages scales almost linearly with the number of vCPUs up to eight vCPUs. However, with the beta version of PHP-8 from August 20, 2019, which contains even more optimizations, the scaling trend continues up to 32 vCPUs, after which CPU utilization drops below 90%.
On the low-cost end of the spectrum, AWS offers an M6g.medium instance with 1 vCPU that can sustain up to 77 pages per second, which may be of interest to small, cost-conscious users of WordPress.
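One way to quantify "almost linear" scaling is to divide the measured throughput at each vCPU count by the throughput that perfect linear scaling from the smallest configuration would give. The sketch below assumes that definition. The helper name `scaling_efficiency` is hypothetical, and apart from the single-vCPU figure of 77 pages per second quoted in this post, the sample numbers are made up for illustration and are not measurements.

```python
def scaling_efficiency(pages_per_second_by_vcpus):
    """Return {vcpus: efficiency}, where 1.0 means perfect linear scaling.

    The smallest vCPU count in the input is used as the baseline.
    """
    base_vcpus = min(pages_per_second_by_vcpus)
    per_vcpu_rate = pages_per_second_by_vcpus[base_vcpus] / base_vcpus
    return {
        vcpus: rate / (vcpus * per_vcpu_rate)
        for vcpus, rate in sorted(pages_per_second_by_vcpus.items())
    }


# Hypothetical sample: only the 1-vCPU value comes from this post.
sample = {1: 77.0, 2: 150.0, 4: 290.0}
for vcpus, efficiency in scaling_efficiency(sample).items():
    print(f"{vcpus} vCPUs: {efficiency:.2f} of linear scaling")
```

An efficiency that stays close to 1.0 as vCPUs are added corresponds to the near-linear region described above. A falling value marks the point where adding vCPUs stops paying off.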
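For cost-conscious sizing, it can help to make the price/performance comparison used earlier in this section explicit. The sketch below assumes that price/performance means pages served per dollar, which this post does not define formally. The function name is hypothetical, and the inputs in the usage note are illustrative rather than the measured values behind the figures in this post.

```python
def price_performance_gain(throughput_gain, cost_reduction):
    """Relative improvement in work done per dollar.

    throughput_gain: fractional gain in pages per second (0.10 = 10% more pages).
    cost_reduction:  fractional reduction in instance price (0.20 = 20% cheaper).
    """
    if not 0 <= cost_reduction < 1:
        raise ValueError("cost_reduction must be in the range [0, 1)")
    # Pages per dollar scale with throughput and inversely with price.
    return (1 + throughput_gain) / (1 - cost_reduction) - 1
```

Under this definition, an instance that serves the same number of pages at a 20% lower price already yields `price_performance_gain(0.0, 0.20)`, that is, 25% more pages per dollar. Any throughput gain multiplies on top of that.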
How to install PHP-7.4
Installation instructions: https://www.php.net/install
Installation instructions for Ubuntu: https://computingforgeeks.com/how-to-install-php-on-ubuntu/
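After installing, it is worth confirming that the `php` binary on the PATH is at least version 7.4, since the Arm64 improvements discussed in this post start with that release. The following is a minimal sketch that parses the first line of `php --version`. The helper names are hypothetical, and the sketch assumes the usual `PHP <major>.<minor>.<patch> (cli) ...` banner format.

```python
import re
import subprocess


def parse_php_version(version_output):
    """Extract (major, minor, patch) from the output of `php --version`."""
    match = re.search(r"^PHP (\d+)\.(\d+)\.(\d+)", version_output, re.MULTILINE)
    if match is None:
        raise ValueError("could not find a PHP version line")
    return tuple(int(part) for part in match.groups())


def installed_php_version(php_binary="php"):
    """Run the given PHP binary and return its version as a tuple."""
    result = subprocess.run(
        [php_binary, "--version"], capture_output=True, text=True, check=True
    )
    return parse_php_version(result.stdout)


def meets_minimum(version, minimum=(7, 4, 0)):
    """True when `version` is at least `minimum` (tuple comparison)."""
    return version >= minimum
```

On a machine with PHP installed, `meets_minimum(installed_php_version())` returns `True` for PHP 7.4 or newer.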
AWS contributions to PHP-7.4
|Function|Speedup|Commits to PHP-7.4|
Patches impacting performance of WordPress benchmark
The 20% performance difference between PHP-7.4 and PHP-8 master is due to https://github.com/php/php-src/commit/682b54f68748715f85e9ac4a267477d9ac61918a, which removes support for PHP-4 constructors deprecated in PHP-7.0: https://wiki.php.net/rfc/remove_php4_constructors.
Because this patch was committed shortly after the PHP-7.4 branch was cut, it may be easily applied to the PHP-7.4 branch to get the performance benefits that we see in PHP-8 master.
There are two other patches that impact the performance of the WordPress-NGINX benchmark, and both have been applied to master and PHP-7.4 branch:
- A patch to disable the use of huge pages in the memory allocator fixed a 10% performance regression: https://github.com/php/php-src/commit/928c42211f737640e4dc3c9702ba833c3059bddf
- The patch to enable the Zend Optimizer on ARM64 accounts for about 5% better performance on M6g instances: https://github.com/php/php-src/commit/4d7df449d0ab389b01b45fa1bb9bf2b4a8755545
PHP-8 is planned for release in 2021 with more improvements for Arm64: an improved toupper/tolower function brings performance up by 16.5x. https://github.com/php/php-src/pull/4439
AWS has contributed changes to PCRE2 release 10.34. PCRE2 version 10.34 is used in PHP-8 to match regular expressions. PCRE2 accounted for about 8% of execution time in the WordPress benchmark. The change contributed by AWS to PCRE2 vectorizes the first-character match and the matching of character pairs with NEON instructions; performance improves by up to 8x on M6g. https://lists.exim.org/lurker/message/20191106.052444.1ea1a176.en.html
PHP-8 plans to feature a new JIT compiler that optimizes the PHP byte-code in the Opcache. The JIT is currently developed on x86 and is based on Lua’s JIT. As Lua’s JIT supports ARM64, we are working with PHP developers to enable and tune the Opcache JIT in PHP-8 to get the best performance on AWS Graviton processors.
Our interaction with the PHP community consisted of early discussions with the maintainers of the PHP Zend optimizer about our plans to improve performance on ARM, patch reviews that improved the quality of our submissions, and follow-up patches from the community to address problems uncovered during review of our patches. AWS continues to contribute ARM performance improvements to PHP-8. We followed the same pattern of involvement with the PCRE community, and we continue to work with other open-source communities to bring our expertise and knowledge in improving the performance and tuning of ARM-based AWS Graviton systems.
PHP release 7.4 is key to obtaining maximum performance from the Graviton2-based M6g, C6g, and R6g instances. PHP 7.4 makes the AWS Graviton2 based instances even more appealing by delivering both higher performance and lower cost compared to M5 instances.
To get started, check out the post "New – EC2 M6g Instances, powered by AWS Graviton2", and please leave comments!