VMware: 70% Drop in Linux ESXi VM Performance with Retbleed Fixes
VMware is warning that ESXi VMs running on Linux kernel 5.19 can have up to a 70% performance drop when Retbleed mitigations are enabled compared to the Linux kernel 5.18 release.
More specifically, the VMware performance team noticed regressions on ESXi virtual machines of up to 70% in computing, 30% in networking, and 13% in storage.
From VMware’s testing, it became clear that the sudden and very significant degradation was caused by the introduction of mitigations for the “Retbleed” vulnerability.
“After performing the bisect between kernel 5.18 and 5.19, we identified the root cause to be the enablement of IBRS mitigation for spectre_v2 vulnerability by commit 6ad0ad2bf8a6 (“x86/bugs: Report Intel retbleed vulnerability”),” explains VMware.
VMware found that disabling the Retbleed security mitigation via the “spectre_v2=off” kernel boot parameter restored the Linux VM’s performance to the levels of the 5.18 release, confirming that the fixes are the sole reason behind the drop in performance.
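To check whether a given VM was booted with this mitigation disabled, the kernel command line can be inspected. The following is a minimal Python sketch, not VMware's tooling: the function names are hypothetical, and it assumes a Linux guest that exposes its boot parameters at /proc/cmdline. Besides "spectre_v2=", it also treats the "nospectre_v2" and "mitigations=off" parameters as turning the mitigation off.

```python
from pathlib import Path


def spectre_v2_setting(cmdline):
    """Return the spectre_v2 setting requested on a kernel command line.

    Returns the value of the last "spectre_v2=" parameter, "off" if
    "nospectre_v2" or "mitigations=off" is present, or None when the
    command line does not mention the mitigation (kernel default applies).
    """
    setting = None
    for token in cmdline.split():
        if token.startswith("spectre_v2="):
            setting = token.split("=", 1)[1]
        elif token in ("nospectre_v2", "mitigations=off"):
            setting = "off"
    return setting


def read_cmdline(path="/proc/cmdline"):
    """Read the boot command line; returns "" if the file is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    setting = spectre_v2_setting(read_cmdline())
    print("spectre_v2 boot setting:", setting or "kernel default")
```

A result of "off" means the VM is trading the mitigation for the 5.18-level performance described above.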
However, disabling mitigations would be considered a security risk, as the systems would be vulnerable to cyber-attacks on certain CPU models.
A costly trade-off
Retbleed is a speculative execution attack discovered in July 2022 that can leverage return instructions in the CPU to extract sensitive information.
Examples of data that Retbleed can leak include items contained in kernel memory, such as root password hashes.
Speculative execution is a performance-enhancing feature on modern processors, allowing CPUs to perform computations before they are requested, reducing the time needed for their completion.
This performance-boosting feature has negative consequences from the perspective of security because it makes side-channel attacks possible.
A notable case of this is “Spectre,” which was mitigated with the “Retpoline” fix, a software-based solution that had minimal performance impact.
Retbleed does not merely bypass the Retpoline fix; it abuses the mitigation itself, targeting return operations to inject branch targets in the kernel address space.
Unfortunately, Linux’s mitigation of Retbleed on kernel version 5.19 has had a detrimental effect on performance, which could result in a wide range of business issues on production systems and cloud infrastructure.
Retbleed impacts Intel Core CPUs from generation 6 (Skylake – 2015) through 8 (Coffee Lake – 2017) and AMD Zen 1, Zen 1+, and Zen 2 processors released between 2017 and 2019, which are still omnipresent in server systems.
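Administrators unsure whether a particular system falls in this range can ask the kernel directly, since Linux reports per-vulnerability status under /sys/devices/system/cpu/vulnerabilities. The sketch below is illustrative only: the helper names are hypothetical, and it assumes a kernel new enough to expose a "retbleed" entry (older kernels will simply lack the file, which is reported here as None).

```python
from pathlib import Path

SYSFS_DIR = "/sys/devices/system/cpu/vulnerabilities"


def mitigation_status(base=SYSFS_DIR, names=("retbleed", "spectre_v2")):
    """Return {name: kernel status string, or None if the entry is missing}."""
    status = {}
    for name in names:
        try:
            status[name] = (Path(base) / name).read_text().strip()
        except OSError:
            status[name] = None
    return status


def is_affected(report):
    """True when the kernel reports the CPU as affected (mitigated or not)."""
    return report is not None and not report.startswith("Not affected")


if __name__ == "__main__":
    for name, report in mitigation_status().items():
        print(f"{name}: {report if report is not None else 'not reported by this kernel'}")
```

The kernel's status strings begin with "Not affected", "Vulnerable", or "Mitigation:", so a line starting with "Vulnerable" indicates an affected CPU running without protection.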
With such a performance drop, many system administrators who consider Retbleed more of a theoretical than a real threat to their systems will be open to accepting the trade-off and disabling the mitigations.
For now, the Linux kernel development team has neither discussed the massive performance impact nor promised to revisit the mitigations and implement a more “surgical” fix, so the situation remains risky.