[PATCH 0/2] x86: Remove ideal_nops[]
From: Peter Zijlstra
Date: Fri Mar 12 2021 - 07:00:46 EST
Hi!
A while ago Steve complained about x86 being weird for having different NOPs [1].
Having cursed the same thing before, I figured it was time to look at the NOP
situation.
32-bit simply isn't a performance target anymore, so all we need there is a
set of NOPs that works everywhere.
x86_64 has two main NOP variants, NOPL and prefix NOP. NOPL was introduced by
P6 and is architecturally mandated for x86_64. However, some uarchs made the
choice to limit NOPL decoding to a single port, which obviously limits NOPL
throughput. Other uarchs have (severe) decoding penalties for excessive (>~3)
prefixes, hobbling prefix NOP throughput.
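To make the two variants concrete, here is a minimal sketch of one 5-byte NOP
in each flavour. The NOPL bytes follow the multi-byte NOP encodings recommended
in the Intel SDM; the array and function names are made up for illustration
and are not the kernel's.

```c
#include <stddef.h>

/* 5-byte NOPL: opcode 0f 1f, ModRM 0x44, SIB 0x00, disp8 0x00; no prefixes. */
static const unsigned char example_nopl5[5] = {
	0x0f, 0x1f, 0x44, 0x00, 0x00
};

/* 5-byte prefix NOP: four 0x66 operand-size prefixes on the 1-byte NOP (0x90). */
static const unsigned char example_prefix_nop5[5] = {
	0x66, 0x66, 0x66, 0x66, 0x90
};

/* Count the leading 0x66 prefixes of an instruction (hypothetical helper). */
static size_t count_66_prefixes(const unsigned char *insn, size_t len)
{
	size_t n = 0;

	while (n < len && insn[n] == 0x66)
		n++;
	return n;
}
```

The prefix form already carries four prefixes at 5 bytes, which is over the
~3 prefix threshold mentioned above; the NOPL form of the same length has none.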
But the thing is, all the modern uarchs can handle both without issue; that
is, AMD K10 (2007) and later, and Intel Ivy Bridge (2012) and later. The only
exception is Atom, which has the prefix penalty.
Since the ultimate performance of a 10-year-old chip (Intel Sandy Bridge, 2011)
is simply irrelevant today, remove variable NOPs and use NOPL.
This gives us deterministic NOPs and restores sanity.
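As a rough illustration of what "deterministic" means here, the sketch below
pads a region from one fixed table, so the same length always produces the
same bytes on every CPU. It is not the code from the patches: the table holds
the Intel SDM recommended encodings for 1 to 8 bytes (the 1 and 2 byte entries
are plain NOP and 66 NOP, the rest are NOPL), and the names example_nops and
example_fill_nops are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_NOP_MAX 8	/* longest single NOP in this sketch */

/* example_nops[n] holds an n-byte NOP; index 0 is unused. */
static const unsigned char example_nops[EXAMPLE_NOP_MAX + 1][EXAMPLE_NOP_MAX] = {
	{ 0 },
	{ 0x90 },
	{ 0x66, 0x90 },
	{ 0x0f, 0x1f, 0x00 },
	{ 0x0f, 0x1f, 0x40, 0x00 },
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/* Fill len bytes at buf with NOPs, using the longest entry that still fits. */
static void example_fill_nops(unsigned char *buf, size_t len)
{
	while (len > 0) {
		size_t n = len < EXAMPLE_NOP_MAX ? len : EXAMPLE_NOP_MAX;

		memcpy(buf, example_nops[n], n);
		buf += n;
		len -= n;
	}
}
```

Padding 11 bytes, for example, yields one 8-byte NOPL followed by one 3-byte
NOPL, with no boot-time choice between tables involved.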
[1] https://lkml.kernel.org/r/20210302105827.3403656c@xxxxxxxxxxxxxxxxxx