Re: [PATCH 09/11] x86: rework CONFIG_GENERIC_CPU compiler flags

From: Arnd Bergmann
Date: Wed Dec 04 2024 - 15:22:04 EST


On Wed, Dec 4, 2024, at 16:36, Tor Vic wrote:
> On 12/4/24 11:30, Arnd Bergmann wrote:
> Similar but not identical changes have been proposed several times in the
> past, e.g. in [1] and [2], and likely even more often.
>
> Your solution seems to be much cleaner, I like it.

Thanks. It looks like the other two did not actually
address the bug I'm fixing in my version.

> That said, on my Skylake platform, there is no difference between
> -march=x86-64 and -march=x86-64-v3 in terms of kernel binary size or
> performance.
> I think Boris also said that these settings make no real difference to
> code generation.
>
> Other settings might make a small difference (numbers are from 2023):
> -generic: 85.089.784 bytes
> -core2: 85.139.932 bytes
> -march=skylake: 85.017.808 bytes

As Nathan pointed out, I had a typo in my patch, so the
options didn't actually do anything at all. I have fixed it
now and did a 'defconfig' test build with all three levels
(v1, v2 and v3):


    text      data      bss       dec      hex  filename
26664466  10806622  1490948  38962036  2528374  obj-x86/vmlinux-v1
26664466  10806622  1490948  38962036  2528374  obj-x86/vmlinux-v2
26662504  10806654  1490948  38960106  2527bea  obj-x86/vmlinux-v3

which is a tiny saving of about 2KB (1930 bytes in total,
1962 bytes of text) between v2 and v3. I looked at the
object code and found that the v3 build takes advantage of
the BMI extension, which makes perfect sense. I'm not sure
whether it has any real performance benefit.
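As a rough illustration of where BMI can help (a sketch, not code taken from the kernel), the helpers below use common bit-manipulation idioms that GCC and Clang can typically lower to single BMI1 instructions (BLSR, ANDN, TZCNT) when building for x86-64-v3, whereas a baseline x86-64 build needs two or more instructions for each. The function names are hypothetical, and the exact code generation depends on the compiler and its version.

```c
#include <stdint.h>

/*
 * Clear the lowest set bit, e.g. when iterating over a bitmask.
 * With BMI1 this pattern can become a single BLSR instruction;
 * without it the compiler typically needs a LEA/SUB plus an AND.
 */
static inline uint64_t clear_lowest_set_bit(uint64_t x)
{
	return x & (x - 1);
}

/*
 * Clear the bits of 'value' that are set in 'mask'. BMI1 provides
 * ANDN for this, so no separate NOT instruction is needed.
 */
static inline uint64_t and_not(uint64_t value, uint64_t mask)
{
	return value & ~mask;
}

/*
 * Index of the lowest set bit, or 64 for zero. __builtin_ctzll(0) is
 * undefined in C, hence the explicit check. With BMI1 the compiler can
 * use TZCNT, which itself returns 64 for a zero 64-bit input.
 */
static inline unsigned int trailing_zeros(uint64_t x)
{
	return x ? (unsigned int)__builtin_ctzll(x) : 64;
}
```

Compiling a file like this with "gcc -O2 -S -march=x86-64" and again with "-march=x86-64-v3" and comparing the assembly is a quick way to see the difference.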

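Between v1 and v2 there is no difference in this build, but
since CMPXCHG16B is part of the x86-64-v2 baseline, there is
a chance to turn things like system_has_cmpxchg128() into a
compile-time constant on v2 and higher.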
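A minimal sketch of why a guaranteed v2 baseline lets such a check fold away. It assumes a hypothetical build macro ASSUME_X86_64_V2 and a hypothetical helper have_cmpxchg128(); this is not how the kernel implements system_has_cmpxchg128(). Without the baseline guarantee the code has to ask CPUID at run time (leaf 1, ECX bit 13 reports CMPXCHG16B). With the guarantee, the answer is a constant that the compiler can propagate, which removes any fallback path guarded by it.

```c
#include <stdbool.h>

#if defined(__x86_64__)
#include <cpuid.h>
#endif

/*
 * Runtime probe, as needed by a build that only assumes baseline
 * x86-64 (v1): CPUID leaf 1 reports CMPXCHG16B in ECX bit 13.
 */
static inline bool cpu_has_cx16_runtime(void)
{
#if defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return (ecx & bit_CMPXCHG16B) != 0;
#else
	return false;
#endif
}

/*
 * ASSUME_X86_64_V2 is a hypothetical flag meaning "this build requires
 * at least x86-64-v2". Since CMPXCHG16B is part of that baseline, the
 * check becomes the constant 'true' and callers' fallback code is dead.
 */
#ifdef ASSUME_X86_64_V2
#define have_cmpxchg128() true
#else
#define have_cmpxchg128() cpu_has_cx16_runtime()
#endif
```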

The v4 level is meaningless in practice, since it only adds
AVX512 instructions, which are present in very few CPUs and
are not that useful inside the kernel aside from specialized
crypto and RAID helpers.

Arnd