Re: [PATCH 09/11] x86: rework CONFIG_GENERIC_CPU compiler flags

From: Arnd Bergmann
Date: Thu Dec 05 2024 - 04:47:10 EST


On Thu, Dec 5, 2024, at 00:33, Linus Torvalds wrote:
> On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>>
>> I guess the other side of it is that the current selection
>> between pentium4/core2/k8/bonnell/generic is not much better,
>> given that in practice nobody has any of the
>> pentium4/core2/k8/bonnell variants any more.
>
> So I suspect:
>
>> A more radical solution would be to just drop the entire
>> menu for 64-bit kernels and always default to "-march=x86-64
>> -mtune=generic" and 64 byte L1 cachelines.
>
> would actually be perfectly acceptable. The non-generic choices are
> all entirely historical and not really very interesting.
>
> Absolutely nobody sane cares about instruction scheduling for the old P4 cores.

Ok, I'll do that instead then. This also means I can drop
the patch for CONFIG_MATOM.

> In the bad old 32-bit days, we had real code generation issues with
> basic instruction set, ie the whole "some CPU's are P6-class, but
> don't actually support the CMOVxx instruction". Those days are gone.

I did come across one remaining odd problem with this: Crusoe and
GeodeLX both identify as Family 5 but have CMOV. Running a
CONFIG_M686+CONFIG_X86_GENERIC kernel on these fails with a boot
error "This kernel requires a 686 CPU but only detected a 586 CPU".
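
To illustrate the mismatch, here is a minimal userspace sketch in C of
the two ways one could classify such a CPU. It is not the kernel's
boot-time check; the function names are hypothetical and the inputs are
the raw EAX/EDX values returned by CPUID leaf 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the display family from CPUID leaf 1 EAX. */
unsigned int cpu_family(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;

	/* The extended family field only counts when the base family is 0xf. */
	if (family == 0xf)
		family += (eax >> 20) & 0xff;
	return family;
}

/* CMOV support is reported in CPUID leaf 1, EDX bit 15. */
bool cpu_has_cmov(uint32_t edx)
{
	return ((edx >> 15) & 1) != 0;
}

/* Hypothetical policy A: derive the "686" level from the family number. */
bool is_686_by_family(uint32_t eax)
{
	return cpu_family(eax) >= 6;
}

/* Hypothetical policy B: derive it from the feature the compiler relies on. */
bool is_686_by_feature(uint32_t edx)
{
	return cpu_has_cmov(edx);
}
```

A leaf-1 EAX value such as 0x5A2 decodes to family 5, so policy A
rejects the CPU even when EDX bit 15 is set and policy B would accept
it.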

As a result, the Debian 686 kernel binary gets built with
CONFIG_MGEODE_LX, which seems mildly wrong but not harmful enough
to require a change in how we handle the levels.

> And yes, on x86-64, we still have the whole cmpxchg16b issue, which
> really is a slight annoyance. But the emphasis is on "slight" - we
> basically have one hack for this in the SLAB code, and a couple of
> dynamic tests for one particular driver (iommu 128-bit IRTE mode).
>
> So yeah, the cmpxchg16b thing is annoying, but _realistically_ I don't
> think we care.
>
> And some day we will forget about it, notice that those (few) AMD
> early 64-bit CPU's can't possibly have been working for the last year
> or two, and we'll finally just kill that code, but in the meantime the
> cost of maintaining it is so slight that it's not worth actively going
> out to kill it.

Right, in particular my hope of replacing the runtime detection
with compile-time configuration for cmpxchg16b no longer works,
as I noticed that risc-v has also gained runtime detection
for system_has_cmpxchg128().
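
For reference, the two approaches look roughly like this in a userspace
C sketch. This is an analogy only, not the kernel's
system_has_cmpxchg128() implementation; the function names are
hypothetical, and it assumes GCC or clang, which provide <cpuid.h> and
define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 when built with -mcx16.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX bit 13 reports CMPXCHG16B (CX16). */
#define CPUID_ECX_CX16 (1u << 13)

/* Runtime detection: ask the CPU the code is currently running on. */
bool has_cmpxchg16b_runtime(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return (ecx & CPUID_ECX_CX16) != 0;
#else
	return false;	/* not an x86 build, nothing to detect */
#endif
}

/* Compile-time configuration: true only if the build assumed CX16. */
bool has_cmpxchg16b_buildtime(void)
{
#if defined(__x86_64__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
	return true;
#else
	return false;
#endif
}

/* A build that assumes CX16 can skip the runtime test entirely. */
bool can_use_cmpxchg16b(void)
{
	return has_cmpxchg16b_buildtime() || has_cmpxchg16b_runtime();
}
```

The compile-time variant removes a branch from the fast path, but only
pays off if every architecture with a 128-bit cmpxchg can do the same.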

Besides cmpxchg16b, I can also see compile-time configuration
for some instructions (popcnt, tzcnt, movbe) and for 5-level
paging being useful, but not enough so to make up for the
configuration complexity.

I still think we will end up needing more compile time
configurability like this on arm64 to deal with small-memory
embedded systems, e.g. with a specialized cortex-a55 kernel
that leaves out support for other CPUs, but this is quite
different from the situation on x86-64.

> I do think that the *one* option we might have is "optimize for the
> current CPU" for people who just want to build their own kernel for
> their own machine. That's a nice easy choice to give people, and
> '-march=native' is kind of simple to use.
>
> Will that work when you cross-compile? No. Do we care? Also no. It's
> basically a simple "you want to optimize for your own local machine"
> switch.

Sure, I'll add that as a separate patch. Should it be -march=native
or -mtune=native though? Using -march= can be faster if it picks
up newer instructions, but it will eventually lead to users
running into a boot panic if it is accidentally turned on for
a kernel that runs on an older machine than it was built on.
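
The difference can be seen from the compiler's predefined macros. The
following is a small userspace sketch in C, assuming GCC or clang; the
enum and function names are made up for illustration. With
-mtune=native the mask stays the same as for the baseline, while with
-march=native it reflects whatever the build machine supports.

```c
#include <stdint.h>

/* Hypothetical flag bits for the instructions mentioned above. */
enum {
	ISA_POPCNT = 1 << 0,
	ISA_BMI1   = 1 << 1,	/* BMI1 provides tzcnt */
	ISA_MOVBE  = 1 << 2,
};

/* Report which extensions the compiler was allowed to emit (-march). */
unsigned int isa_enabled_at_build(void)
{
	unsigned int mask = 0;

#ifdef __POPCNT__
	mask |= ISA_POPCNT;
#endif
#ifdef __BMI__
	mask |= ISA_BMI1;
#endif
#ifdef __MOVBE__
	mask |= ISA_MOVBE;
#endif
	return mask;
}

/*
 * The compiler may lower this to a single popcnt instruction when
 * __POPCNT__ is defined, and falls back to generic code otherwise.
 * The result is the same either way; only the generated code differs.
 */
unsigned int count_bits(uint64_t x)
{
	return (unsigned int)__builtin_popcountll(x);
}
```

A binary built with a non-zero mask contains instructions that fault on
a CPU lacking them, which is the boot failure described above; tuning
alone only changes scheduling and instruction selection within the
baseline instruction set.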

> Maybe that could replace some of the 32-bit choices too?

Probably not. I spent hours looking through the 32-bit choices
in the hope of finding a way that is less of a mess. The current
menu mixes up instruction set level (486/586/686), optimization
(atom/k7/m3/pentiumm) and platform (elan/geode/pc) options.
This is needlessly confusing, but any change to the status quo
is going to cause more problems for existing users than it
solves. All the "interesting" embedded ones are likely to be
cross-compiled anyway, so -mtune=native or -march=native wouldn't
help them either.

Arnd