Re: [PATCH v6 00/90] x86: Introduce a centralized CPUID data model

From: Christian Ludloff

Date: Tue May 05 2026 - 15:12:13 EST

On Tue, May 5, 2026 at 8:13 AM Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Tue, May 05, 2026 at 03:33:50PM +0200, Borislav Petkov wrote:
> > On Mon, Apr 27, 2026 at 08:45:23PM +0200, Ahmed S. Darwish wrote:
> > > * Keep the current synthetic CPUID(0x4c780001) and CPUID(0x4c780002)
> > > bitfield listings, with their hardware-backed scattered bits, as is.
> > > Mark them as "v1" instead of setting them in stone.
> >
> > Except that they're already cast in stone:
> >
> > https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/tip/db/xml/leaf_4c780001.xml
> >
> > and that 02 one.
> >
> > I don't want any of that to be in any database - this is Linux-internal only
> > and needs to go from there and nothing should depend on it.
> >
> > So, instead of taking this and then converting stuff on top, the proper thing
> > to do would be:
> >
> > - rip out the linux-specific leafs from the cpuid db
> > - keep both facilities in the kernel for querying and convert the hardware
> > leafs to the new method but use the old method for querying the synthetic
> > leafs. I'm sure we can multiplex between the two in cpu_feature_enabled()
> >
> > - then start converting the synthetic ones to hidden, Linux-specific ones
> >
> > Ok?
>
> Ok, talked it over with tglx - he brought up the argument that actually
> having those flags documented is good for other tools like crash, etc, where
> you want to consult a single db for *all* X86_FEATURE flags in the kernel - no
> matter how they're defined.
>
> So, what we will do is, we'll leave those leafs as is and not touch them.
>
> If we need to add new, solely synthetic bits, we'll add them where there's
> room, document them in the db and that's it. Synthetic bits will be add-only
> and the cpuid-db will collect them.
>
> This way you have a single source for all CPUID info.
>
> The scattered.c thing goes away because we have full CPUID leaf
> representation now.
>
> We only get purely synthetic new additions to the db and we can use the Lx
> namespace for that. I guess that's plenty of room for the foreseeable future.
>
> Makes sense?

When Ahmed published the series, I added them to sandpile.org like this:

https://www.sandpile.org/x86/cpuid.htm#leaf_4C78_0001h
https://www.sandpile.org/x86/cpuid.htm#leaf_4C78_0002h

That is, I list which Linux word goes where, but not the bits;
for those, there is simply a pointer to the code (the "golden ref").

I also privately suggested two small improvements to Ahmed.

First, 4C78_0000 EAX should report the maximum 4C78 leaf, in
case of future expansion. (CPUID always grows.)
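A minimal C sketch of how a consumer might honor such a max-leaf field. It assumes a hypothetical query callback (lx_query_fn) that returns the Linux-defined leaves; these leaves are Linux-internal, not something the hardware CPUID instruction reports, and none of the names below exist in the kernel. (The 4C78 prefix is ASCII "Lx", hence the "Lx namespace".)

```c
#include <stdbool.h>
#include <stdint.h>

/* Base of the Linux-defined synthetic range: 0x4c 0x78 is ASCII "Lx". */
#define LX_LEAF_BASE 0x4c780000u

struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Hypothetical query hook returning a synthetic leaf as Linux would
 * report it. Per the suggestion, leaf 0x4c780000 EAX holds the highest
 * valid 0x4c78xxxx leaf, the way leaf 0 does for the basic range.
 */
typedef struct cpuid_regs (*lx_query_fn)(uint32_t leaf, uint32_t subleaf);

/* Fills *out and returns true only if 'leaf' is within the advertised range. */
static bool lx_read_leaf(lx_query_fn query, uint32_t leaf, uint32_t subleaf,
			 struct cpuid_regs *out)
{
	uint32_t max_leaf;

	/* Only the 0x4c78xxxx range is handled here. */
	if ((leaf & 0xffff0000u) != LX_LEAF_BASE)
		return false;

	max_leaf = query(LX_LEAF_BASE, 0).eax;
	if (leaf > max_leaf)
		return false; /* not present: do not interpret the registers */

	*out = query(leaf, subleaf);
	return true;
}
```

A leaf above the advertised maximum is treated as absent, so a newer tool reading an older kernel's data does not misinterpret whatever registers come back.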

Second, the number of words reported by 0001h and 0002h
should be enumerated, in case the list grows, and so that a
program can tell an all-zero word apart from a word that is
not present. (Basically, ECX>0 subleaves.)
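To illustrate the all-zero versus not-present distinction, here is a sketch assuming a hypothetical layout in which each ECX subleaf carries four words in EAX..EDX order and the word count is enumerated elsewhere. lx_read_word() and lx_subleaf_fn are made-up names for illustration, not an existing API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout: four 32-bit words per subleaf, in EAX, EBX,
 * ECX, EDX order; subleaf (ECX input) n holds words 4n..4n+3.
 */
#define LX_WORDS_PER_SUBLEAF 4u

/* Hypothetical accessor: fills regs[0..3] = EAX..EDX for one subleaf. */
typedef void (*lx_subleaf_fn)(uint32_t subleaf, uint32_t regs[4]);

/*
 * Read word 'idx' of a leaf that enumerates 'nwords' words.
 * Returning false means "not present", which is distinct from a
 * present word whose bits happen to be all zero.
 */
static bool lx_read_word(lx_subleaf_fn read_subleaf, uint32_t nwords,
			 uint32_t idx, uint32_t *word)
{
	uint32_t regs[4];

	if (idx >= nwords)
		return false;

	read_subleaf(idx / LX_WORDS_PER_SUBLEAF, regs);
	*word = regs[idx % LX_WORDS_PER_SUBLEAF];
	return true;
}
```

Without an enumerated count, a reader asking for word 6 of a six-word leaf would get zeros and could not tell "no bits set" from "this kernel does not know that word yet".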

Since 0001h already uses all four registers, it can't report
a max subleaf in EAX, so perhaps 0000h EBX has to do that
job instead. It could hold, e.g., two byte fields reporting the
number of words: 0000h BL => 0001h words, and 0000h BH
=> 0002h words. (Or wider fields, if you expect more future
growth in words.)
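A sketch of the suggested 0000h EBX encoding, assuming 8-bit fields (BL for 0001h, BH for 0002h) with bits 31:16 left reserved. The helper names are hypothetical. Since 0001h fills all four registers today, BL would read 4 under this scheme.

```c
#include <stdint.h>

/*
 * Hypothetical encoding of leaf 0x4c780000 EBX:
 *   bits  7:0 (BL) = number of words enumerated by leaf 0x4c780001
 *   bits 15:8 (BH) = number of words enumerated by leaf 0x4c780002
 *   bits 31:16     = reserved (zero) in this sketch
 */
static inline uint32_t lx_pack_word_counts(uint8_t n0001, uint8_t n0002)
{
	return (uint32_t)n0001 | ((uint32_t)n0002 << 8);
}

/* BL: word count for leaf 0x4c780001. */
static inline uint8_t lx_words_0001(uint32_t ebx)
{
	return ebx & 0xffu;
}

/* BH: word count for leaf 0x4c780002. */
static inline uint8_t lx_words_0002(uint32_t ebx)
{
	return (ebx >> 8) & 0xffu;
}
```

Byte fields allow up to 255 words per leaf; switching to two 16-bit fields (EBX[15:0] and EBX[31:16]) would only change the shifts and masks.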

That said, it would be good to get the initial set of commits
done, so that db updates can start flowing in and catch up...

--
C.