Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

From: Len Brown
Date: Thu Feb 28 2019 - 10:59:35 EST


On Tue, Feb 26, 2019 at 8:54 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> > > It would've been nice to have the CPUID instruction 1F leaf reference
> > > 3B-3.9 in the SDM, and maybe mention this here too.
> >
> > I didn't mention SDM sections because they change -- leaving stale
> > pointers in our commit messages. The SDM is re-published 4 times per
> > year.
>
> Yah, I know. Which is why I keep all SDMs. So if you say, book 3 section
> 8 of Jul'17, I can find it :-)

The SDM is like software -- usually (but not always) you are better
off with the latest version:-)

> > Cache enumeration in Leaf-4 is totally unchanged.
> > ACPI NUMA tables are totally unchanged.
>
> Sure; and yet Sub-NUMA-Clustering broke stuff in interesting ways. I'm
> trying to get a feel for how these levels will interact with all that.
>
> Before that SNC stuff, caches had never spanned NODEs (and I still
> think that is 'creative' at best).

Yeah, SNC is sort of a curve ball. I guess it made enough stuff run better that
it is available as an option. But it doesn't help everything, so it is disabled
by default...

I think from a scheduler point of view, sticking with the output of
CPUID.4 for the cache topology, and the ACPI tables for the
node topology/distances, is the right strategy.

> > From a scheduler point of view, imagine that a SKX system with 4 die
> > in 4 packages was mechanically re-designed so that those 4 die resided
> > in 2 double-sized packages.
> >
> > They may have tweaked the links between the die, but logically it is
> > identical and compatible, and the legacy kernel will function
> > properly.
>
> This example has LLC in die and yes that works.
>
> But I can imagine things like L2 in tile and L3 across tiles but within
> DIE and then it _might_ make sense to still consider the tile for
> scheduling.
>
> Another option is having the LLC off die; also not unheard of.
>
> And then there's many creative and slightly crazy ways this can all be
> combined :/

If any of those crazy things happen, CPUID.B/CPUID.1F are not
going to help software understand it -- CPUID.4 and the NUMA tables
are the tool of choice.

> > So the effect of Leaf B,1F is that it defines the scope of MSRs. eg.
> > what processors does a die-scope MSR cover. That is why the rest of
> > the patch is about sysfs topology, and about package MSR scope.
> >
> > Yes, there will be more exotic MSR situations in future products --
> > the first ones are pretty simple -- something called a
> > package-scope-MSR in the SDM today becomes a die-scope-MSR in this
> > generation on a multi-die/package system.
>
> Yes :-(
>
> > It also reflects how many packages appear in sysfs, and this can
> > effect licensing of some kinds of software.
>
> That's just plain insanity and we should not let that affect our sysfs
> interfaces.

This change isn't made for compatibility with per-package licensing.
Indeed, vendors, who license based on package-count need to
be made aware that on a system with multi-die/package, they'll
see their package count go _down_ as a result of this change.
Thankfully, I'm told that per-package licensing is quite rare --
most stuff that cares has moved to per-CPU.

I think a good semantic side effect of this series is that it
maintains the invariant that a physical package and a socket are synonymous.
While we don't use the word "socket" in Linux anymore, the industry
broadly assume that the two are synonyms. And people expect that a
physical package really is a physical package -- you can see it,
buy it in a box, and hold it in your hand.

Functionally, the bottom line is that it allows software to discover topology
levels that previously needed to be discovered by looking up family/model,
in the past, which was sort of annoying. The things that care are
things that care about MSR scope. Thankfully, the list of things that care
about MSR scope is quite finite.

thanks,
-Len




--
Len Brown, Intel Open Source Technology Center