Re: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket

From: Borislav Petkov
Date: Tue Jun 27 2017 - 09:43:33 EST


On Tue, Jun 27, 2017 at 08:07:10PM +0700, Suravee Suthikulpanit wrote:
> What we are trying to point out here is that (NUMA) "node" and "die" are
> the same thing in most AMD processors; we are not necessarily trying to
> introduce another term here.

So don't use it then. The whole topology topic is confusing enough as it
is - every time I talk about it, I have to explain myself with an
example. Adding a "die" into the mix makes it more confusing, not less.

So pick the terms, please, and document them properly so that we all are
on the same page when talking about topology.

> Yes. 4 packages (or 4 dies, or 4 NUMA nodes) in a socket.

See above.

I'd like to have the topology terminology all explained and written
down, pls.

> As I have described in the cover letter, this patch series changes how

I know what you've described in the cover letter - I've read it. I
meant, put that same explanation in the commit message to state *why*
this patch is needed.

> SLIT table is showing
>
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
>

> However, SRAT/SLIT does not describe the DIE. So, using
> x86_numa_in_package_topology on a multi-die Zen processor will result in
> missing the DIE sched-domain for cpus within a die.

What does "does not describe the DIE" mean exactly? How exactly you need
to describe a die. And forget the die sched domain - first answer the
question: how is the NUMA info in SRAT/SLIT insufficient for scheduling?

Are you saying, you want to have all threads on a die belong to a
separate scheduling entity?
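
Btw, the SLIT distances the kernel ends up using are trivially visible
from userspace, so it is easy to verify what the firmware actually hands
us. A minimal sketch - plain userspace C, assuming the usual
/sys/devices/system/node/node<N>/distance layout:

#include <stdio.h>

int main(void)
{
	char path[64], buf[256];
	FILE *f;
	int n;

	/* Walk the NUMA nodes and dump each node's SLIT distance row. */
	for (n = 0; ; n++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/distance", n);
		f = fopen(path, "r");
		if (!f)
			break;	/* no more nodes */
		if (fgets(buf, sizeof(buf), f))
			printf("node %d: %s", n, buf);
		fclose(f);
	}
	return 0;
}

On the 8-node box above, that should print exactly the matrix quoted,
one row per node.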

> Again, the Magny-Cours MCM DIE sched-domain is the same as the MC
> sched-domain, so we can omit the DIE sched-domain. Here is an example
> from /proc/schedstat:

Ok, we can forget the DIE thing.

> Magny-Cours cpu0
> domain0 00000000,00000003 (SMT)
> domain1 00000000,000000ff (MC which is the same as DIE)
> domain2 00ff00ff,00ffffff (NUMA 1 hop)
> domain3 ffffffff,ffffffff (NUMA platform)
>
> Zen cpu0 (package-as-socket)
> domain0 00000000,00000001,00000000,00000001 (SMT)
> domain1 00000000,0000000f,00000000,0000000f (MC ccx)
> domain2 00000000,ffffffff,00000000,ffffffff (NUMA socket)
> domain3 ffffffff,ffffffff,ffffffff,ffffffff (NUMA platform)
>
> Zen cpu0 (package-as-die)
> domain0 00000000,00000001,00000000,00000001 (SMT)
> domain1 00000000,0000000f,00000000,0000000f (MC ccx)
> domain2 00000000,000000ff,00000000,000000ff (DIE)

So this is 8 threads IINM.

You want to have those 8 threads as a separate scheduling entity?
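
(Aside, for whoever wants to double-check the counts: those schedstat
masks are comma-separated 32-bit hex words with the leftmost word
holding the highest-numbered CPUs. A throwaway sketch to expand one into
CPU numbers:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
	/* e.g. the SMT domain0 mask from above */
	const char *mask = "00000000,00000001,00000000,00000001";
	char copy[128], *word, *save;
	unsigned long words[32];
	int nwords = 0, i, bit;

	strncpy(copy, mask, sizeof(copy) - 1);
	copy[sizeof(copy) - 1] = '\0';

	/* Split the mask into its 32-bit words. */
	for (word = strtok_r(copy, ",", &save); word;
	     word = strtok_r(NULL, ",", &save))
		words[nwords++] = strtoul(word, NULL, 16);

	/* words[0] is the most significant, i.e. the highest CPUs. */
	for (i = 0; i < nwords; i++)
		for (bit = 0; bit < 32; bit++)
			if (words[nwords - 1 - i] & (1UL << bit))
				printf("cpu%d\n", i * 32 + bit);
	return 0;
}

For the domain0 mask above it prints cpu0 and cpu64, i.e., the two SMT
siblings.)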

But looking at this picture:

Die (Dx) View :
        ----------------------------
        C0  | T0 T1 |    ||    | T0 T1 | C4
            --------|    ||    |--------
        C1  | T0 T1 | L3 || L3 | T0 T1 | C5
            --------|    ||    |--------
        C2  | T0 T1 | #0 || #1 | T0 T1 | C6
            --------|    ||    |--------
        C3  | T0 T1 |    ||    | T0 T1 | C7
        ----------------------------

That's 16 threads on a die.

So are you trying to tell me that you want to put all threads sharing
an L3 into a single scheduling domain? Is that it?

Or do you want to have all threads on a die in a single scheduling
domain?
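
Whichever it is, what the hardware reports as the L3 span is already
visible in sysfs and worth stating explicitly. A minimal sketch,
assuming cacheinfo is present and the L3 sits at index3 (typical, but
not guaranteed everywhere):

#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list";
	char buf[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}
	/* Print the list of CPUs sharing cpu0's L3. */
	if (fgets(buf, sizeof(buf), f))
		printf("cpu0 shares its L3 with: %s", buf);
	fclose(f);
	return 0;
}

If that prints 8 threads while the die picture shows 16, then the L3
domain and the die domain are simply not the same thing, and the
changelog should say which one is meant.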

Also, bear in mind that if we pick threads/cores/nodes/dies/... apart
like this now, it will definitely need touching in the future when the
hw guys change the topology again.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.