Re: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket

From: Suravee Suthikulpanit
Date: Tue Jun 27 2017 - 12:54:33 EST


Boris,

On 6/27/17 20:42, Borislav Petkov wrote:
> On Tue, Jun 27, 2017 at 08:07:10PM +0700, Suravee Suthikulpanit wrote:
>> What we are trying to point out here is that (NUMA) "node" and "die" are
>> the same thing in most AMD processors, not necessarily trying to
>> introduce another term here.

> So don't use it then. The whole topology topic is confusing as it is to
> people - so much so that I, for example, have to explain myself with an
> example every time I talk about it. Adding a "die" into the mix makes it
> more confusing, not less.
>
> So pick the terms, please, and document them properly so that we all are
> on the same page when talking about topology.


>> Yes. 4 packages (or 4 dies, or 4 NUMA nodes) in a socket.
>
> See above.
>
> I'd like to have the topology terminology all explained and written
> down, pls.


Sure, I will document the additional terms as you suggested once we agree on the direction.

>> However, SRAT/SLIT does not describe the DIE. So, using
>> x86_numa_in_package_topology on a multi-die Zen processor will result
>> in missing the DIE sched-domain for cpus within a die.

> What does "does not describe the DIE" mean exactly? How exactly do you
> need to describe a die? And forget the die sched domain - first answer
> the question: how is the NUMA info in SRAT/SLIT insufficient for
> scheduling?
>
> Are you saying you want to have all threads on a die belong to a
> separate scheduling entity?

Please see my comments below.

>> Zen cpu0 (package-as-die)
>>   domain0 00000000,00000001,00000000,00000001 (SMT)
>>   domain1 00000000,0000000f,00000000,0000000f (MC ccx)
>>   domain2 00000000,000000ff,00000000,000000ff (DIE)

> So this is 8 threads IINM.


Actually, the DIE sched-domain (domain2) has 16 threads. The cpumask is split between cpus 0-7 and 64-71 because the BIOS enumerates all T0 threads in the system before the T1 threads.
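
As a quick illustration (a minimal user-space sketch, not kernel code; it assumes from the layout above that cpus 0-63 are the T0 threads, cpus 64-127 their T1 siblings, and 8 cores per die), this reproduces the domain2 mask for cpu0:

#include <stdio.h>

int main(void)
{
	unsigned long long words[4] = { 0 };	/* 4 x 32-bit cpumask words */
	int core;

	for (core = 0; core < 8; core++) {	/* cores 0-7 sit on die 0 */
		words[0] |= 1ULL << core;	/* T0 siblings: cpus 0-7 */
		words[2] |= 1ULL << core;	/* T1 siblings: cpus 64-71 */
	}
	/* Prints 00000000,000000ff,00000000,000000ff - the DIE mask above */
	printf("%08llx,%08llx,%08llx,%08llx\n",
	       words[3], words[2], words[1], words[0]);
	return 0;
}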

> You want to have those 8 threads as a separate scheduling entity?
> But looking at this picture:
>
>       Die (Dx) View :
>       ----------------------------
>    C0 | T0 T1 |    ||    | T0 T1 | C4
>       --------|    ||    |--------
>    C1 | T0 T1 | L3 || L3 | T0 T1 | C5
>       --------|    ||    |--------
>    C2 | T0 T1 | #0 || #1 | T0 T1 | C6
>       --------|    ||    |--------
>    C3 | T0 T1 |    ||    | T0 T1 | C7
>       ----------------------------
>
> That's 16 threads on a die.
>
> So are you trying to tell me that you want to have all threads sharing
> an L3 in a single scheduling domain? Is that it?
> Or do you want to have all threads on a die in a single scheduling
> domain?

The 8 threads sharing each L3 are already in the same sched-domain1 (MC CCX). So, cpu0 is in the same sched-domain1 as cpus 1, 2, 3, 64, 65, 66, and 67. Here, we need the DIE sched-domain because it represents all cpus in the same NUMA node (since we have one memory controller per die). IIUC, for Zen, without the DIE sched-domain, the scheduler could try to re-balance tasks from one CCX (sched-group) to a CCX in another NUMA node, potentially causing unnecessary performance loss due to remote memory access.

Please also note that the SRAT/SLIT information is used to derive the NUMA sched-domains, while the DIE sched-domain is a non-NUMA sched-domain, derived from the CPUID topology extensions (available on newer families).
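
For reference, this corresponds to the two sched-domain topology tables in arch/x86/kernel/smpboot.c (abridged here from memory, so please check the tree): the default x86_topology table ends with a DIE level built from the CPUID-derived package mask, while x86_numa_in_package_topology drops it and relies on the SRAT/SLIT-derived NUMA levels instead.

static struct sched_domain_topology_level x86_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ NULL, },
};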

Please let me know if I am missing any other points.

Thanks,
Suravee