Re: [PATCH sched_ext/for-6.13] sched_ext: Do not enable LLC/NUMA optimizations when domains overlap

From: Andrea Righi
Date: Tue Nov 05 2024 - 20:08:18 EST


On Tue, Nov 05, 2024 at 02:33:53PM -1000, Tejun Heo wrote:
> Hello,
>
> On Wed, Nov 06, 2024 at 01:29:08AM +0100, Andrea Righi wrote:
> ...
> > Let's say we have 2 NUMA nodes, each with 2 sockets, and each socket
> > has its own L3 cache. In this case, numa_cpus will be larger than
> > llc_cpus, and enabling both NUMA and LLC optimizations would be
> > beneficial.
> >
> > On the other hand, if each NUMA node contains only 1 socket, numa_cpus
> > and llc_cpus will overlap completely, making it unnecessary to enable
> > both NUMA and LLC optimizations, so we can have just the LLC in this
> > case.
> >
> > Would something like this help clarifying the first test?
>
> I was more thinking about the theoretical case where one socket has one LLC
> while a different socket has multiple LLCs. I don't think there are any
> systems which are actually like that but there's nothing in the code which
> prevents that (unlike a single CPU belonging to multiple domains), so it'd
> probably be worthwhile to explain why the abbreviated test is enough.

In theory a CPU can only belong to a single domain (otherwise other
stuff in topology.c is broken as well), but potentially we could have
something like:

NUMA 1
- CPU 1 (L3)
NUMA 2
- CPU 2 (L3)
- CPU 3 (L3)

If we inspect CPU 1 only, we may incorrectly assume that numa_cpus ==
llc_cpus for the whole system. To handle this properly we may have to
inspect all the CPUs, instead of just the first one.
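
To illustrate the difference, here is a minimal userspace sketch (not the
actual kernel code). The struct, the function names and the plain 64-bit
masks are all made up for the example, and the CPUs from the topology
above are renumbered 0..2:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 64

/* Hypothetical toy topology: one bitmask of sibling CPUs per CPU. */
struct toy_topo {
	int nr_cpus;
	uint64_t llc_mask[MAX_CPUS];	/* CPUs sharing an LLC with CPU i */
	uint64_t numa_mask[MAX_CPUS];	/* CPUs in the same NUMA node as CPU i */
};

/* Abbreviated test: only look at the first CPU. */
static bool llc_equals_numa_first_cpu(const struct toy_topo *t)
{
	return t->llc_mask[0] == t->numa_mask[0];
}

/* Full test: the domains coincide only if they match on every CPU. */
static bool llc_equals_numa_all_cpus(const struct toy_topo *t)
{
	assert(t->nr_cpus > 0 && t->nr_cpus <= MAX_CPUS);

	for (int cpu = 0; cpu < t->nr_cpus; cpu++)
		if (t->llc_mask[cpu] != t->numa_mask[cpu])
			return false;
	return true;
}

/* First node = {0}, second node = {1, 2}, one private L3 per CPU. */
static struct toy_topo asymmetric_example(void)
{
	struct toy_topo t = { .nr_cpus = 3 };

	t.llc_mask[0] = 0x1; t.numa_mask[0] = 0x1;
	t.llc_mask[1] = 0x2; t.numa_mask[1] = 0x6;
	t.llc_mask[2] = 0x4; t.numa_mask[2] = 0x6;
	return t;
}
```

On this topology the first-CPU test reports that the domains are equal,
while the all-CPUs test does not.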

Moreover, with qemu we can also simulate ugly topologies, like 2 NUMA
nodes and a single L3 cache that covers both of them:

arighi@gpd3~/s/linux (master)> vng --cpu 4 -m 4G --numa 2G,cpus=0-1 --numa 2G,cpus=2-3
...
arighi@virtme-ng~/s/linux (master)> lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
  0    0      0    0 0:0:0:0       yes
  1    0      0    1 1:1:1:0       yes
  2    1      0    2 2:2:2:0       yes
  3    1      0    3 3:3:3:0       yes
arighi@virtme-ng~/s/linux (master)> numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 2014 MB
node 0 free: 1931 MB
node 1 cpus: 2 3
node 1 size: 1896 MB
node 1 free: 1847 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

I think this is only possible in a virtualized environment. In this case
LLC should be disabled and NUMA enabled. Maybe it's worth also checking
for the case where LLC > NUMA (a single LLC spanning multiple NUMA
nodes)...
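
A rough sketch of how the three cases discussed in this thread could be
told apart for a given CPU. Again this is plain userspace C, not the
kernel implementation: toy_pick_opts(), the TOY_OPT_* flags and the
64-bit masks are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags, one per optimization. */
enum {
	TOY_OPT_LLC  = 1 << 0,
	TOY_OPT_NUMA = 1 << 1,
};

/*
 * llc_cpus:  CPUs sharing the LLC with the inspected CPU
 * numa_cpus: CPUs in the same NUMA node as the inspected CPU
 */
static unsigned int toy_pick_opts(uint64_t llc_cpus, uint64_t numa_cpus)
{
	/* A CPU is always part of its own domains, so no mask is empty. */
	assert(llc_cpus != 0 && numa_cpus != 0);

	/* The LLC reaches outside the NUMA node (LLC > NUMA): NUMA only. */
	if (llc_cpus & ~numa_cpus)
		return TOY_OPT_NUMA;

	/* The domains overlap completely: LLC alone is enough. */
	if (llc_cpus == numa_cpus)
		return TOY_OPT_LLC;

	/* The LLC is strictly inside the NUMA node: enable both. */
	return TOY_OPT_LLC | TOY_OPT_NUMA;
}
```

With the qemu topology above (CPU 0: llc_cpus = 0xf, numa_cpus = 0x3)
this picks NUMA only. As noted earlier, the result for a single CPU is
not necessarily valid for the whole system.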

-Andrea