RE: [PATCH 1/1] sched/topology: Make sched_init_numa() use a set for the deduplicating sort

From: Valentin Schneider
Date: Thu Jan 28 2021 - 09:50:45 EST


On 25/01/21 21:35, Song Bao Hua (Barry Song) wrote:
> I was using 5.11-rc1. One thing I'd like to mention is that:
>
> For the below topology:
>     +-------+          +-----+
>     | node1 |  20      |node2|
>     |       +----------+     |
>     +---+---+          +-----+
>         |                  |12
>      12 |                  |
>     +---+---+          +---+-+
>     |       |          |node3|
>     | node0 |          |     |
>     +-------+          +-----+
>
> with node0-node2 as 22, node0-node3 as 24, node1-node3 as 22.
>
> I will get the below sched_domains_numa_distance[]:
> 10, 12, 22, 24
> As you can see there is *no* 20. So the node1 and node2 will
> only get two-level numa sched_domain:
>


So that's

-numa node,cpus=0-1,nodeid=0 -numa node,cpus=2-3,nodeid=1, \
-numa node,cpus=4-5,nodeid=2, -numa node,cpus=6-7,nodeid=3, \
-numa dist,src=0,dst=1,val=12, \
-numa dist,src=0,dst=2,val=22, \
-numa dist,src=0,dst=3,val=24, \
-numa dist,src=1,dst=2,val=20, \
-numa dist,src=1,dst=3,val=22, \
-numa dist,src=2,dst=3,val=12

but running this still doesn't get me a splat. Debugging
sched_domains_numa_distance[] still gives me
{10, 12, 20, 22, 24}
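
For reference, here is a rough userspace sketch (Python, not the kernel code)
of the deduplicating sort applied to the distance table above. It assumes
that the local distance is 10 and that a distance given for one direction
only is mirrored to the other; build_matrix() and unique_distances() are
made-up helper names, not kernel functions.

```python
# Illustrative sketch only: mimics the "sorted set of unique distances"
# idea, not the actual sched_init_numa() implementation.

LOCAL_DISTANCE = 10  # assumed node-to-self distance

# (src, dst) -> val, copied from the -numa dist options above
dist_opts = {
    (0, 1): 12,
    (0, 2): 22,
    (0, 3): 24,
    (1, 2): 20,
    (1, 3): 22,
    (2, 3): 12,
}


def build_matrix(nr_nodes, opts, local=LOCAL_DISTANCE):
    """Build a symmetric node distance matrix (hypothetical helper)."""
    matrix = [[local if i == j else None for j in range(nr_nodes)]
              for i in range(nr_nodes)]
    for (src, dst), val in opts.items():
        matrix[src][dst] = val
        matrix[dst][src] = val  # assumption: reverse direction is mirrored
    return matrix


def unique_distances(matrix):
    """Return every distinct distance in the table, in ascending order."""
    return sorted({d for row in matrix for d in row})


matrix = build_matrix(4, dist_opts)
levels = unique_distances(matrix)
print(levels)
```

Since the node1-node2 entry is 20, a dedup over the whole table keeps 20
among the unique distances for this layout.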

>
> But for the below topology:
>     +-------+          +-----+
>     | node0 |  20      |node2|
>     |       +----------+     |
>     +---+---+          +-----+
>         |                  |12
>      12 |                  |
>     +---+---+          +---+-+
>     |       |          |node3|
>     | node1 |          |     |
>     +-------+          +-----+
>
> with node1-node2 as 22, node1-node3 as 24, node0-node3 as 22.
>
> I will get the below sched_domains_numa_distance[]:
> 10, 12, 20, 22, 24
>
> What I have seen is the performance will be better if we
> drop the 20 as we will get a sched_domain hierarchy with less
> levels, and two intermediate nodes won't have the group span
> issue.
>

That is another thing that's worth considering. Morten was arguing that if
the distance between two nodes is so tiny, it might not be worth
representing it at all in the scheduler topology.

> Thanks
> Barry