Re: [PATCH] sched/topology: Warn when NUMA diameter > 2
From: Mel Gorman
Date: Wed Nov 11 2020 - 03:44:05 EST
On Tue, Nov 10, 2020 at 06:43:00PM +0000, Valentin Schneider wrote:
> NUMA topologies where the shortest path between some two nodes requires
> three or more hops (i.e. diameter > 2) end up being misrepresented in the
> scheduler topology structures.
>
> This is currently detected when booting a kernel with CONFIG_SCHED_DEBUG=y
> + sched_debug on the cmdline, although this will only yield a warning about
> sched_group spans not matching sched_domain spans:
>
> ERROR: groups don't span domain->span
>
> Add an explicit warning for that case, triggered regardless of
> CONFIG_SCHED_DEBUG, and decorate it with an appropriate comment.
>
> The topology described in the comment can be booted up on QEMU by appending
> the following to your usual QEMU incantation:
>
> -smp cores=4 \
> -numa node,cpus=0,nodeid=0 -numa node,cpus=1,nodeid=1, \
> -numa node,cpus=2,nodeid=2, -numa node,cpus=3,nodeid=3, \
> -numa dist,src=0,dst=1,val=20, -numa dist,src=0,dst=2,val=30, \
> -numa dist,src=0,dst=3,val=40, -numa dist,src=1,dst=2,val=20, \
> -numa dist,src=1,dst=3,val=30, -numa dist,src=2,dst=3,val=20
>
> A somewhat more realistic topology (6-node mesh) with the same affliction
> can be conjured with:
>
> -smp cores=6 \
> -numa node,cpus=0,nodeid=0 -numa node,cpus=1,nodeid=1, \
> -numa node,cpus=2,nodeid=2, -numa node,cpus=3,nodeid=3, \
> -numa node,cpus=4,nodeid=4, -numa node,cpus=5,nodeid=5, \
> -numa dist,src=0,dst=1,val=20, -numa dist,src=0,dst=2,val=30, \
> -numa dist,src=0,dst=3,val=40, -numa dist,src=0,dst=4,val=30, \
> -numa dist,src=0,dst=5,val=20, \
> -numa dist,src=1,dst=2,val=20, -numa dist,src=1,dst=3,val=30, \
> -numa dist,src=1,dst=4,val=20, -numa dist,src=1,dst=5,val=30, \
> -numa dist,src=2,dst=3,val=20, -numa dist,src=2,dst=4,val=30, \
> -numa dist,src=2,dst=5,val=40, \
> -numa dist,src=3,dst=4,val=20, -numa dist,src=3,dst=5,val=30, \
> -numa dist,src=4,dst=5,val=20
>
> Link: https://lore.kernel.org/lkml/jhjtux5edo2.mognet@xxxxxxx
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>
Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
--
Mel Gorman
SUSE Labs