Re: [PATCH 2/2] sched/topology: Expose numa_mask set/clear functions to arch

From: Peter Zijlstra
Date: Wed Aug 29 2018 - 04:02:41 EST


On Fri, Aug 10, 2018 at 10:30:19PM +0530, Srikar Dronamraju wrote:
> With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity node
> sched domain"), the scheduler introduces a new NUMA level. However, on
> powerpc shared LPARs, this extra sched domain creation can lead to
> repeated RCU stalls, sometimes even causing unresponsive systems on
> boot. On such stalls, it was noticed that the loop in
> init_sched_groups_capacity() never terminates (sg != sd->groups is
> always true).
>
> INFO: rcu_sched self-detected stall on CPU
> 1-....: (240039 ticks this GP) idle=c32/1/4611686018427387906 softirq=782/782 fqs=80012
> (t=240039 jiffies g=6272 c=6271 q=263040)
> NMI backtrace for cpu 1
>
> --- interrupt: 901 at __bitmap_weight+0x70/0x100
> LR = __bitmap_weight+0x78/0x100
> [c00000832132f9b0] [c0000000009bb738] __func__.61127+0x0/0x20 (unreliable)
> [c00000832132fa00] [c00000000016c178] build_sched_domains+0xf98/0x13f0
> [c00000832132fb30] [c00000000016d73c] partition_sched_domains+0x26c/0x440
> [c00000832132fc20] [c0000000001ee284] rebuild_sched_domains_locked+0x64/0x80
> [c00000832132fc50] [c0000000001f11ec] rebuild_sched_domains+0x3c/0x60
> [c00000832132fc80] [c00000000007e1c4] topology_work_fn+0x24/0x40
> [c00000832132fca0] [c000000000126704] process_one_work+0x1a4/0x470
> [c00000832132fd30] [c000000000126a68] worker_thread+0x98/0x540
> [c00000832132fdc0] [c00000000012f078] kthread+0x168/0x1b0
> [c00000832132fe30] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
>
> A similar problem was also reported earlier at
> https://lwn.net/ml/linux-kernel/20180512100233.GB3738@osiris/
>
> Allow arch to set and clear masks corresponding to numa sched domain.

What this Changelog fails to do is explain the problem and motivate why
this is the right solution.

As-is, this reads like: something's buggered, I changed this random thing
and it now works.

So what is causing that domain construction error?