Re: NMI watchdog triggering during load_balance
From: Mike Galbraith
Date: Thu Mar 05 2015 - 23:53:18 EST
On Thu, 2015-03-05 at 21:05 -0700, David Ahern wrote:
> Hi Peter/Mike/Ingo:
>
> I've been banging my head against this wall for a week now and am hoping
> you or someone could shed some light on the problem.
>
> On larger systems (256 to 1024 cpus) there are several use cases (e.g.,
> http://www.cs.virginia.edu/stream/) that regularly trigger the NMI
> watchdog with the stack trace:
>
> Call Trace:
> @ [000000000045d3d0] double_rq_lock+0x4c/0x68
> @ [00000000004699c4] load_balance+0x278/0x740
> @ [00000000008a7b88] __schedule+0x378/0x8e4
> @ [00000000008a852c] schedule+0x68/0x78
> @ [000000000042c82c] cpu_idle+0x14c/0x18c
> @ [00000000008a3a50] after_lock_tlb+0x1b4/0x1cc
>
> Capturing data for all CPUs, I tend to see load_balance related stack
> traces on 700-800 cpus, with a few hundred blocked on _raw_spin_trylock_bh.
>
> I originally thought it was a deadlock in the rq locking, but if I bump
> the watchdog timeout the system eventually recovers (after 10-30+
> seconds of unresponsiveness) so it does not seem likely to be a deadlock.
>
> This particular system has 1024 cpus:
> # lscpu
> Architecture: sparc64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Big Endian
> CPU(s): 1024
> On-line CPU(s) list: 0-1023
> Thread(s) per core: 8
> Core(s) per socket: 4
> Socket(s): 32
> NUMA node(s): 4
> NUMA node0 CPU(s): 0-255
> NUMA node1 CPU(s): 256-511
> NUMA node2 CPU(s): 512-767
> NUMA node3 CPU(s): 768-1023
>
> and there are 4 scheduling domains. An example of the domain debug
> output (condensed for the email):
>
> CPU970 attaching sched-domain:
> domain 0: span 968-975 level SIBLING
> groups: 8 single CPU groups
> domain 1: span 968-975 level MC
> groups: 1 group with 8 cpus
> domain 2: span 768-1023 level CPU
> groups: 4 groups with 256 cpus per group
Wow, that topology is horrid. I'm not surprised that your box is
writhing in agony. Can you twiddle that?
-Mike
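One way to cross-check the quoted domain spans against the lscpu figures is to compute the aligned CPU blocks that a given CPU falls into at each hardware level. The sketch below assumes CPUs are numbered contiguously within each core, socket and node (8 threads per core, 4 cores per socket, 256 CPUs per node, as the lscpu output suggests); `span` and `expected_spans` are hypothetical helpers for illustration, not kernel code.

```python
# Topology figures taken from the lscpu output quoted above.
THREADS_PER_CORE = 8
CORES_PER_SOCKET = 4
CPUS_PER_NODE = 256  # node0 = 0-255, node1 = 256-511, ...


def span(cpu: int, width: int) -> tuple[int, int]:
    """Inclusive CPU range of the aligned block of `width` CPUs containing `cpu`.

    Assumes contiguous numbering within each block; real firmware is free
    to number CPUs differently.
    """
    first = cpu // width * width
    return first, first + width - 1


def expected_spans(cpu: int) -> dict[str, tuple[int, int]]:
    """Blocks that the lscpu figures predict for `cpu` at each hardware level."""
    return {
        "core (SMT siblings)": span(cpu, THREADS_PER_CORE),
        "socket": span(cpu, THREADS_PER_CORE * CORES_PER_SOCKET),
        "NUMA node": span(cpu, CPUS_PER_NODE),
    }


if __name__ == "__main__":
    for level, (lo, hi) in expected_spans(970).items():
        print(f"{level:20s} {lo}-{hi}")
```

Under that contiguous-numbering assumption, CPU970 lands in core block 968-975 (matching domain 0) and node block 768-1023 (matching domain 2). The 32-CPU socket block the figures predict, 960-991, does not appear among the listed domains; the MC level shown spans only 968-975.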