Re: [tip: sched/urgent] sched/core: Avoid spurious lock dependencies

From: Qian Cai
Date: Fri Nov 22 2019 - 16:03:30 EST


On Fri, 2019-11-22 at 21:20 +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2019 at 09:01:22PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2019-11-13 10:06:28 [-0000], tip-bot2 for Peter Zijlstra wrote:
> > > sched/core: Avoid spurious lock dependencies
> > >
> > > While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> > > when DEBUG_OBJECTS, can end up doing allocations.
> > >
> > > This then results in the following lock order:
> > >
> > > rq->lock
> > > zone->lock.rlock
> > > batched_entropy_u64.lock
> > >
> > > Which in turn causes deadlocks when we do wakeups while holding that
> > > batched_entropy lock -- as the random code does.
> >
> > Peter, can it _really_ cause deadlocks? My understanding was that the
> > batched_entropy_u64.lock is a per-CPU lock and can _not_ cause a
> > deadlock because it can be always acquired on multiple CPUs
> > simultaneously (and it is never acquired cross-CPU).
> > Lockdep is simply not smart enough to see that and complains about it
> > like it would complain about a regular lock in this case.
>
> That part yes. That is, even holding a per-cpu lock you can do a wakeup
> to the local cpu and recurse back onto rq->lock.
>
> However I don't think it can actually happen because this
> is init_idle, and we only ever do that on hotplug, so actually creating
> the concurrency required for the deadlock might be tricky.
>
> Still, moving that thing out from under the lock was simple and correct.

Well, the patch alone fixed a real deadlock during boot.

https://lore.kernel.org/lkml/1566509603.5576.10.camel@xxxxxx/

It needs DEBUG_OBJECTS=y to trigger though.

Suppose it does,

CPU0: zone_lock -> printk() [1]
CPUs: printk() -> zone_lock [2]

[1]
[ 1078.599835][T43784] -> #1 (console_owner){-...}:
[ 1078.606618][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.611661][T43784]        lock_acquire+0x154/0x428
[ 1078.616530][T43784]        console_unlock+0x298/0x898
[ 1078.621573][T43784]        vprintk_emit+0x2d4/0x460
[ 1078.626442][T43784]        vprintk_default+0x48/0x58
[ 1078.631398][T43784]        vprintk_func+0x194/0x250
[ 1078.636267][T43784]        printk+0xbc/0xec
[ 1078.640443][T43784]        _warn_unseeded_randomness+0xb4/0xd0
[ 1078.646267][T43784]        get_random_u64+0x4c/0x100
[ 1078.651224][T43784]        add_to_free_area_random+0x168/0x1a0
[ 1078.657047][T43784]        free_one_page+0x3dc/0xd08


[2]
[  317.337609] -> #3 (&(&zone->lock)->rlock){-.-.}:
[  317.337612]        __lock_acquire+0x5b3/0xb40
[  317.337613]        lock_acquire+0x126/0x280
[  317.337613]        _raw_spin_lock+0x2f/0x40
[  317.337614]        rmqueue_bulk.constprop.21+0xb6/0x1160
[  317.337615]        get_page_from_freelist+0x898/0x22c0
[  317.337616]        __alloc_pages_nodemask+0x2f3/0x1cd0
[  317.337617]        alloc_page_interleave+0x18/0x130
[  317.337618]        alloc_pages_current+0xf6/0x110
[  317.337619]        allocate_slab+0x4c6/0x19c0
[  317.337620]        new_slab+0x46/0x70
[  317.337621]        ___slab_alloc+0x58b/0x960
[  317.337621]        __slab_alloc+0x43/0x70
[  317.337622]        kmem_cache_alloc+0x354/0x460
[  317.337623]        fill_pool+0x272/0x4b0
[  317.337624]        __debug_object_init+0x86/0x790
[  317.337624]        debug_object_init+0x16/0x20
[  317.337625]        hrtimer_init+0x27/0x1e0
[  317.337626]        init_dl_task_timer+0x20/0x40
[  317.337627]        __sched_fork+0x10b/0x1f0
[  317.337627]        init_idle+0xac/0x520
[  317.337628]        idle_thread_get+0x7c/0xc0
[  317.337629]        bringup_cpu+0x1a/0x1e0
[  317.337630]        cpuhp_invoke_callback+0x197/0x1120
[  317.337630]        _cpu_up+0x171/0x280
[  317.337631]        do_cpu_up+0xb1/0x120
[  317.337632]        cpu_up+0x13/0x20

[  317.337635] -> #2 (&rq->lock){-.-.}:
[  317.337638]        __lock_acquire+0x5b3/0xb40
[  317.337639]        lock_acquire+0x126/0x280
[  317.337639]        _raw_spin_lock+0x2f/0x40
[  317.337640]        task_fork_fair+0x43/0x200
[  317.337641]        sched_fork+0x29b/0x420
[  317.337642]        copy_process+0xf3c/0x2fd0
[  317.337642]        _do_fork+0xef/0x950
[  317.337643]        kernel_thread+0xa8/0xe0

[  317.337649] -> #1 (&p->pi_lock){-.-.}:
[  317.337651]        __lock_acquire+0x5b3/0xb40
[  317.337652]        lock_acquire+0x126/0x280
[  317.337653]        _raw_spin_lock_irqsave+0x3a/0x50
[  317.337653]        try_to_wake_up+0xb4/0x1030
[  317.337654]        wake_up_process+0x15/0x20
[  317.337655]        __up+0xaa/0xc0
[  317.337655]        up+0x55/0x60
[  317.337656]        __up_console_sem+0x37/0x60
[  317.337657]        console_unlock+0x3a0/0x750
[  317.337658]        vprintk_emit+0x10d/0x340
[  317.337658]        vprintk_default+0x1f/0x30
[  317.337659]        vprintk_func+0x44/0xd4
[  317.337660]        printk+0x9f/0xc5