Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()

From: Qian Cai
Date: Wed Sep 25 2019 - 11:18:54 EST


On Wed, 2019-09-25 at 11:31 +0200, Peter Zijlstra wrote:
> On Fri, Sep 13, 2019 at 12:27:44PM -0400, Qian Cai wrote:
> > The commit b7d5dc21072c ("random: add a spinlock_t to struct
> > batched_entropy") insists on acquiring "batched_entropy_u32.lock" in
> > get_random_u32() which introduced the lock chain,
> >
> > "&rq->lock --> batched_entropy_u32.lock"
> >
> > even after crng init. As a result, it could lead to the deadlock below.
> > Fix it by using get_random_bytes() in shuffle_freelist(), which does not
> > need to take the batched_entropy locks.
> >
> > WARNING: possible circular locking dependency detected
> > 5.3.0-rc7-mm1+ #3 Tainted: G L
> > ------------------------------------------------------
> > make/7937 is trying to acquire lock:
> > ffff900012f225f8 (random_write_wait.lock){....}, at:
> > __wake_up_common_lock+0xa8/0x11c
> >
> > but task is already holding lock:
> > ffff0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
> > get_random_u32+0x6c/0x1dc
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #3 (batched_entropy_u32.lock){-.-.}:
> > lock_acquire+0x31c/0x360
> > _raw_spin_lock_irqsave+0x7c/0x9c
> > get_random_u32+0x6c/0x1dc
> > new_slab+0x234/0x6c0
> > ___slab_alloc+0x3c8/0x650
> > kmem_cache_alloc+0x4b0/0x590
> > __debug_object_init+0x778/0x8b4
> > debug_object_init+0x40/0x50
> > debug_init+0x30/0x29c
> > hrtimer_init+0x30/0x50
> > init_dl_task_timer+0x24/0x44
> > __sched_fork+0xc0/0x168
> > init_idle+0x78/0x26c
> > fork_idle+0x12c/0x178
> > idle_threads_init+0x108/0x178
> > smp_init+0x20/0x1bc
> > kernel_init_freeable+0x198/0x26c
> > kernel_init+0x18/0x334
> > ret_from_fork+0x10/0x18
> >
> > -> #2 (&rq->lock){-.-.}:
>
> This relation is silly..
>
> I suspect the below 'works'...

Unfortunately, the relation is still there,

copy_process()->rt_mutex_init_task()->"&p->pi_lock"

[24438.676716][    T2] -> #2 (&rq->lock){-.-.}:
[24438.676727][    T2]        __lock_acquire+0x5b4/0xbf0
[24438.676736][    T2]        lock_acquire+0x130/0x360
[24438.676754][    T2]        _raw_spin_lock+0x54/0x80
[24438.676771][    T2]        task_fork_fair+0x60/0x190
[24438.676788][    T2]        sched_fork+0x128/0x270
[24438.676806][    T2]        copy_process+0x7a4/0x1bf0
[24438.676823][    T2]        _do_fork+0xac/0xac0
[24438.676841][    T2]        kernel_thread+0x70/0xa0
[24438.676858][    T2]        rest_init+0x4c/0x42c
[24438.676884][    T2]        start_kernel+0x778/0x7c0
[24438.676902][    T2]        start_here_common+0x1c/0x334

The whole report,

[24438.675704][    T2] WARNING: possible circular locking dependency detected
[24438.675714][    T2] 5.3.0-next-20190924 #2 Not tainted
[24438.675722][    T2] ------------------------------------------------------
[24438.675731][    T2] kthreadd/2 is trying to acquire lock:
[24438.675740][    T2] c0000000010a7450 (random_write_wait.lock){..-.}, at: __wake_up_common_lock+0x88/0x110
[24438.675768][    T2]
[24438.675768][    T2] but task is already holding lock:
[24438.675778][    T2] c000001ffd2f06e0 (batched_entropy_u64.lock){-...}, at: get_random_u64+0x60/0x100
[24438.675803][    T2]
[24438.675803][    T2] which lock already depends on the new lock.
[24438.675803][    T2]
[24438.675816][    T2]
[24438.675816][    T2] the existing dependency chain (in reverse order) is:
[24438.675836][    T2]
[24438.675836][    T2] -> #4 (batched_entropy_u64.lock){-...}:
[24438.675860][    T2]        __lock_acquire+0x5b4/0xbf0
[24438.675878][    T2]        lock_acquire+0x130/0x360
[24438.675906][    T2]        _raw_spin_lock_irqsave+0x70/0xa0
[24438.675923][    T2]        get_random_u64+0x60/0x100
[24438.675944][    T2]        add_to_free_area_random+0x164/0x1b0
[24438.675962][    T2]        free_one_page+0xb24/0xcf0
[24438.675980][    T2]        __free_pages_ok+0x448/0xbf0
[24438.675999][    T2]        deferred_init_maxorder+0x404/0x4a4
[24438.676018][    T2]        deferred_grow_zone+0x158/0x1f0
[24438.676035][    T2]        get_page_from_freelist+0x1dc8/0x1e10
[24438.676063][    T2]        __alloc_pages_nodemask+0x1d8/0x1940
[24438.676083][    T2]        allocate_slab+0x130/0x2740
[24438.676091][    T2]        new_slab+0xa8/0xe0
[24438.676101][    T2]        kmem_cache_open+0x254/0x660
[24438.676119][    T2]        __kmem_cache_create+0x44/0x2a0
[24438.676136][    T2]        create_boot_cache+0xcc/0x110
[24438.676154][    T2]        kmem_cache_init+0x90/0x1f0
[24438.676173][    T2]        start_kernel+0x3b8/0x7c0
[24438.676191][    T2]        start_here_common+0x1c/0x334
[24438.676208][    T2]
[24438.676208][    T2] -> #3 (&(&zone->lock)->rlock){-.-.}:
[24438.676221][    T2]        __lock_acquire+0x5b4/0xbf0
[24438.676247][    T2]        lock_acquire+0x130/0x360
[24438.676264][    T2]        _raw_spin_lock+0x54/0x80
[24438.676282][    T2]        rmqueue_bulk.constprop.23+0x64/0xf20
[24438.676300][    T2]        get_page_from_freelist+0x718/0x1e10
[24438.676319][    T2]        __alloc_pages_nodemask+0x1d8/0x1940
[24438.676339][    T2]        alloc_page_interleave+0x34/0x170
[24438.676356][    T2]        allocate_slab+0xd1c/0x2740
[24438.676374][    T2]        new_slab+0xa8/0xe0
[24438.676391][    T2]        ___slab_alloc+0x580/0xef0
[24438.676408][    T2]        __slab_alloc+0x64/0xd0
[24438.676426][    T2]        kmem_cache_alloc+0x5c4/0x6c0
[24438.676444][    T2]        fill_pool+0x280/0x540
[24438.676461][    T2]        __debug_object_init+0x60/0x6b0
[24438.676479][    T2]        hrtimer_init+0x5c/0x310
[24438.676497][    T2]        init_dl_task_timer+0x34/0x60
[24438.676516][    T2]        __sched_fork+0x8c/0x110
[24438.676535][    T2]        init_idle+0xb4/0x3c0
[24438.676553][    T2]        idle_thread_get+0x78/0x120
[24438.676572][    T2]        bringup_cpu+0x30/0x230
[24438.676590][    T2]        cpuhp_invoke_callback+0x190/0x1580
[24438.676618][    T2]        do_cpu_up+0x248/0x460
[24438.676636][    T2]        smp_init+0x118/0x1c0
[24438.676662][    T2]        kernel_init_freeable+0x3f8/0x8dc
[24438.676681][    T2]        kernel_init+0x2c/0x154
[24438.676699][    T2]        ret_from_kernel_thread+0x5c/0x74
[24438.676716][    T2]
[24438.676716][    T2] -> #2 (&rq->lock){-.-.}:
[24438.676727][    T2]        __lock_acquire+0x5b4/0xbf0
[24438.676736][    T2]        lock_acquire+0x130/0x360
[24438.676754][    T2]        _raw_spin_lock+0x54/0x80
[24438.676771][    T2]        task_fork_fair+0x60/0x190
[24438.676788][    T2]        sched_fork+0x128/0x270
[24438.676806][    T2]        copy_process+0x7a4/0x1bf0
[24438.676823][    T2]        _do_fork+0xac/0xac0
[24438.676841][    T2]        kernel_thread+0x70/0xa0
[24438.676858][    T2]        rest_init+0x4c/0x42c
[24438.676884][    T2]        start_kernel+0x778/0x7c0
[24438.676902][    T2]        start_here_common+0x1c/0x334
[24438.676910][    T2]
[24438.676910][    T2] -> #1 (&p->pi_lock){-.-.}:
[24438.676921][    T2]        __lock_acquire+0x5b4/0xbf0
[24438.676929][    T2]        lock_acquire+0x130/0x360
[24438.676947][    T2]        _raw_spin_lock_irqsave+0x70/0xa0
[24438.676973][    T2]        try_to_wake_up+0x70/0x1600
[24438.676991][    T2]        pollwake+0x88/0xc0
[24438.677009][    T2]        __wake_up_common+0xec/0x280
[24438.677026][    T2]        __wake_up_common_lock+0xac/0x110
[24438.677044][    T2]        account.constprop.8+0x284/0x430
[24438.677061][    T2]        extract_entropy.constprop.7+0xd4/0x330
[24438.677080][    T2]        _xfer_secondary_pool+0x104/0x3e0
[24438.677097][    T2]        push_to_pool+0x58/0x310
[24438.677116][    T2]        process_one_work+0x300/0x8e0
[24438.677133][    T2]        worker_thread+0x78/0x530
[24438.677151][    T2]        kthread+0x1a8/0x1b0
[24438.677180][    T2]        ret_from_kernel_thread+0x5c/0x74
[24438.677245][    T2]
[24438.677245][    T2] -> #0 (random_write_wait.lock){..-.}:
[24438.677329][    T2]        check_prev_add+0x100/0x11b0
[24438.677377][    T2]        validate_chain+0x868/0x1530
[24438.677446][    T2]        __lock_acquire+0x5b4/0xbf0
[24438.677516][    T2]        lock_acquire+0x130/0x360
[24438.677563][    T2]        _raw_spin_lock_irqsave+0x70/0xa0
[24438.677618][    T2]        __wake_up_common_lock+0x88/0x110
[24438.677678][    T2]        account.constprop.8+0x284/0x430
[24438.677743][    T2]        extract_entropy.constprop.7+0xd4/0x330
[24438.677802][    T2]        crng_reseed+0x68/0x490
[24438.677867][    T2]        _extract_crng+0x104/0x110
[24438.677914][    T2]        crng_reseed+0x284/0x490
[24438.677983][    T2]        _extract_crng+0x104/0x110
[24438.678032][    T2]        get_random_u64+0xdc/0x100
[24438.678101][    T2]        copy_process+0x2d8/0x1bf0
[24438.678148][    T2]        _do_fork+0xac/0xac0
[24438.678208][    T2]        kernel_thread+0x70/0xa0
[24438.678246][    T2]        kthreadd+0x270/0x330
[24438.678301][    T2]        ret_from_kernel_thread+0x5c/0x74
[24438.678342][    T2]
[24438.678342][    T2] other info that might help us debug this:
[24438.678342][    T2]
[24438.678459][    T2] Chain exists of:
[24438.678459][    T2]   random_write_wait.lock --> &(&zone->lock)->rlock --> batched_entropy_u64.lock
[24438.678459][    T2]
[24438.678636][    T2]  Possible unsafe locking scenario:
[24438.678636][    T2]
[24438.678692][    T2]        CPU0                    CPU1
[24438.678754][    T2]        ----                    ----
[24438.678814][    T2]   lock(batched_entropy_u64.lock);
[24438.678878][    T2]                                lock(&(&zone->lock)->rlock);
[24438.678951][    T2]                                lock(batched_entropy_u64.lock);
[24438.679038][    T2]   lock(random_write_wait.lock);
[24438.679098][    T2]
[24438.679098][    T2]  *** DEADLOCK ***
[24438.679098][    T2]
[24438.679174][    T2] 1 lock held by kthreadd/2:
[24438.679230][    T2]  #0: c000001ffd2f06e0 (batched_entropy_u64.lock){-...}, at: get_random_u64+0x60/0x100
[24438.679341][    T2]
[24438.679341][    T2] stack backtrace:
[24438.679413][    T2] CPU: 13 PID: 2 Comm: kthreadd Not tainted 5.3.0-next-20190924 #2
[24438.679485][    T2] Call Trace:
[24438.679507][    T2] [c00000002c84efe0] [c00000000091a574] dump_stack+0xe8/0x164 (unreliable)
[24438.679618][    T2] [c00000002c84f030] [c0000000001cc9b8] print_circular_bug+0x3a8/0x420
[24438.679701][    T2] [c00000002c84f0e0] [c0000000001ccc90] check_noncircular+0x260/0x320
[24438.679769][    T2] [c00000002c84f1e0] [c0000000001ce7e0] check_prev_add+0x100/0x11b0
[24438.679868][    T2] [c00000002c84f2c0] [c0000000001d00f8] validate_chain+0x868/0x1530
[24438.679950][    T2] [c00000002c84f3f0] [c0000000001d3064] __lock_acquire+0x5b4/0xbf0
[24438.680059][    T2] [c00000002c84f4f0] [c0000000001d3ed0] lock_acquire+0x130/0x360
[24438.680122][    T2] [c00000002c84f5d0] [c000000000947d70] _raw_spin_lock_irqsave+0x70/0xa0
[24438.680207][    T2] [c00000002c84f610] [c0000000001a9488] __wake_up_common_lock+0x88/0x110
[24438.680298][    T2] [c00000002c84f690] [c0000000006f11a4] account.constprop.8+0x284/0x430
[24438.680399][    T2] [c00000002c84f750] [c0000000006f1554] extract_entropy.constprop.7+0xd4/0x330
[24438.680495][    T2] [c00000002c84f7d0] [c0000000006f1818] crng_reseed+0x68/0x490
[24438.680590][    T2] [c00000002c84f910] [c0000000006f4094] _extract_crng+0x104/0x110
[24438.680662][    T2] [c00000002c84f950] [c0000000006f1a34] crng_reseed+0x284/0x490
[24438.680751][    T2] [c00000002c84fa90] [c0000000006f4094] _extract_crng+0x104/0x110
[24438.680828][    T2] [c00000002c84fad0] [c0000000006f4c0c] get_random_u64+0xdc/0x100
[24438.680931][    T2] [c00000002c84fb10] [c000000000106988] copy_process+0x2d8/0x1bf0
[24438.681007][    T2] [c00000002c84fc30] [c00000000010861c] _do_fork+0xac/0xac0
[24438.681074][    T2] [c00000002c84fd10] [c0000000001090d0] kernel_thread+0x70/0xa0
[24438.681170][    T2] [c00000002c84fd80] [c0000000001518f0] kthreadd+0x270/0x330
[24438.681257][    T2] [c00000002c84fe20] [c00000000000b748] ret_from_kernel_thread+0x5c/0x74
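
To make the reported cycle easier to follow, here is a rough userspace model of the check, not lockdep's actual code. It records the #1..#4 dependencies from the report above as edges in a small graph and then asks whether taking random_write_wait.lock while holding batched_entropy_u64.lock (entry #0) would close a loop. The enum names and the helpers add_dependency(), would_deadlock() and record_reported_chain() are made up for this illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the lock classes named in the report. */
enum lock_class {
	RANDOM_WRITE_WAIT,
	PI_LOCK,
	RQ_LOCK,
	ZONE_LOCK,
	BATCHED_ENTROPY_U64,
	NR_LOCK_CLASSES
};

/* edge[a][b] is true when lock b was seen being taken while a was held. */
struct lock_graph {
	bool edge[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
};

static void add_dependency(struct lock_graph *g, enum lock_class held,
			   enum lock_class next)
{
	g->edge[held][next] = true;
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_LOCK_CLASSES; n++) {
		if (g->edge[from][n] && !seen[n] && reachable(g, n, to, seen))
			return true;
	}
	return false;
}

/*
 * Taking 'next' while holding 'held' adds the edge held -> next. That closes
 * a cycle exactly when 'held' is already reachable from 'next'.
 */
static bool would_deadlock(const struct lock_graph *g, enum lock_class held,
			   enum lock_class next)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	return reachable(g, next, held, seen);
}

/* Entries #1..#4 of the dependency chain printed in the report. */
static void record_reported_chain(struct lock_graph *g)
{
	add_dependency(g, RANDOM_WRITE_WAIT, PI_LOCK);		/* #1 */
	add_dependency(g, PI_LOCK, RQ_LOCK);			/* #2 */
	add_dependency(g, RQ_LOCK, ZONE_LOCK);			/* #3 */
	add_dependency(g, ZONE_LOCK, BATCHED_ENTROPY_U64);	/* #4 */
}
```

With a zero-initialized graph filled by record_reported_chain(), would_deadlock(&g, BATCHED_ENTROPY_U64, RANDOM_WRITE_WAIT) comes back true, which is the situation in entry #0. In this model the loop only goes away if one of the four recorded edges is removed, for example "&rq->lock --> &(&zone->lock)->rlock".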

>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 63900ca029e0..ec1d72f18b34 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6027,10 +6027,11 @@ void init_idle(struct task_struct *idle, int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>  
> +	__sched_fork(0, idle);
> +
>  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
>  	raw_spin_lock(&rq->lock);
>  
> -	__sched_fork(0, idle);
>  	idle->state = TASK_RUNNING;
>  	idle->se.exec_start = sched_clock();
>  	idle->flags |= PF_IDLE;