CONFIG_SHUFFLE_PAGE_ALLOCATOR=y lockdep splat (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

From: Qian Cai
Date: Wed Sep 11 2019 - 22:29:34 EST


Adjusted Cc a bit as this looks like more of the scheduler territory.

> On Sep 10, 2019, at 3:49 PM, Qian Cai <cai@xxxxxx> wrote:
>
> Hmm, it feels like that CONFIG_SHUFFLE_PAGE_ALLOCATOR=y introduces some unique
> locking patterns that the lockdep does not like via,
>
> allocate_slab
> shuffle_freelist
> get_random_u32
>
> Here is another splat with while compiling/installing a kernel,
>
> [ 1254.443119][ C2] WARNING: possible circular locking dependency detected
> [ 1254.450038][ C2] 5.3.0-rc5-next-20190822 #1 Not tainted
> [ 1254.455559][ C2] ------------------------------------------------------
> [ 1254.462988][ C2] swapper/2/0 is trying to acquire lock:
> [ 1254.468509][ C2] ffffffffa2925218 (random_write_wait.lock){..-.}, at:
> __wake_up_common_lock+0xc6/0x150
> [ 1254.478154][ C2]
> [ 1254.478154][ C2] but task is already holding lock:
> [ 1254.485896][ C2] ffff88845373fda0 (batched_entropy_u32.lock){-.-.}, at:
> get_random_u32+0x4c/0xe0
> [ 1254.495007][ C2]
> [ 1254.495007][ C2] which lock already depends on the new lock.
> [ 1254.495007][ C2]
> [ 1254.505331][ C2]
> [ 1254.505331][ C2] the existing dependency chain (in reverse order) is:
> [ 1254.514755][ C2]
> [ 1254.514755][ C2] -> #3 (batched_entropy_u32.lock){-.-.}:
> [ 1254.522553][ C2] __lock_acquire+0x5b3/0xb40
> [ 1254.527638][ C2] lock_acquire+0x126/0x280
> [ 1254.533016][ C2] _raw_spin_lock_irqsave+0x3a/0x50
> [ 1254.538624][ C2] get_random_u32+0x4c/0xe0
> [ 1254.543539][ C2] allocate_slab+0x6d6/0x19c0
> [ 1254.548625][ C2] new_slab+0x46/0x70
> [ 1254.553010][ C2] ___slab_alloc+0x58b/0x960
> [ 1254.558533][ C2] __slab_alloc+0x43/0x70
> [ 1254.563269][ C2] kmem_cache_alloc+0x354/0x460
> [ 1254.568534][ C2] fill_pool+0x272/0x4b0
> [ 1254.573182][ C2] __debug_object_init+0x86/0x7a0
> [ 1254.578615][ C2] debug_object_init+0x16/0x20
> [ 1254.584256][ C2] hrtimer_init+0x27/0x1e0
> [ 1254.589079][ C2] init_dl_task_timer+0x20/0x40
> [ 1254.594342][ C2] __sched_fork+0x10b/0x1f0
> [ 1254.599253][ C2] init_idle+0xac/0x520
> [ 1254.603816][ C2] fork_idle+0x18c/0x230
> [ 1254.608933][ C2] idle_threads_init+0xf0/0x187
> [ 1254.614193][ C2] smp_init+0x1d/0x12d
> [ 1254.618671][ C2] kernel_init_freeable+0x37e/0x76e
> [ 1254.624282][ C2] kernel_init+0x11/0x12f
> [ 1254.629016][ C2] ret_from_fork+0x27/0x50
> [ 1254.634344][ C2]
> [ 1254.634344][ C2] -> #2 (&rq->lock){-.-.}:
> [ 1254.640831][ C2] __lock_acquire+0x5b3/0xb40
> [ 1254.645917][ C2] lock_acquire+0x126/0x280
> [ 1254.650827][ C2] _raw_spin_lock+0x2f/0x40
> [ 1254.655741][ C2] task_fork_fair+0x43/0x200
> [ 1254.661213][ C2] sched_fork+0x29b/0x420
> [ 1254.665949][ C2] copy_process+0xf12/0x3180
> [ 1254.670947][ C2] _do_fork+0xef/0x950
> [ 1254.675422][ C2] kernel_thread+0xa8/0xe0
> [ 1254.680244][ C2] rest_init+0x28/0x311
> [ 1254.685298][ C2] arch_call_rest_init+0xe/0x1b
> [ 1254.690558][ C2] start_kernel+0x6eb/0x724
> [ 1254.695469][ C2] x86_64_start_reservations+0x24/0x26
> [ 1254.701339][ C2] x86_64_start_kernel+0xf4/0xfb
> [ 1254.706689][ C2] secondary_startup_64+0xb6/0xc0
> [ 1254.712601][ C2]
> [ 1254.712601][ C2] -> #1 (&p->pi_lock){-.-.}:
> [ 1254.719263][ C2] __lock_acquire+0x5b3/0xb40
> [ 1254.724349][ C2] lock_acquire+0x126/0x280
> [ 1254.729260][ C2] _raw_spin_lock_irqsave+0x3a/0x50
> [ 1254.735317][ C2] try_to_wake_up+0xad/0x1050
> [ 1254.740403][ C2] default_wake_function+0x2f/0x40
> [ 1254.745929][ C2] pollwake+0x10d/0x160
> [ 1254.750491][ C2] __wake_up_common+0xc4/0x2a0
> [ 1254.755663][ C2] __wake_up_common_lock+0xea/0x150
> [ 1254.761756][ C2] __wake_up+0x13/0x20
> [ 1254.766230][ C2] account.constprop.9+0x217/0x340
> [ 1254.771754][ C2] extract_entropy.constprop.7+0xcf/0x220
> [ 1254.777886][ C2] _xfer_secondary_pool+0x19a/0x3d0
> [ 1254.783981][ C2] push_to_pool+0x3e/0x230
> [ 1254.788805][ C2] process_one_work+0x52a/0xb40
> [ 1254.794064][ C2] worker_thread+0x63/0x5b0
> [ 1254.798977][ C2] kthread+0x1df/0x200
> [ 1254.803451][ C2] ret_from_fork+0x27/0x50
> [ 1254.808787][ C2]
> [ 1254.808787][ C2] -> #0 (random_write_wait.lock){..-.}:
> [ 1254.816409][ C2] check_prev_add+0x107/0xea0
> [ 1254.821494][ C2] validate_chain+0x8fc/0x1200
> [ 1254.826667][ C2] __lock_acquire+0x5b3/0xb40
> [ 1254.831751][ C2] lock_acquire+0x126/0x280
> [ 1254.837189][ C2] _raw_spin_lock_irqsave+0x3a/0x50
> [ 1254.842797][ C2] __wake_up_common_lock+0xc6/0x150
> [ 1254.848408][ C2] __wake_up+0x13/0x20
> [ 1254.852882][ C2] account.constprop.9+0x217/0x340
> [ 1254.858988][ C2] extract_entropy.constprop.7+0xcf/0x220
> [ 1254.865122][ C2] crng_reseed+0xa1/0x3f0
> [ 1254.869859][ C2] _extract_crng+0xc3/0xd0
> [ 1254.874682][ C2] crng_reseed+0x21b/0x3f0
> [ 1254.879505][ C2] _extract_crng+0xc3/0xd0
> [ 1254.884772][ C2] extract_crng+0x40/0x60
> [ 1254.889507][ C2] get_random_u32+0xb4/0xe0
> [ 1254.894417][ C2] allocate_slab+0x6d6/0x19c0
> [ 1254.899501][ C2] new_slab+0x46/0x70
> [ 1254.903886][ C2] ___slab_alloc+0x58b/0x960
> [ 1254.909377][ C2] __slab_alloc+0x43/0x70
> [ 1254.914112][ C2] kmem_cache_alloc+0x354/0x460
> [ 1254.919375][ C2] __build_skb+0x23/0x60
> [ 1254.924024][ C2] __netdev_alloc_skb+0x127/0x1e0
> [ 1254.929470][ C2] tg3_poll_work+0x11b2/0x1f70 [tg3]
> [ 1254.935654][ C2] tg3_poll_msix+0x67/0x330 [tg3]
> [ 1254.941092][ C2] net_rx_action+0x24e/0x7e0
> [ 1254.946089][ C2] __do_softirq+0x123/0x767
> [ 1254.951000][ C2] irq_exit+0xd6/0xf0
> [ 1254.955385][ C2] do_IRQ+0xe2/0x1a0
> [ 1254.960155][ C2] ret_from_intr+0x0/0x2a
> [ 1254.964896][ C2] cpuidle_enter_state+0x156/0x8e0
> [ 1254.970418][ C2] cpuidle_enter+0x41/0x70
> [ 1254.975242][ C2] call_cpuidle+0x5e/0x90
> [ 1254.979975][ C2] do_idle+0x333/0x370
> [ 1254.984972][ C2] cpu_startup_entry+0x1d/0x1f
> [ 1254.990148][ C2] start_secondary+0x290/0x330
> [ 1254.995319][ C2] secondary_startup_64+0xb6/0xc0
> [ 1255.000750][ C2]
> [ 1255.000750][ C2] other info that might help us debug this:
> [ 1255.000750][ C2]
> [ 1255.011424][ C2] Chain exists of:
> [ 1255.011424][ C2] random_write_wait.lock --> &rq->lock -->
> batched_entropy_u32.lock
> [ 1255.011424][ C2]
> [ 1255.025245][ C2] Possible unsafe locking scenario:
> [ 1255.025245][ C2]
> [ 1255.033012][ C2] CPU0 CPU1
> [ 1255.038270][ C2] ---- ----
> [ 1255.043526][ C2] lock(batched_entropy_u32.lock);
> [ 1255.048610][ C2] lock(&rq->lock);
> [
> 1255.054918][ C2] lock(batched_entropy_u32.loc
> k);
> [ 1255.063035][ C2] lock(random_write_wait.lock);
> [ 1255.067945][ C2]
> [ 1255.067945][ C2] *** DEADLOCK ***
> [ 1255.067945][ C2]
> [ 1255.076000][ C2] 1 lock held by swapper/2/0:
> [ 1255.080558][ C2] #0: ffff88845373fda0 (batched_entropy_u32.lock){-.-.},
> at: get_random_u32+0x4c/0xe0
> [ 1255.090547][ C2]
> [ 1255.090547][ C2] stack backtrace:
> [ 1255.096333][ C2] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-rc5-next-
> 20190822 #1
> [ 1255.104473][ C2] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [ 1255.114276][ C2] Call Trace:
> [ 1255.117439][ C2] <IRQ>
> [ 1255.120169][ C2] dump_stack+0x86/0xca
> [ 1255.124205][ C2] print_circular_bug.cold.32+0x243/0x26e
> [ 1255.129816][ C2] check_noncircular+0x29e/0x2e0
> [ 1255.135221][ C2] ? __bfs+0x238/0x380
> [ 1255.139172][ C2] ? print_circular_bug+0x120/0x120
> [ 1255.144259][ C2] ? find_usage_forwards+0x7d/0xb0
> [ 1255.149260][ C2] check_prev_add+0x107/0xea0
> [ 1255.153823][ C2] validate_chain+0x8fc/0x1200
> [ 1255.159007][ C2] ? check_prev_add+0xea0/0xea0
> [ 1255.163743][ C2] ? check_usage_backwards+0x210/0x210
> [ 1255.169091][ C2] __lock_acquire+0x5b3/0xb40
> [ 1255.173655][ C2] lock_acquire+0x126/0x280
> [ 1255.178041][ C2] ? __wake_up_common_lock+0xc6/0x150
> [ 1255.183732][ C2] _raw_spin_lock_irqsave+0x3a/0x50
> [ 1255.188817][ C2] ? __wake_up_common_lock+0xc6/0x150
> [ 1255.194076][ C2] __wake_up_common_lock+0xc6/0x150
> [ 1255.199163][ C2] ? __wake_up_common+0x2a0/0x2a0
> [ 1255.204078][ C2] ? rcu_read_lock_any_held.part.5+0x20/0x20
> [ 1255.210428][ C2] __wake_up+0x13/0x20
> [ 1255.214379][ C2] account.constprop.9+0x217/0x340
> [ 1255.219377][ C2] extract_entropy.constprop.7+0xcf/0x220
> [ 1255.224987][ C2] ? crng_reseed+0xa1/0x3f0
> [ 1255.229375][ C2] crng_reseed+0xa1/0x3f0
> [ 1255.234122][ C2] ? rcu_read_lock_sched_held+0xac/0xe0
> [ 1255.239556][ C2] ? check_flags.part.16+0x86/0x220
> [ 1255.244641][ C2] ? extract_entropy.constprop.7+0x220/0x220
> [ 1255.250511][ C2] ? __kasan_check_read+0x11/0x20
> [ 1255.255422][ C2] ? validate_chain+0xab/0x1200
> [ 1255.260742][ C2] ? rcu_read_lock_any_held.part.5+0x20/0x20
> [ 1255.266616][ C2] _extract_crng+0xc3/0xd0
> [ 1255.270915][ C2] crng_reseed+0x21b/0x3f0
> [ 1255.275215][ C2] ? extract_entropy.constprop.7+0x220/0x220
> [ 1255.281085][ C2] ? __kasan_check_write+0x14/0x20
> [ 1255.286517][ C2] ? do_raw_spin_lock+0x118/0x1d0
> [ 1255.291428][ C2] ? rwlock_bug.part.0+0x60/0x60
> [ 1255.296251][ C2] _extract_crng+0xc3/0xd0
> [ 1255.300550][ C2] extract_crng+0x40/0x60
> [ 1255.304763][ C2] get_random_u32+0xb4/0xe0
> [ 1255.309640][ C2] allocate_slab+0x6d6/0x19c0
> [ 1255.314203][ C2] new_slab+0x46/0x70
> [ 1255.318066][ C2] ___slab_alloc+0x58b/0x960
> [ 1255.322539][ C2] ? __build_skb+0x23/0x60
> [ 1255.326841][ C2] ? fault_create_debugfs_attr+0x140/0x140
> [ 1255.333048][ C2] ? __build_skb+0x23/0x60
> [ 1255.337348][ C2] __slab_alloc+0x43/0x70
> [ 1255.341559][ C2] ? __slab_alloc+0x43/0x70
> [ 1255.345944][ C2] ? __build_skb+0x23/0x60
> [ 1255.350242][ C2] kmem_cache_alloc+0x354/0x460
> [ 1255.354978][ C2] ? __netdev_alloc_skb+0x1c6/0x1e0
> [ 1255.360626][ C2] ? trace_hardirqs_on+0x3a/0x160
> [ 1255.365535][ C2] __build_skb+0x23/0x60
> [ 1255.369660][ C2] __netdev_alloc_skb+0x127/0x1e0
> [ 1255.374576][ C2] tg3_poll_work+0x11b2/0x1f70 [tg3]
> [ 1255.379750][ C2] ? find_held_lock+0x11b/0x150
> [ 1255.385027][ C2] ? tg3_tx_recover+0xa0/0xa0 [tg3]
> [ 1255.390114][ C2] ? _raw_spin_unlock_irqrestore+0x38/0x50
> [ 1255.395809][ C2] ? __kasan_check_read+0x11/0x20
> [ 1255.400718][ C2] ? validate_chain+0xab/0x1200
> [ 1255.405455][ C2] ? __wake_up_common+0x2a0/0x2a0
> [ 1255.410761][ C2] ? mark_held_locks+0x34/0xb0
> [ 1255.415415][ C2] tg3_poll_msix+0x67/0x330 [tg3]
> [ 1255.420327][ C2] net_rx_action+0x24e/0x7e0
> [ 1255.424800][ C2] ? find_held_lock+0x11b/0x150
> [ 1255.429536][ C2] ? napi_busy_loop+0x600/0x600
> [ 1255.434733][ C2] ? rcu_read_lock_sched_held+0xac/0xe0
> [ 1255.440169][ C2] ? __do_softirq+0xed/0x767
> [ 1255.444642][ C2] ? rcu_read_lock_any_held.part.5+0x20/0x20
> [ 1255.450518][ C2] ? lockdep_hardirqs_on+0x1b0/0x2a0
> [ 1255.455693][ C2] ? irq_exit+0xd6/0xf0
> [ 1255.460280][ C2] __do_softirq+0x123/0x767
> [ 1255.464668][ C2] irq_exit+0xd6/0xf0
> [ 1255.468532][ C2] do_IRQ+0xe2/0x1a0
> [ 1255.472308][ C2] common_interrupt+0xf/0xf
> [ 1255.476694][ C2] </IRQ>
> [ 1255.479509][ C2] RIP: 0010:cpuidle_enter_state+0x156/0x8e0
> [ 1255.485750][ C2] Code: bf ff 8b 05 a4 27 2d 01 85 c0 0f 8f 1d 04 00 00 31
> ff e8 4d ba 92 ff 80 7d d0 00 0f 85 0b 02 00 00 e8 ae c0 a7 ff fb 45 85 ed <0f>
> 88 2d 02 00 00 4d 63 fd 49 83 ff 09 0f 87 91 06 00 00 4b 8d 04
> [ 1255.505335][ C2] RSP: 0018:ffff888206637cf8 EFLAGS: 00000202 ORIG_RAX:
> ffffffffffffffc8
> [ 1255.514154][ C2] RAX: 0000000000000000 RBX: ffff889f98b44008 RCX:
> ffffffffa116e980
> [ 1255.522033][ C2] RDX: 0000000000000007 RSI: dffffc0000000000 RDI:
> ffff8882066287ec
> [ 1255.529913][ C2] RBP: ffff888206637d48 R08: fffffbfff4557aee R09:
> 0000000000000000
> [ 1255.538278][ C2] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffffffa28e8ac0
> [ 1255.546158][ C2] R13: 0000000000000002 R14: 0000012412160253 R15:
> ffff889f98b4400c
> [ 1255.554040][ C2] ? lockdep_hardirqs_on+0x1b0/0x2a0
> [ 1255.559725][ C2] ? cpuidle_enter_state+0x152/0x8e0
> [ 1255.564898][ C2] cpuidle_enter+0x41/0x70
> [ 1255.569196][ C2] call_cpuidle+0x5e/0x90
> [ 1255.573408][ C2] do_idle+0x333/0x370
> [ 1255.577358][ C2] ? complete+0x51/0x60
> [ 1255.581394][ C2] ? arch_cpu_idle_exit+0x40/0x40
> [ 1255.586777][ C2] ? complete+0x51/0x60
> [ 1255.590814][ C2] cpu_startup_entry+0x1d/0x1f
> [ 1255.595461][ C2] start_secondary+0x290/0x330
> [ 1255.600111][ C2] ? set_cpu_sibling_map+0x18f0/0x18f0
> [ 1255.605460][ C2] secondary_startup_64+0xb6/0xc0

This looks like a false positive. shuffle_freelist() introduced the chain,

batched_entropy_u32.lock -> random_write_wait.lock

but I canât see how it is possible to get the reversed chain. Imaging something like this,

__wake_up_common_lock <â acquired random_write_wait.lock
__sched_fork <â is that even possible?
init_dl_task_timer
debug_object_init
allocate_slab
shuffle_freelist
get_random_u32 <â acquired batched_entropy_u32.lock