Re: stress-ng --hrtimers hangs system

From: Paul E. McKenney
Date: Thu Jun 04 2020 - 23:20:18 EST


On Fri, Jun 05, 2020 at 04:47:51AM +0300, Vladimir Oltean wrote:
> Hi,
>
> I was testing stress-ng on an ARM64 box and found that the system can be brought down almost instantly with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and after 21 seconds I get this rcu_preempt splat,
> letting me know that the grace-period kthread could not run:
>
> # stress-ng --hrtimers 1
> stress-ng: info: [320] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
> stress-ng: info: [320] dispatching hogs: 1 hrtimers
> stress-ng: info: [320] cache allocate: using defaults, can't determine cache details from sysfs
> [ 85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 85.833656] (detected by 1, t=6502 jiffies, g=1789, q=12)
> [ 85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [ 85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 85.871025] rcu: RCU grace-period kthread stack dump:
> [ 85.876094] rcu_preempt R running task 0 10 2 0x00000028
> [ 85.883173] Call trace:
> [ 85.885636] __switch_to+0xf8/0x148
> [ 85.889137] __schedule+0x2d8/0x808
> [ 85.892636] schedule+0x48/0x100
> [ 85.895875] schedule_timeout+0x1c8/0x420
> [ 85.899900] rcu_gp_kthread+0x738/0x1b78
> [ 85.903836] kthread+0x158/0x168
> [ 85.907075] ret_from_fork+0x10/0x18
> [ 93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 33s!
> [ 93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 33s!
> [ 93.299105] Showing busy workqueues and worker pools:
> [ 93.304189] workqueue events: flags=0x0
> [ 93.308116] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 93.315031] pending: vmstat_shepherd
> [ 93.318990] workqueue events_unbound: flags=0x2
> [ 93.323577] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
> [ 93.330309] pending: flush_to_ldisc
> [ 93.334161]
> [ 93.334164] ======================================================
> [ 93.334165] WARNING: possible circular locking dependency detected
> [ 93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
> [ 93.334168] ------------------------------------------------------
> [ 93.334169] stress-ng-hrtim/326 is trying to acquire lock:
> [ 93.334171] ffffb236f6aee1a0 (console_owner){-.-.}-{0:0}, at: console_unlock+0x21c/0x678
> [ 93.334176]
> [ 93.334177] but task is already holding lock:
> [ 93.334178] ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
> [ 93.334184]
> [ 93.334186] which lock already depends on the new lock.
> [ 93.334187]
> [ 93.334188]
> [ 93.334189] the existing dependency chain (in reverse order) is:
> [ 93.334190]
> [ 93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
> [ 93.334197] _raw_spin_lock+0x5c/0x78
> [ 93.334198] __queue_work+0x124/0x7c8
> [ 93.334199] queue_work_on+0xd0/0xf0
> [ 93.334200] tty_flip_buffer_push+0x3c/0x48
> [ 93.334202] serial8250_rx_chars+0x74/0x88
> [ 93.334203] fsl8250_handle_irq+0x15c/0x1a0
> [ 93.334204] serial8250_interrupt+0x70/0xb8
> [ 93.334206] __handle_irq_event_percpu+0xe0/0x478
> [ 93.334207] handle_irq_event_percpu+0x40/0x98
> [ 93.334208] handle_irq_event+0x4c/0xd0
> [ 93.334209] handle_fasteoi_irq+0xb4/0x158
> [ 93.334211] generic_handle_irq+0x3c/0x58
> [ 93.334212] __handle_domain_irq+0x68/0xc0
> [ 93.334213] gic_handle_irq+0x6c/0x160
> [ 93.334214] el1_irq+0xbc/0x180
> [ 93.334216] cpuidle_enter_state+0xb4/0x4f8
> [ 93.334217] cpuidle_enter+0x3c/0x50
> [ 93.334218] call_cpuidle+0x44/0x78
> [ 93.334219] do_idle+0x228/0x2c8
> [ 93.334220] cpu_startup_entry+0x2c/0x48
> [ 93.334222] rest_init+0x1ac/0x280
> [ 93.334223] arch_call_rest_init+0x14/0x1c
> [ 93.334224] start_kernel+0x4ec/0x524
> [ 93.334225]
> [ 93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
> [ 93.334232] _raw_spin_lock_irqsave+0x78/0xa0
> [ 93.334233] serial8250_console_write+0x1f4/0x348
> [ 93.334234] univ8250_console_write+0x44/0x58
> [ 93.334235] console_unlock+0x480/0x678
> [ 93.334237] vprintk_emit+0x188/0x370
> [ 93.334238] vprintk_default+0x48/0x58
> [ 93.334239] vprintk_func+0xf0/0x238
> [ 93.334240] printk+0x74/0x94
> [ 93.334241] register_console+0x1a0/0x300
> [ 93.334243] uart_add_one_port+0x4a0/0x4e0
> [ 93.334244] serial8250_register_8250_port+0x2d0/0x4b0
> [ 93.334245] of_platform_serial_probe+0x318/0x650
> [ 93.334246] platform_drv_probe+0x58/0xa8
> [ 93.334248] really_probe+0x10c/0x358
> [ 93.334249] driver_probe_device+0x5c/0x100
> [ 93.334250] device_driver_attach+0x74/0x98
> [ 93.334252] __driver_attach+0x98/0xe0
> [ 93.334253] bus_for_each_dev+0x84/0xd8
> [ 93.334254] driver_attach+0x30/0x40
> [ 93.334255] bus_add_driver+0x160/0x208
> [ 93.334256] driver_register+0x64/0x110
> [ 93.334258] __platform_driver_register+0x58/0x68
> [ 93.334259] of_platform_serial_driver_init+0x20/0x28
> [ 93.334260] do_one_initcall+0x94/0x438
> [ 93.334261] kernel_init_freeable+0x2cc/0x338
> [ 93.334263] kernel_init+0x18/0x110
> [ 93.334264] ret_from_fork+0x10/0x18
> [ 93.334265]
> [ 93.334266] -> #0 (console_owner){-.-.}-{0:0}:
> [ 93.334271] __lock_acquire+0x1170/0x1510
> [ 93.334272] lock_acquire+0xf8/0x458
> [ 93.334273] console_unlock+0x274/0x678
> [ 93.334274] vprintk_emit+0x188/0x370
> [ 93.334275] vprintk_default+0x48/0x58
> [ 93.334277] vprintk_func+0xf0/0x238
> [ 93.334278] printk+0x74/0x94
> [ 93.334279] show_pwq+0x80/0x2c0
> [ 93.334280] show_workqueue_state+0x2ac/0x3a8
> [ 93.334281] wq_watchdog_timer_fn+0x194/0x2d8
> [ 93.334283] call_timer_fn+0xcc/0x400
> [ 93.334284] run_timer_softirq+0x250/0x6e0
> [ 93.334285] __do_softirq+0x12c/0x5b4
> [ 93.334286] irq_exit+0x13c/0x148
> [ 93.334287] __handle_domain_irq+0x6c/0xc0
> [ 93.334289] gic_handle_irq+0x6c/0x160
> [ 93.334290] el1_irq+0xbc/0x180
> [ 93.334291] lock_acquire+0x110/0x458
> [ 93.334292] __might_fault+0x84/0xa8
> [ 93.334293] __arm64_sys_rt_sigreturn+0x154/0x980
> [ 93.334295] el0_svc_common.constprop.2+0x7c/0x178
> [ 93.334296] do_el0_svc+0x34/0xa0
> [ 93.334297] el0_sync_handler+0x160/0x1f0
> [ 93.334298] el0_sync+0x140/0x180
> [ 93.334299]
> [ 93.334301] other info that might help us debug this:
> [ 93.334301]
> [ 93.334303] Chain exists of:
> [ 93.334303] console_owner --> &port->lock#2 --> &pool->lock/1
> [ 93.334311]
> [ 93.334313] Possible unsafe locking scenario:
> [ 93.334314]
> [ 93.334315] CPU0 CPU1
> [ 93.334316] ---- ----
> [ 93.334317] lock(&pool->lock/1);
> [ 93.334321] lock(&port->lock#2);
> [ 93.334325] lock(&pool->lock/1);
> [ 93.334328] lock(console_owner);
> [ 93.334331]
> [ 93.334332] *** DEADLOCK ***
> [ 93.334333]
> [ 93.334334] 5 locks held by stress-ng-hrtim/326:
> [ 93.334335] #0: ffff002077cf9a18 (&mm->mmap_sem){++++}-{3:3}, at: __might_fault+0x60/0xa8
> [ 93.334341] #1: ffff800010003de0 ((&wq_watchdog_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
> [ 93.334347] #2: ffffb236f6aefd60 (rcu_read_lock){....}-{1:2}, at: show_workqueue_state+0x0/0x3a8
> [ 93.334353] #3: ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
> [ 93.334360] #4: ffffb236f6aee080 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x180/0x370
> [ 93.334366]
> [ 93.334367] stack backtrace:
> [ 93.334369] CPU: 0 PID: 326 Comm: stress-ng-hrtim Not tainted 5.7.0-08604-g7dcedf8c04c0 #118
> [ 93.334370] Hardware name: LS1028A RDB Board (DT)
> [ 93.334371] Call trace:
> [ 93.334372] dump_backtrace+0x0/0x1a8
> [ 93.334373] show_stack+0x24/0x30
> [ 93.334374] dump_stack+0xe8/0x150
> [ 93.334376] print_circular_bug.isra.41+0x228/0x280
> [ 93.334377] check_noncircular+0x14c/0x1b0
> [ 93.334378] __lock_acquire+0x1170/0x1510
> [ 93.334379] lock_acquire+0xf8/0x458
> [ 93.334381] console_unlock+0x274/0x678
> [ 93.334382] vprintk_emit+0x188/0x370
> [ 93.334383] vprintk_default+0x48/0x58
> [ 93.334384] vprintk_func+0xf0/0x238
> [ 93.334385] printk+0x74/0x94
> [ 93.334386] show_pwq+0x80/0x2c0
> [ 93.334387] show_workqueue_state+0x2ac/0x3a8
> [ 93.334389] wq_watchdog_timer_fn+0x194/0x2d8
> [ 93.334390] call_timer_fn+0xcc/0x400
> [ 93.334391] run_timer_softirq+0x250/0x6e0
> [ 93.334392] __do_softirq+0x12c/0x5b4
> [ 93.334393] irq_exit+0x13c/0x148
> [ 93.334394] __handle_domain_irq+0x6c/0xc0
> [ 93.334396] gic_handle_irq+0x6c/0x160
> [ 93.334397] el1_irq+0xbc/0x180
> [ 93.334398] lock_acquire+0x110/0x458
> [ 93.334399] __might_fault+0x84/0xa8
> [ 93.334400] __arm64_sys_rt_sigreturn+0x154/0x980
> [ 93.334402] el0_svc_common.constprop.2+0x7c/0x178
> [ 93.334403] do_el0_svc+0x34/0xa0
> [ 93.334404] el0_sync_handler+0x160/0x1f0
> [ 93.334405] el0_sync+0x140/0x180
> [ 93.984979] workqueue events_freezable: flags=0x4
> [ 93.989705] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 93.996522] pending: mmc_rescan
> [ 94.000032] workqueue events_power_efficient: flags=0x82
> [ 94.005369] pwq 4: cpus=0-1 flags=0x4 nice=0 active=4/256 refcnt=6
> [ 94.011748] pending: phy_state_machine, phy_state_machine, neigh_periodic_work, do_cache_clean
> [ 94.020761] workqueue pm: flags=0x4
> [ 94.024266] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 94.031081] pending: pm_runtime_work
> [ 94.035025] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 94.041840] pending: pm_runtime_work
> [ 94.045782] workqueue writeback: flags=0x4a
> [ 94.049984] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=3
> [ 94.056362] pending: wb_workfn
> [ 94.059784] workqueue kblockd: flags=0x18
> [ 94.063812] pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
> [ 94.070802] pending: blk_mq_timeout_work
>
> Looks like there's another issue in fsl8250_handle_irq, but I think that's unrelated, so I'll ignore it.
> So CPU 1 detects that the rcu_preempt kthread hasn't been scheduled, despite the fact that all quiescent
> states were seen. As far as I can see, this is not an RCU usage bug but a legitimate CPU starvation issue.

That agrees with my interpretation of the stall warning message.

On the rest, I must defer to others.

Thanx, Paul
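
For context on the stressor linked above: it essentially arms a POSIX timer that re-arms itself on every expiry with nanosecond-scale intervals, so the CPU services a near-continuous stream of hrtimer interrupts. The following is a minimal sketch of that pattern, not the actual stress-hrtimers.c code; the 50 ns interval and SIGRTMIN choice are illustrative assumptions.

/*
 * Minimal sketch of an hrtimer flood: a POSIX timer whose expiry handler
 * immediately re-arms it with a nanosecond-scale interval.
 * Build (assumption): gcc -o hrtimer-flood hrtimer-flood.c -lrt
 */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static timer_t timerid;

/* Re-arm the timer with a very short (hypothetical 50 ns) expiry. */
static void rearm(void)
{
	struct itimerspec its = {
		.it_value.tv_sec = 0,
		.it_value.tv_nsec = 50,	/* illustrative; the real stressor varies this */
	};

	timer_settime(timerid, 0, &its, NULL);
}

/* Signal handler: each expiry immediately schedules the next one. */
static void handler(int sig)
{
	(void)sig;
	rearm();
}

int main(void)
{
	struct sigevent sev = {
		.sigev_notify = SIGEV_SIGNAL,
		.sigev_signo = SIGRTMIN,
	};

	signal(SIGRTMIN, handler);
	if (timer_create(CLOCK_REALTIME, &sev, &timerid) < 0) {
		perror("timer_create");
		return EXIT_FAILURE;
	}
	rearm();
	for (;;)
		pause();	/* the timer/signal storm does the rest */
}

With expirations this frequent, the timer interrupt and signal-delivery path can consume essentially all CPU time, which is consistent with the rcu_preempt kthread and workqueue starvation reported above.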

> I have some noob questions:
>
> - Is my system (hardware, drivers, kernel config) misconfigured? This can be reproduced at least on the
> NXP LS1028A running today's master branch with just the arm64 defconfig. On other hardware, however
> (the NXP T1040 PowerPC), RT throttling kicks in faster and the system doesn't hang (more below).
>
> - What is the expected behavior of a system under this hrtimer test? Intuitively I would say the kernel
> shouldn't hang, but I don't know exactly what mechanisms are supposed to prevent that. Below are 2 things I tried.
>
> - I tried to enable CONFIG_RCU_BOOST and had limited success. It got rid of the RCU stall messages,
> but the system would still hang after a while longer. When it hung, it didn't even respond to ping
> from the outside and there was no shell; only magic sysrq could restart the kernel.
>
> - I read a little bit of the "Limiting the CPU usage of real-time and deadline processes" section of
> man 7 sched. Is this what I should be doing? My /proc/sys/kernel/sched_rt_period_us and
> /proc/sys/kernel/sched_rt_runtime_us are set to their default values (RT period 1 second, RT runtime
> 0.95 seconds); why isn't that enough? If I reduce the RT runtime to 0.8 seconds while keeping the period
> at 1 second, RT throttling (finally!) kicks in and the system does not die (a rough sketch of that
> arithmetic follows at the end of this message). The RCU stall warning does, however, keep getting
> printed continuously:
>
> [ 385.732074] sched: RT throttling activated
> [ 385.883960] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 385.890077] (detected by 1, t=6503 jiffies, g=1801, q=68)
> [ 385.895582] rcu: All QSes seen, last rcu_preempt kthread activity 6504 (4294988738-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 385.908062] rcu: rcu_preempt kthread starved for 6508 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [ 385.918273] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 385.927437] rcu: RCU grace-period kthread stack dump:
> [ 385.932503] rcu_preempt R running task 0 11 2 0x00000028
> [ 385.939579] Call trace:
> [ 385.942040] __switch_to+0xf8/0x148
> [ 385.945539] __schedule+0x2d8/0x808
> [ 385.949036] schedule+0x48/0x100
> [ 385.952273] schedule_timeout+0x1c8/0x420
> [ 385.956295] rcu_gp_kthread+0x704/0x1870
> [ 385.960228] kthread+0x158/0x168
> [ 385.963464] ret_from_fork+0x10/0x18
> (...)
> [ 463.903959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 463.910077] (detected by 1, t=26008 jiffies, g=1801, q=148)
> [ 463.915756] rcu: All QSes seen, last rcu_preempt kthread activity 26009 (4295008243-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 463.928324] rcu: rcu_preempt kthread starved for 26013 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [ 463.938622] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 463.947786] rcu: RCU grace-period kthread stack dump:
> [ 463.952852] rcu_preempt R running task 0 11 2 0x00000028
> [ 463.959928] Call trace:
> [ 463.962389] __switch_to+0xf8/0x148
> [ 463.965888] __schedule+0x2d8/0x808
> [ 463.969385] schedule+0x48/0x100
> [ 463.972622] schedule_timeout+0x1c8/0x420
> [ 463.976644] rcu_gp_kthread+0x704/0x1870
> [ 463.980578] kthread+0x158/0x168
> [ 463.983814] ret_from_fork+0x10/0x18
> (...)
> [ 541.923959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 541.930076] (detected by 1, t=45513 jiffies, g=1801, q=230)
> [ 541.935754] rcu: All QSes seen, last rcu_preempt kthread activity 45514 (4295027748-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 541.948322] rcu: rcu_preempt kthread starved for 45518 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [ 541.958620] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 541.967784] rcu: RCU grace-period kthread stack dump:
> [ 541.972850] rcu_preempt R running task 0 11 2 0x00000028
> [ 541.979926] Call trace:
> [ 541.982387] __switch_to+0xf8/0x148
> [ 541.985887] __schedule+0x2d8/0x808
> [ 541.989384] schedule+0x48/0x100
> [ 541.992621] schedule_timeout+0x1c8/0x420
> [ 541.996643] rcu_gp_kthread+0x704/0x1870
> [ 542.000576] kthread+0x158/0x168
> [ 542.003813] ret_from_fork+0x10/0x18
>
> I would really appreciate some pointers.
>
> Thanks,
> -Vladimir
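
As for the sched_rt_runtime_us arithmetic mentioned above, here is a rough sketch, with an illustrative value rather than a recommendation from this thread. With the defaults (period 1,000,000 us, runtime 950,000 us), realtime tasks may consume up to 95% of each second, leaving only about 50 ms/s for everything else, including the rcu_preempt kthread and workqueues; lowering the runtime to 800,000 us reserves about 200 ms/s for non-RT work.

/*
 * Sketch: lower the RT runtime budget to 0.8 s per 1 s period.
 * Equivalent to "echo 800000 > /proc/sys/kernel/sched_rt_runtime_us";
 * needs root, and the 800000 value is purely illustrative.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/proc/sys/kernel/sched_rt_runtime_us";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return 1;
	}
	/* RT tasks may now use at most 0.8 s of every 1 s period. */
	fprintf(f, "800000\n");
	fclose(f);
	return 0;
}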