Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

From: Paul E. McKenney
Date: Sun Sep 08 2024 - 12:32:33 EST


On Thu, Aug 29, 2024 at 07:13:07AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 29, 2024 at 03:50:03PM +0200, Valentin Schneider wrote:
> > On 29/08/24 03:28, Paul E. McKenney wrote:
> > > On Wed, Aug 28, 2024 at 11:39:19AM -0700, Paul E. McKenney wrote:
> > >>
> > >> The 500*TREE03 run had exactly one failure that was the dreaded
> > >> enqueue_dl_entity() failure, followed by RCU CPU stall warnings.
> > >>
> > >> But a huge improvement over the prior state!
> > >>
> > >> Plus, this failure is likely unrelated (see earlier discussions with
> > >> Peter). I just started a 5000*TREE03 run, just in case we can now
> > >> reproduce this thing.
> > >
> > > And we can now reproduce it! Again, this might an unrelated bug that
> > > was previously a one-off (OK, OK, a two-off!). Or this series might
> > > have made it more probably. Who knows?
> > >
> > > Eight of those 5000 runs got us this splat in enqueue_dl_entity():
> > >
> > > WARN_ON_ONCE(on_dl_rq(dl_se));
> > >
> > > Immediately followed by this splat in __enqueue_dl_entity():
> > >
> > > WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node));
> > >
> > > These two splats always happened during rcutorture's testing of
> > > RCU priority boosting. This testing involves spawning a CPU-bound
> > > low-priority real-time kthread for each CPU, which is intended to starve
> > > the non-realtime RCU readers, which are in turn to be rescued by RCU
> > > priority boosting.
> > >
> >
> > Thanks!
> >
> > > I do not entirely trust the following rcutorture diagnostic, but just
> > > in case it helps...
> > >
> > > Many of them also printed something like this as well:
> > >
> > > [ 111.279575] Boost inversion persisted: No QS from CPU 3
> > >
> > > This message means that rcutorture has decided that RCU priority boosting
> > > has failed, but not because a low-priority preempted task was blocking
> > > the grace period, but rather because some CPU managed to be running
> > > the same task in-kernel the whole time without doing a context switch.
> > > In some cases (but not this one), this was simply a side-effect of
> > > RCU's grace-period kthread being starved of CPU time. Such starvation
> > > is a surprise in this case because this kthread is running at higher
> > > real-time priority than the kthreads that are intended to force RCU
> > > priority boosting to happen.
> > >
> > > Again, I do not entirely trust this rcutorture diagnostic, just in case
> > > it helps.
> > >
> > > Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > [ 287.536845] rcu-torture: rcu_torture_boost is stopping
> > > [ 287.536867] ------------[ cut here ]------------
> > > [ 287.540661] WARNING: CPU: 4 PID: 132 at kernel/sched/deadline.c:2003 enqueue_dl_entity+0x50d/0x5c0
> > > [ 287.542299] Modules linked in:
> > > [ 287.542868] CPU: 4 UID: 0 PID: 132 Comm: kcompactd0 Not tainted 6.11.0-rc1-00051-gb32d207e39de #1701
> > > [ 287.544335] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > > [ 287.546337] RIP: 0010:enqueue_dl_entity+0x50d/0x5c0
> > > [ 287.603245] ? __warn+0x7e/0x120
> > > [ 287.603752] ? enqueue_dl_entity+0x54b/0x5c0
> > > [ 287.604405] ? report_bug+0x18e/0x1a0
> > > [ 287.604978] ? handle_bug+0x3d/0x70
> > > [ 287.605523] ? exc_invalid_op+0x18/0x70
> > > [ 287.606116] ? asm_exc_invalid_op+0x1a/0x20
> > > [ 287.606765] ? enqueue_dl_entity+0x54b/0x5c0
> > > [ 287.607420] dl_server_start+0x31/0xe0
> > > [ 287.608013] enqueue_task_fair+0x218/0x680
> > > [ 287.608643] activate_task+0x21/0x50
> > > [ 287.609197] attach_task+0x30/0x50
> > > [ 287.609736] sched_balance_rq+0x65d/0xe20
> > > [ 287.610351] sched_balance_newidle.constprop.0+0x1a0/0x360
> > > [ 287.611205] pick_next_task_fair+0x2a/0x2e0
> > > [ 287.611849] __schedule+0x106/0x8b0
> >
> >
> > Assuming this is still related to switched_from_fair(), since this is hit
> > during priority boosting then it would mean rt_mutex_setprio() gets
> > involved, but that uses the same set of DQ/EQ flags as
> > __sched_setscheduler().
> >
> > I don't see any obvious path in
> >
> > dequeue_task_fair()
> > `\
> > dequeue_entities()
> >
> > that would prevent dl_server_stop() from happening when doing the
> > class-switch dequeue_task()... I don't see it in the TREE03 config, but can
> > you confirm CONFIG_CFS_BANDWIDTH isn't set in that scenario?
> >
> > I'm going to keep digging but I'm not entirely sure yet whether this is
> > related to the switched_from_fair() hackery or not, I'll send the patch I
> > have as-is and continue digging for a bit.
>
> Makes sense to me, thank you, and glad that the diagnostics helped.
>
> Looking forward to further fixes. ;-)

Just following up...

For whatever it is worth, on last night's run of next-20240906, I got
nine failures out of 100 6-hour runs of rcutorture’s TREE03 scenario.
These failures were often, but not always, shortly followed by a hard hang.

The warning at line 1995 is the WARN_ON_ONCE(on_dl_rq(dl_se))
in enqueue_dl_entity() and the warning at line 1971 is the
WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node)) in __enqueue_dl_entity().

The pair of splats is shown below, in case it helps.

Thanx, Paul

------------------------------------------------------------------------

[21122.992435] ------------[ cut here ]------------
[21122.994090] WARNING: CPU: 13 PID: 8032 at kernel/sched/deadline.c:1995 enqueue_dl_entity+0x511/0x5d0
[21122.995554] Modules linked in:
[21122.996048] CPU: 13 UID: 0 PID: 8032 Comm: kworker/13:1 Not tainted 6.11.0-rc6-next-20240906-dirty #2006
[21122.997548] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[21122.999310] Workqueue: 0x0 (mm_percpu_wq)
[21122.999981] RIP: 0010:enqueue_dl_entity+0x511/0x5d0
[21123.000757] Code: ff 48 89 ef e8 b0 d2 ff ff 0f b6 4d 54 e9 0e fc ff ff 85 db
0f 84 d0 fe ff ff 5b 44 89 e6 48 89 ef 5d 41 5c e9 00 df ff ff 90 <0f> 0b 90 e9 fa fa ff ff 48 8b bb f8 09 00 00 48 39 fe 0f 89 de fb
[21123.003697] RSP: 0000:ffffabca84373bf0 EFLAGS: 00010086
[21123.004537] RAX: 0000000000000001 RBX: ffff98381f56cae8 RCX: 0000000000000002
[21123.005660] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff98381f56cae8
[21123.006781] RBP: ffff98381f56cae8 R08: 0000000000000001 R09: 0000000000000161
[21123.007902] R10: 0000000000000000 R11: ffff983802399d90 R12: 0000000000000001
[21123.009026] R13: 00000000002dc6c0 R14: ffff98381f56c240 R15: ffff98381f56c280
[21123.010213] FS: 0000000000000000(0000) GS:ffff98381f540000(0000) knlGS:0000000000000000
[21123.011584] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21123.012498] CR2: 0000000000000000 CR3: 0000000002c16000 CR4: 00000000000006f0
[21123.013647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21123.014780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21123.015915] Call Trace:
[21123.016326] <TASK>
[21123.016672] ? __warn+0x83/0x130
[21123.017198] ? enqueue_dl_entity+0x511/0x5d0
[21123.017877] ? report_bug+0x18e/0x1a0
[21123.018469] ? handle_bug+0x54/0x90
[21123.019027] ? exc_invalid_op+0x18/0x70
[21123.019647] ? asm_exc_invalid_op+0x1a/0x20
[21123.020318] ? enqueue_dl_entity+0x511/0x5d0
[21123.020997] dl_server_start+0x31/0xe0
[21123.021603] enqueue_task_fair+0x218/0x680
[21123.022264] activate_task+0x21/0x50
[21123.022837] attach_task+0x30/0x50
[21123.023389] sched_balance_rq+0x65e/0xe00
[21123.024031] sched_balance_newidle.constprop.0+0x190/0x360
[21123.024903] pick_next_task_fair+0x2a/0x340
[21123.025576] __schedule+0x10e/0x8b0
[21123.026135] ? queue_delayed_work_on+0x53/0x60
[21123.026849] schedule+0x22/0xd0
[21123.027366] worker_thread+0x1a2/0x3a0
[21123.027963] ? __pfx_worker_thread+0x10/0x10
[21123.028651] kthread+0xd1/0x100
[21123.029153] ? __pfx_kthread+0x10/0x10
[21123.029758] ret_from_fork+0x2f/0x50
[21123.030334] ? __pfx_kthread+0x10/0x10
[21123.030933] ret_from_fork_asm+0x1a/0x30
[21123.031566] </TASK>
[21123.031920] ---[ end trace 0000000000000000 ]---
[21123.032669] ------------[ cut here ]------------
[21123.033409] WARNING: CPU: 13 PID: 8032 at kernel/sched/deadline.c:1971 enqueue_dl_entity+0x54f/0x5d0
[21123.034853] Modules linked in:
[21123.035354] CPU: 13 UID: 0 PID: 8032 Comm: kworker/13:1 Tainted: G W 6.11.0-rc6-next-20240906-dirty #2006
[21123.037081] Tainted: [W]=WARN
[21123.037562] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[21123.039331] Workqueue: 0x0 (mm_percpu_wq)
[21123.039984] RIP: 0010:enqueue_dl_entity+0x54f/0x5d0
[21123.040767] Code: de fb ff ff e9 66 ff ff ff 89 c1 45 84 d2 0f 84 ce fb ff ff a8 20 0f 84 c6 fb ff ff 84 c0 0f 89 20 fe ff ff e9 b9 fb ff ff 90 <0f> 0b 90 e9 f4 fb ff ff 84 d2 0f 85 e3 fa ff ff 48 89 ea 48 8d b5
[21123.043716] RSP: 0000:ffffabca84373bf0 EFLAGS: 00010086
[21123.044549] RAX: 00000000ffffff00 RBX: ffff98381f56c240 RCX: 0000000000000000
[21123.045676] RDX: 0000000000000001 RSI: 0000000b1a2986b8 RDI: 0000000b1a2154a4
[21123.046806] RBP: ffff98381f56cae8 R08: ffff98381f56ca80 R09: 000000003b9aca00
[21123.047931] R10: 0000000000000001 R11: 00000000000ee6b2 R12: 0000000000000001
[21123.049061] R13: 00000000002dc6c0 R14: ffff98381f56c240 R15: ffff98381f56c280
[21123.050469] FS: 0000000000000000(0000) GS:ffff98381f540000(0000) knlGS:0000000000000000
[21123.051761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21123.052679] CR2: 0000000000000000 CR3: 0000000002c16000 CR4: 00000000000006f0
[21123.053817] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21123.054952] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21123.056086] Call Trace:
[21123.056691] <TASK>
[21123.057205] ? __warn+0x83/0x130
[21123.057984] ? enqueue_dl_entity+0x54f/0x5d0
[21123.058989] ? report_bug+0x18e/0x1a0
[21123.059865] ? handle_bug+0x54/0x90
[21123.060689] ? exc_invalid_op+0x18/0x70
[21123.061591] ? asm_exc_invalid_op+0x1a/0x20
[21123.062584] ? enqueue_dl_entity+0x54f/0x5d0
[21123.063337] dl_server_start+0x31/0xe0
[21123.063939] enqueue_task_fair+0x218/0x680
[21123.064604] activate_task+0x21/0x50
[21123.065185] attach_task+0x30/0x50
[21123.065729] sched_balance_rq+0x65e/0xe00
[21123.066377] sched_balance_newidle.constprop.0+0x190/0x360
[21123.067255] pick_next_task_fair+0x2a/0x340
[21123.067921] __schedule+0x10e/0x8b0
[21123.068524] ? queue_delayed_work_on+0x53/0x60
[21123.069238] schedule+0x22/0xd0
[21123.069742] worker_thread+0x1a2/0x3a0
[21123.070347] ? __pfx_worker_thread+0x10/0x10
[21123.071025] kthread+0xd1/0x100
[21123.071536] ? __pfx_kthread+0x10/0x10
[21123.072133] ret_from_fork+0x2f/0x50
[21123.072714] ? __pfx_kthread+0x10/0x10
[21123.073323] ret_from_fork_asm+0x1a/0x30
[21123.073949] </TASK>
[21123.074314] ---[ end trace 0000000000000000 ]---