Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

From: Paul E. McKenney
Date: Fri Jul 29 2022 - 11:26:30 EST


On Fri, Jul 29, 2022 at 03:24:58AM -0700, Michel Lespinasse wrote:
> On Thu, Jul 28, 2022 at 10:20:53AM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 25, 2022 at 12:43:06PM -0700, Michel Lespinasse wrote:
> > > On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
> > > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > > Xeons") wrecked intel_idle in two ways:
> > > >
> > > > - must not have tracing in idle functions
> > > > - must return with IRQs disabled
> > > >
> > > > Additionally, it added a branch for no good reason.
> > > >
> > > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > >
> > > After this change was introduced, I am seeing "WARNING: suspicious RCU
> > > usage" when booting a kernel with debug options compiled in. Please
> > > see the attached dmesg output. The issue starts with commit 32d4fd5751ea
> > > and is still present in v5.19-rc8.
> > >
> > > I'm not sure, is this too late to fix or revert in v5.19 final ?
> >
> > I finally got a chance to take a quick look at this.
> >
> > The rcu_eqs_exit() function is making a lockdep complaint about
> > being invoked with interrupts enabled. This function is called from
> > rcu_idle_exit(), which is an expected code path from cpuidle_enter_state()
> > via its call to rcu_idle_exit(). Except that rcu_idle_exit() disables
> > interrupts before invoking rcu_eqs_exit().
> >
> > The only other call to rcu_idle_exit() does not disable interrupts,
> > but it is via rcu_user_exit(), which would be a very odd choice for
> > cpuidle_enter_state().
> >
> > It seems unlikely, but it might be that it is the use of local_irq_save()
> > instead of raw_local_irq_save() within rcu_idle_exit() that is causing
> > the trouble. If this is the case, then the commit shown below would
> > help. Note that this commit removes the warning from lockdep, so it
> > is necessary to build the kernel with CONFIG_RCU_EQS_DEBUG=y to enable
> > equivalent debugging.
> >
> > Could you please try your test with the -rce commit shown below applied?
>
> Thanks for looking into it.

And thank you for trying this shot in the dark!

> After checking out Peter's commit 32d4fd5751ea,
> cherry picking your commit ed4ae5eff4b3,
> and setting CONFIG_RCU_EQS_DEBUG=y in addition of my usual debug config,
> I am now seeing this a few seconds into the boot:
>
> [ 3.010650] ------------[ cut here ]------------
> [ 3.010651] WARNING: CPU: 0 PID: 0 at kernel/sched/clock.c:397 sched_clock_tick+0x27/0x60

And this is again a complaint about interrupts not being disabled.

But it does appear that the problem was the lockdep complaint, and
eliminating that did take care of part of the problem. But lockdep
remained enabled, and you therefore hit the next complaint.

> [ 3.010657] Modules linked in:
> [ 3.010660] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-test-00005-g1be22fea0611 #1
> [ 3.010662] Hardware name: LENOVO 30BFS44D00/1036, BIOS S03KT51A 01/17/2022
> [ 3.010663] RIP: 0010:sched_clock_tick+0x27/0x60

The most straightforward way to get to sched_clock_tick() from
cpuidle_enter_state() is via the call to sched_clock_idle_wakeup_event().

Except that it disables interrupts before invoking sched_clock_tick().

> [ 3.010665] Code: 1f 40 00 53 eb 02 5b c3 66 90 8b 05 2f c3 40 01 85 c0 74 18 65 8b 05 60 88 8f 4e 85 c0 75 0d 65 8b 05 a9 85 8f 4e 85 c0 74 02 <0f> 0b e8 e2 6c 89 00 48 c7 c3 40 d5 02 00
> 89 c0 48 03 1c c5 c0 98
> [ 3.010667] RSP: 0000:ffffffffb2803e28 EFLAGS: 00010002
> [ 3.010670] RAX: 0000000000000001 RBX: ffffc8ce7fa07060 RCX: 0000000000000001
> [ 3.010671] RDX: 0000000000000000 RSI: ffffffffb268dd21 RDI: ffffffffb269ab13
> [ 3.010673] RBP: 0000000000000001 R08: ffffffffffc300d5 R09: 000000000002be80
> [ 3.010674] R10: 000003625b53183a R11: ffffa012b802b7a4 R12: ffffffffb2aa9e80
> [ 3.010675] R13: ffffffffb2aa9e00 R14: 0000000000000001 R15: 0000000000000000
> [ 3.010677] FS: 0000000000000000(0000) GS:ffffa012b8000000(0000) knlGS:0000000000000000
> [ 3.010678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3.010680] CR2: ffffa012f81ff000 CR3: 0000000c99612001 CR4: 00000000003706f0
> [ 3.010681] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3.010682] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3.010683] Call Trace:
> [ 3.010685] <TASK>
> [ 3.010688] cpuidle_enter_state+0xb7/0x4b0
> [ 3.010694] cpuidle_enter+0x29/0x40
> [ 3.010697] do_idle+0x1d4/0x210
> [ 3.010702] cpu_startup_entry+0x19/0x20
> [ 3.010704] rest_init+0x117/0x1a0
> [ 3.010708] arch_call_rest_init+0xa/0x10
> [ 3.010711] start_kernel+0x6d8/0x6ff
> [ 3.010716] secondary_startup_64_no_verify+0xce/0xdb
> [ 3.010728] </TASK>
> [ 3.010729] irq event stamp: 44179
> [ 3.010730] hardirqs last enabled at (44179): [<ffffffffb2000ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [ 3.010734] hardirqs last disabled at (44177): [<ffffffffb22003f0>] __do_softirq+0x3f0/0x498
> [ 3.010736] softirqs last enabled at (44178): [<ffffffffb2200332>] __do_softirq+0x332/0x498
> [ 3.010738] softirqs last disabled at (44171): [<ffffffffb16c760b>] irq_exit_rcu+0xab/0xf0
> [ 3.010741] ---[ end trace 0000000000000000 ]---

Would you be willing to try another shot in the dark, but untested
this time? I freely admit that this is getting strange.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index e374c0c923dae..279f557bf60bb 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -394,7 +394,7 @@ notrace void sched_clock_tick(void)
if (!static_branch_likely(&sched_clock_running))
return;

- lockdep_assert_irqs_disabled();
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());

scd = this_scd();
__scd_stamp(scd);