Re: [PATCH 1/2] sched/idle: Fix arch_cpu_idle() vs tracing

From: Peter Zijlstra
Date: Tue Dec 01 2020 - 09:52:24 EST


On Tue, Dec 01, 2020 at 12:56:27PM +0100, Sven Schnelle wrote:
> Hi Peter,
>
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>
> > On Mon, Nov 30, 2020 at 01:00:03PM -0800, Guenter Roeck wrote:
> >> On Fri, Nov 20, 2020 at 12:41:46PM +0100, Peter Zijlstra wrote:
> >> > We call arch_cpu_idle() with RCU disabled, but then use
> >> > local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
> >> >
> >> > Switch all arch_cpu_idle() implementations to use
> >> > raw_local_irq_{en,dis}able() and carefully manage the
> >> > lockdep,rcu,tracing state like we do in entry.
> >> >
> >> > (XXX: we really should change arch_cpu_idle() to not return with
> >> > interrupts enabled)
> >> >
> >>
> >> Has this patch been tested on s390? Reason for asking is that it causes
> >> all my s390 emulations to crash. Reverting it fixes the problem.
> >
> > My understanding is that it changes the error on s390. Previously it
> > would complain about the local_irq_enable() in arch_cpu_idle(), now it
> > complains when taking an interrupt during idle.
>
> I looked into adding the required functionality for s390, but the code
> we would need to add to entry.S is rather large - as you noted we would
> have to duplicate large portions of irqentry_enter() into our code.
> Given that s390 was fine before that patch, can you revert it and submit
> it again during the next merge window?

So the thing that got me started here was:

https://lkml.kernel.org/r/yt9dimbm79qi.fsf@xxxxxxxxxxxxx/

And I got a very similar report from Mark for arm64. I'm not sure what
you have done in the meantime to get rid of that, but I'm struggling to
understand how s390 can work on v5.10-rc5.

There's just too much calling into tracing while RCU is stopped.
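
To make the problem concrete, here is a toy userspace model of it. This
is only a sketch, not kernel code: every name in it (rcu_watching,
trace_irq_event(), traced_irq_enable(), idle_traced(), idle_raw()) is
made up for illustration, and the real lockdep/RCU/tracing ordering is
more involved than what is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, none are kernel APIs. */
static bool rcu_watching = true;  /* false while the CPU is "RCU idle" */
static bool irqs_enabled;         /* modelled interrupt state */
static int trace_violations;      /* trace events seen while RCU is off */

/* Stand-in for a tracepoint: only legal while RCU is watching. */
static void trace_irq_event(void)
{
	if (!rcu_watching)
		trace_violations++;
}

/* Raw variants flip the interrupt state and call nothing else. */
static void raw_irq_enable(void)  { irqs_enabled = true; }
static void raw_irq_disable(void) { irqs_enabled = false; }

/* Traced variant, modelled after an IRQ helper that also calls tracing. */
static void traced_irq_enable(void)
{
	trace_irq_event();
	raw_irq_enable();
}

/* Idle path using the traced helper after RCU has been stopped. */
static int idle_traced(void)
{
	trace_violations = 0;
	rcu_watching = false;   /* models entering RCU idle */
	traced_irq_enable();    /* tracing runs while RCU is not watching */
	rcu_watching = true;    /* models leaving RCU idle */
	return trace_violations;
}

/* Idle path that informs tracing first, then uses only raw helpers. */
static int idle_raw(void)
{
	trace_violations = 0;
	trace_irq_event();      /* RCU is still watching here */
	rcu_watching = false;   /* models entering RCU idle */
	raw_irq_enable();       /* no tracing inside the RCU-idle window */
	raw_irq_disable();
	rcu_watching = true;    /* models leaving RCU idle */
	raw_irq_enable();
	return trace_violations;
}
```

In this model idle_traced() reports one trace event inside the RCU-idle
window and idle_raw() reports none. An interrupt taken during idle that
calls into tracing on its own would still trip the same check.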