Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

From: Michael Ellerman
Date: Wed Apr 08 2020 - 08:21:24 EST


Leonardo Bras <leonardo@xxxxxxxxxxxxx> writes:
> Hello Nick, Michael,
>
> On Fri, 2020-04-03 at 16:41 +1000, Nicholas Piggin wrote:
> [...]
>> > > PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
>> > > except for a very small list of RTAS calls. So if we bust the RTAS lock
>> > > there's a risk we violate that part of PAPR and crash even harder.
>> >
>> > Interesting, I was not aware.
>> >
>> > > Also it's not specific to kdump, we can't even get through a normal
>> > > reboot if we crash with the RTAS lock held.
>> > >
>> > > Anyway here's a patch with some ideas. That allows me to get from a
>> > > crash with the RTAS lock held through kdump into the 2nd kernel. But it
>> > > only works if it's the crashing CPU that holds the RTAS lock.
>> > >
>> >
>> > Nice idea.
>> > But my test environment is just triggering a crash from sysrq, so I
>> > think it would not improve the result, given that this thread is
>> > probably not holding the lock by the time.
>>
>> Crash paths should not take that RTAS lock, it's a massive pain. I'm
>> fixing it for machine check, for other crashes I think it can be removed
>> too, it just needs to be unpicked. The good thing with crashing is that
>> you can reasonably *know* that you're single threaded, so you can
>> usually reason through situations like above.
>>
>> > I noticed that when rtas is locked, irqs and preemption are also
>> > disabled.
>> >
>> > Should the IPI send by crash be able to interrupt a thread with
>> > disabled irqs?
>>
>> Yes. It's been a bit painful, but in the long term it means that a CPU
>> which hangs with interrupts off can be debugged, and it means we can
>> take it offline to crash without risking that it will be clobbering what
>> we're doing.
>>
>> Arguably what I should have done is try a regular IPI first, wait a few
>> seconds, then NMI IPI.
>>
>> A couple of problems with that. Firstly it probably avoids this issue
>> you hit almost all the time, so it won't get fixed. So when we really
>> need the NMI IPI in the field, it'll still be riddled with deadlocks.
>>
>> Secondly, sending the IPI first in theory can be more intrusive to the
>> state that we want to debug. It uses the currently running stack, paca
>> save areas, ec. NMI IPI uses its own stack and save regions so it's a
>> little more isolated. Maybe this is only a small advantage but I'd like
>> to have it if we can.
>
> I think the printk issue is solved (sent a patch on that), now what is
> missing is the rtas call spinlock.
>
> I noticed that rtas.lock is taken on machine_kexec_mask_interrupts(),
> which happen after crashing the other threads and getting into
> realmode.
>
> The following rtas are called each IRQ with valid interrupt descriptor:
> ibm,int-off : Reset mask bit for that interrupt
> ibm,set_xive : Set XIVE priority to 0xff
>
> By what I could understand, these rtas calls happen to put the next
> kexec kernel (kdump kernel) in a safer environment, so I think it's not
> safe to just remove them.

Yes.

> (See commit d6c1a9081080c6c4658acf2a06d851feb2855933)

In hindsight the person who wrote that commit was being lazy. We
*should* have made the 2nd kernel robust against the IRQ state being
messed up.

> On the other hand, busting the rtas.lock could be dangerous, because
> it's code we can't control.
>
> According with LoPAR, for both of these rtas-calls, we have:
>
> For the PowerPC External Interrupt option: The call must be reentrant
> to the number of processors on the platform.
> For the PowerPC External Interrupt option: The argument call buffer for
> each simultaneous call must be physically unique.

Oh well spotted. Where is that in the doc?

> Which I think means this rtas-calls can be done simultaneously.

I think so too. I'll read PAPR in the morning and make sure.

> Would it mean that busting the rtas.lock for these calls would be safe?

What would be better is to make those specific calls not take the global
RTAS lock to begin with.

We should be able to just allocate the rtas_args on the stack, it's only
~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
take the global lock.

cheers