RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

From: KAWAI, HIDEHIRO
Date: Mon Aug 31 2015 - 04:53:21 EST


Hello Peter,

> From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of KAWAI, HIDEHIRO
>
> Hi,
>
> > From: Peter Zijlstra [mailto:peterz@xxxxxxxxxxxxx]
> >
> > On Sat, Aug 22, 2015 at 02:35:24AM +0000, KAWAI, HIDEHIRO wrote:
> > > > From: Peter Zijlstra [mailto:peterz@xxxxxxxxxxxxx]
> > > >
> > > > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > > > > void crash_kexec(struct pt_regs *regs)
> > > > > {
> > > > > +	int old_cpu, this_cpu;
> > > > > +
> > > > > +	/*
> > > > > +	 * `old_cpu == -1' means we are the first comer and crash_kexec()
> > > > > +	 * was called without entering panic().
> > > > > +	 * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> > > > > +	 */
> > > > > +	this_cpu = raw_smp_processor_id();
> > > > > +	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > > > > +	if (old_cpu != -1 && old_cpu != this_cpu)
> > > > > +		return;
> > > >
> > > > This allows recursive calling of crash_kexec(), the Changelog did not
> > > > mention that. Is this really required?
> > >
> > > Which part are you objecting to? A recursive call of crash_kexec()
> > > doesn't happen. In fact, one of the purposes of this patch is
> > > to prevent a recursive call of crash_kexec() in the following case,
> > > as I stated in the description:
> > >
> > >   CPU 0:
> > >     oops_end()
> > >       crash_kexec()
> > >         mutex_trylock() // acquired
> > >           <NMI>
> > >           io_check_error()
> > >             panic()
> > >               crash_kexec()
> > >                 mutex_trylock() // failed to acquire
> > >               infinite loop
> > >
> >
> > Yes, but what do we want to do there? It seems to me that is wrong; we
> > do not want to let a recursive crash_kexec() proceed.
> >
> > Whereas the condition you created explicitly allows this recursion by
> > virtue of the 'old_cpu != this_cpu' check.
>
> I understand your question. The 'old_cpu != this_cpu' check is not
> meant to permit a recursive call of crash_kexec(); it is needed for
> the panic() --> crash_kexec() path. Since panic() has already set
> panic_cpu to this_cpu (please see PATCH 1/4), without the
> 'old_cpu != this_cpu' check nobody could get through crash_kexec()
> when it is called from panic().
>
> If you don't like this check, I could instead handle this case as
> follows:
>
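> void crash_kexec(struct pt_regs *regs)
> {
> 	int old_cpu, this_cpu;
>
> 	this_cpu = raw_smp_processor_id();
> 	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> 	if (old_cpu != -1)
> 		return;	/* panic() or another CPU got here first */
>
> 	__crash_kexec(regs);
> }
>
> void panic(const char *fmt, ...)
> {
> 	atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> 	__crash_kexec(NULL);	/* bypasses the panic_cpu check */
> 	...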
>

Is that OK?
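For comparison, the two variants can be modelled in userspace C11. This is only a sketch, not kernel code: every name in it (claim_panic_cpu(), crash_kexec_inner(), crash_kexec_v3(), crash_kexec_alt(), panic_alt(), reset_model(), NO_CPU) is hypothetical, CPU ids are passed in by hand, an atomic_flag stands in for the mutex taken with mutex_trylock(), and a counter replaces machine_kexec(). It also assumes that panic() stops a CPU which loses the race for panic_cpu; that is an assumption about PATCH 1/4, which this thread does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code: CPU ids are passed in
 * explicitly, kexec_lock stands in for the kexec mutex taken with
 * mutex_trylock(), and "taking a dump" just bumps a counter instead of
 * calling machine_kexec().
 */
#define NO_CPU (-1)

static atomic_int panic_cpu = NO_CPU;
static atomic_flag kexec_lock = ATOMIC_FLAG_INIT;
static int dumps_taken;

/* Stand-in for __crash_kexec(): no panic_cpu check, only the trylock. */
static bool crash_kexec_inner(void)
{
	if (atomic_flag_test_and_set(&kexec_lock))
		return false;		/* lock already held: give up */
	dumps_taken++;
	atomic_flag_clear(&kexec_lock);
	return true;
}

/* Try to claim panic_cpu; returns the previous owner (NO_CPU if we won). */
static int claim_panic_cpu(int this_cpu)
{
	int old_cpu = NO_CPU;

	/* On failure, old_cpu is overwritten with the current owner. */
	atomic_compare_exchange_strong(&panic_cpu, &old_cpu, this_cpu);
	return old_cpu;
}

/* V3 check: the CPU that already owns panic_cpu is let through. */
static bool crash_kexec_v3(int this_cpu)
{
	int old_cpu = claim_panic_cpu(this_cpu);

	if (old_cpu != NO_CPU && old_cpu != this_cpu)
		return false;
	return crash_kexec_inner();
}

/* Alternative: a direct crash_kexec() runs only for the first comer. */
static bool crash_kexec_alt(int this_cpu)
{
	if (claim_panic_cpu(this_cpu) != NO_CPU)
		return false;
	return crash_kexec_inner();
}

/* Alternative panic(): claim panic_cpu, then call the inner helper. */
static bool panic_alt(int this_cpu)
{
	int old_cpu = claim_panic_cpu(this_cpu);

	/* ASSUMPTION: a CPU that loses the race stops here (PATCH 1/4). */
	if (old_cpu != NO_CPU && old_cpu != this_cpu)
		return false;
	return crash_kexec_inner();
}

/* Reset the model between scenarios. */
static void reset_model(void)
{
	atomic_store(&panic_cpu, NO_CPU);
	dumps_taken = 0;
}
```

In the model, the V3 check lets the CPU that already owns panic_cpu back into crash_kexec(), so in the NMI case above only the failed trylock stops the nested call. With the alternative, a direct crash_kexec() returns at once whenever panic_cpu is already claimed, even on the owning CPU, and after that only the panic() path can still reach the inner helper.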

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group