RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
From: æåèå / KAWAIïHIDEHIRO
Date: Thu Dec 03 2015 - 06:29:30 EST
> On Thu, Dec 03, 2015 at 02:01:38AM +0000, æåèå / KAWAIïHIDEHIRO wrote:
> > > On Wed, Dec 02, 2015 at 11:57:38AM +0000, æåèå / KAWAIïHIDEHIRO wrote:
> > > > We can do so, but I think resetting panic_cpu always would be
> > > > simpler and safer.
> >
> > I'll state in detail.
> >
> > When we call crash_kexec() without entering panic() and return from
> > it, panic() should be called eventually.
>
> Huh, the call chain is
>
> panic->crash_kexec
>
> Or do you mean, when crash_kexec() is not called by panic() but by some
> of its other callers?
I was arguing about the case of oops_end --> crash_kexec
--> return from crash_kexec because of !kexec_crash_image -->
panic.
In the case of panic --> __crash_kexec, __crash_kexec is called
only once, so we don't need to check the return value of __crash_kexec
as you suggested. So I thought you stated about crash_kexec --> panic
case.
> > But the code paths are a bit complicated and there are many
> > implementations for each architecture. So one day, this assumption may
> > be broken; the CPU doesn't call panic(). Or the CPU may fail to call
> > panic() because we are already in insane state. It would be nervous,
> > but allowing another CPU to process panic routines by resetting
> > panic_cpu is safer approach.
>
> My suggestion was to do this only on the panic path - not necessarily on
> the others.
>
> > Since this code is executed only once due to panic_cpu,
> > I think introducing this logic is not much valuable.
> > Also, current implementation is already quite simple:
> >
> > panic()
> > {
> > ...
> > __crash_kexec(NULL) {
> > if (mutex_trylock(&kexec_mutex)) {
> > if (kexec_crash_image) {
> > /* don't return */
> > }
>
> I don't mean the kexec_crash_image case - I mean the opposite one:
> !kexec_crash_image.
I also mentioned !kexec_crash_image case...
> And I think I know now what you're trying to tell
> me: the first CPU which hits panic, will finish panic eventually and so
> it will take down the machine.
No. The first CPU calls panic, and then it calls __crash_kexec.
Because of !kexec_crash_image, it returns from __crash_kexec and
continues to the panic procedure. At the same time, another CPU
tries to call panic(), but it doesn't run the panic procedure;
panic_cpu prevents the second CPU from running it.
This means __crash_kexec is called only once even if we don't
check the return value of __crash_kexec.
(Please note that crash_kexec can be called multiple times in the
case of oops_end() --> crash_kexec().)
I'm sorry I couldn't tell my thought well.
Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group