Re: [PATCH] panic, kexec: Don't mutex_trylock() in __crash_kexec()

From: Valentin Schneider
Date: Fri Jun 17 2022 - 07:52:15 EST


Hi Tao,

On 17/06/22 18:42, Tao Zhou wrote:
> Hi Valentin,
>
> On Thu, Jun 16, 2022 at 01:37:09PM +0100, Valentin Schneider wrote:
>> @@ -964,24 +966,31 @@ late_initcall(kexec_core_sysctl_init);
>> */
>> void __noclone __crash_kexec(struct pt_regs *regs)
>> {
>> - /* Take the kexec_mutex here to prevent sys_kexec_load
>> - * running on one cpu from replacing the crash kernel
>> - * we are using after a panic on a different cpu.
>> + /*
>> + * This should be taking kexec_mutex before doing anything with the
>> + * kexec_crash_image, but this code can be run in NMI context which
>> + * means we can't even trylock.
>> *
>> - * If the crash kernel was not located in a fixed area
>> - * of memory the xchg(&kexec_crash_image) would be
>> - * sufficient. But since I reuse the memory...
>> + * Pairs with smp_mb() in do_kexec_load() and sys_kexec_file_load()
>> */
>> - if (mutex_trylock(&kexec_mutex)) {
>> - if (kexec_crash_image) {
>> - struct pt_regs fixed_regs;
>> -
>> - crash_setup_regs(&fixed_regs, regs);
>> - crash_save_vmcoreinfo();
>> - machine_crash_shutdown(&fixed_regs);
>> - machine_kexec(kexec_crash_image);
>> - }
>> - mutex_unlock(&kexec_mutex);
>> + WRITE_ONCE(panic_wants_kexec, true);
>> + smp_mb();
>> + /*
>> + * If we're panic'ing while someone else is messing with the crash
>> + * kernel, this isn't going to end well.
>> + */
>> + if (READ_ONCE(kexec_loading)) {
>> + WRITE_ONCE(panic_wants_kexec, false);
>> + return;
>> + }
>
> So this is from NMI. The mutex guarantee that kexec_file_load() or
> do_kexec_load() just one of them beat on cpu. NMI can happen on more
> than one cpu. That means that here be cumulativity here IMHO.
>

If you look at __crash_kexec() in isolation yes, but if you look at panic()
and nmi_panic() only a single NMI can get in there. I think that is also
true for invocations via crash_kexec().


> kexec_file_load()/ NMI0 NMI1..
> do_kexec_load()
>
> set kexec_loading=true
> smp_mb() set panic_wants_kexec=ture
> smp_mb()
> see kexec_loading=ture and
> conditionally set
> panic_wants_kexec=false;
> set panic_wants_kexec=ture
> smp_mb()
> see panic_wants_kexec=ture
> conditionally set
> kexec_loading=false
> see kexec_loading=false
> do kexec nmi things.
>
> You see conditionlly set kexec_loading or panic_wants_kexec there no barrier
> there and if the cumulativity to have the effect there should be a acquire-release,
> if I am not wrong.
>
> __crash_kexec():
>
> WRITE_ONCE(panic_wants_kexec, true);
> smp_mb();
> /*
> * If we're panic'ing while someone else is messing with the crash
> * kernel, this isn't going to end well.
> */
> if (READ_ONCE(kexec_loading)) {
> smp_store_release(panic_wants_kexec, false);
> return;
> }
>
> kexec_file_load()/do_kexec_load():
>
> WRITE_ONCE(kexec_loading, true);
> smp_mb();
> if (smp_load_acquire(panic_wants_kexec)) {
> WRITE_ONCE(kexec_loading, false);
> ...
> }
>
> For those input, I'm sure I lost and feel hot..
> I thought that change the patten to load-store and set initial
> value but failed.
>

I'm not sure if further ordering is required here, the base case being

WRITE_ONCE(panic_wants_kexec, true); WRITE_ONCE(kexec_loading);
smp_mb(); smp_mb();
v0 = READ_ONCE(kexec_loading); v1 = READ_ONCE(panic_wants_kexec);

(see SB+fencembonceonces litmus test)

Wich ensures (!v0 && !v1) is never true. If modified to:

WRITE_ONCE(panic_wants_kexec, true); WRITE_ONCE(kexec_loading);
smp_mb(); smp_mb();
v0 = READ_ONCE(kexec_loading); v1 = READ_ONCE(panic_wants_kexec);
if (v0) if (v1)
WRITE_ONCE(panic_wants_kexec, false); WRITE_ONCE(kexec_loading, false);

then "(!v0 && !v1) is never true" still holds, so the exclusivity is
maintained AFAICT.