Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

From: Michael Ellerman
Date: Fri Apr 17 2020 - 07:48:52 EST


"Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxx> writes:
> Hi Qian,
>
> Qian Cai wrote:
>> OK, reverted the commit,
>>
>> c55d7b5e6426 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
>>
>> and that (or setting STRICT_KERNEL_RWX=n) fixed the crash below, which is also reported in this thread:
>>
>> https://lore.kernel.org/lkml/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@xxxxxx/
>
> Do you see any errors logged in dmesg when you see the crash?
> STRICT_KERNEL_RWX changes how patch_instruction() works, so it would be
> interesting to see if there are any ftrace-related errors thrown before
> the crash.

I've been able to reproduce it with STRICT_KERNEL_RWX=y by concurrently
running:

# while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done

and:

# while true; do find /lib/modules/$(uname -r) -name '*.ko' -printf "%f\n" | sed -e "s/\.ko//" | xargs -i modprobe -va {}; lsmod | awk '{print $1}' | xargs -i modprobe -vr {}; done

i.e. stressing module loading/unloading and ftrace at the same time.

It doesn't reproduce every time, but it usually does within 10-20 minutes.

It looks like our __patch_instruction() sometimes fails, and that failure
then somehow leaves things further messed up. Presumably we have some bad
error handling somewhere.
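For readers less familiar with the mechanism: as I understand it, with STRICT_KERNEL_RWX the kernel text is mapped read-only, so a patch has to be written through a separate, temporarily writable mapping of the same memory, and setting up that mapping is a step that can fail. The C sketch below is only a userspace analogue of that idea, built on memfd_create() and two mmap() views of one page. The names (struct text_region, patch_word(), etc.) are hypothetical, and the real powerpc patching code differs in its details. The point it illustrates is that every caller has to check the return value: if the alias cannot be mapped, the instruction was never written.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace analogue, NOT kernel code: a read-only "text"
 * region that can only be modified through a temporary writable alias. */
struct text_region {
    int fd;          /* memfd backing both mappings */
    size_t len;      /* size of the region in bytes */
    uint32_t *text;  /* read-only view, like kernel text under STRICT_KERNEL_RWX */
};

/* Create a zero-filled, read-only region of len bytes. Returns 0 on success. */
static int text_region_init(struct text_region *r, size_t len)
{
    r->fd = memfd_create("fake-text", 0);
    if (r->fd < 0)
        return -1;
    if (ftruncate(r->fd, (off_t)len) != 0) {
        close(r->fd);
        return -1;
    }
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, r->fd, 0);
    if (p == MAP_FAILED) {
        close(r->fd);
        return -1;
    }
    r->len = len;
    r->text = p;
    return 0;
}

/* Store one 32-bit "instruction" at word index idx by mapping a temporary
 * writable alias of the same memory, writing through it, then unmapping it.
 * Returns 0 on success, -1 on failure; callers must not ignore a failure. */
static int patch_word(struct text_region *r, size_t idx, uint32_t insn)
{
    if ((idx + 1) * sizeof(uint32_t) > r->len)
        return -1;                        /* out of range: nothing written */
    uint32_t *alias = mmap(NULL, r->len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, r->fd, 0);
    if (alias == MAP_FAILED)
        return -1;                        /* alias setup failed: nothing written */
    alias[idx] = insn;
    munmap(alias, r->len);
    /* Verify through the read-only view, which shares the same pages. */
    return r->text[idx] == insn ? 0 : -1;
}

static void text_region_destroy(struct text_region *r)
{
    munmap(r->text, r->len);
    close(r->fd);
}
```

Calling patch_word() with an index past the end of the region returns -1 without writing anything; a caller that ignored that result would carry on as if the instruction had been patched.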

cheers