Re: [PATCH] arm64: panic on synchronous external abort in kernel context

From: Xie XiuQi
Date: Thu Apr 30 2020 - 05:44:15 EST


Hi James,

Sorry for the late reply.

On 2020/4/14 22:53, James Morse wrote:
> Hi Xie,
>
> On 14/04/2020 13:39, Xie XiuQi wrote:
>> On 2020/4/14 20:16, James Morse wrote:
>>> On 14/04/2020 11:59, Mark Rutland wrote:
>>>> On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
>>>>> We should panic even if panic_on_oops is not set, when we can't recover
>>>>> from a synchronous external abort in kernel context.
>>>
>>> Hmm, fault-from-kernel-context doesn't mean the fault affects the kernel. If the kernel is
>>> reading or writing user-space memory for a syscall, it's the user-space memory that is
>>> affected. This thread can't make progress, so we kill it.
>>> If it's a kernel thread or we were in irq context, we panic().
>>>
>>> I don't think you really want all faults that happen as a result of a kernel access to be
>>> fatal!
>
>> Yes, you're right. I just want to fix a hang seen during RAS error-injection testing.
>>
>> panic_on_oops is not set in the kernel image under test. When receiving a SEA in kernel
>> context, the PE traps in do_exit() and can't return any more.
>
> trap? gets trapped, (or gets stuck, to prevent confusion with the architecture's use of the
> word 'trap'!)
>
>
>> I analyzed the source code; the call trace might look like this:
>> do_mem_abort
>> -> arm64_notify_die
>> -> die # kernel context, call die() directly;
>> -> do_exit # kernel process context, call do_exit(SIGSEGV);
>> -> do_task_dead() # call do_task_dead(), and hung up this core;
>
> Thanks for the trace. This describes a corrected error in your I-cache that occurred
> while the kernel was executing a kernel thread. It shouldn't be fatal, because it was
> corrected ... but the kernel doesn't know that, because it doesn't know how to parse those
> records.
>
> There are two things wrong here:
> 1. it locks up while trying to kill the thread.
> 2. it tried to kill the thread in the first place!
>
> For 1, does your l1l2_inject module take any spinlocks or tinker with the pre-empt counter?
>
> I suspect this is some rarely-tested path in do_task_dead() that sleeps, but can't from
> your l1l2_inject module because the pre-empt counter has been raised.
>
> CONFIG_DEBUG_ATOMIC_SLEEP may point at the function to blame.
>
> It may be accessing some thread data that kthreads don't have, taking a second exception
> and blocking on the die_lock. LOCKDEP should catch this one.
>
> We should fix this one first.
>

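For reference, do_task_dead() in kernel/sched/core.c looks roughly like this in a v5.x
tree; once the dying thread enters __schedule() it must never come back, so a sleep or a
second exception anywhere on this path would leave the core stuck:

void __noreturn do_task_dead(void)
{
	/* Causes final put_task_struct() in finish_task_switch(): */
	set_special_state(TASK_DEAD);

	/* Tell the freezer to ignore us: */
	current->flags |= PF_NOFREEZE;

	__schedule(false);		/* should never return... */
	BUG();

	/* ...but don't continue if BUG() is a NOP: */
	for (;;)
		cpu_relax();
}
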
I analyzed the l1l2_inject module; it injects the error from a kworker thread (see the
sketch below). do_sea() tries to kill that kworker, so any work queued on it is blocked.
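
l1l2_inject is out of tree, so apart from error_inject() and lsu_inj_ue(), which appear
in the call trace, the details in this sketch are my assumptions:

#include <linux/module.h>
#include <linux/workqueue.h>

extern void lsu_inj_ue(void);		/* assumed signature */

static void error_inject(struct work_struct *work)
{
	lsu_inj_ue();			/* raises the SEA, in kworker context */
}
static DECLARE_WORK(inject_work, error_inject);

static int __init inject_init(void)	/* name assumed */
{
	schedule_work(&inject_work);	/* the work runs on a system kworker */
	return 0;
}
module_init(inject_init);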

The panic_on_oops option is usually set in production environments, so anyone who unsets
it for testing should accept the risk. Is that right?
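
To pin down where it gets stuck, I can rebuild with the options you suggested; assuming a
typical config, that would be:

  CONFIG_DEBUG_ATOMIC_SLEEP=y
  CONFIG_PROVE_LOCKING=y	# selects CONFIG_LOCKDEP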

>
> 2 is a bit more complicated. Today, this is fatal because the arch code assumes this was
> probably a memory error, and if it returns to user-space it can't avoid getting stuck in a
> loop until the scheduled memory_failure() work runs. Instead it unconditionally signals
> the process.
>
> [0] fixes this up for memory errors. But in this case it will assume all the work has been
> done by APEI (or will be before we get back to user-space). APEI ignored the processor
> error you fed it, because it doesn't know what those records are; they are just printed out.
>
> This is fine for corrected errors, but we are reliant on your firmware describing
> uncorrected errors with a 'fatal' severity... which might be too heavy a hammer. (Ideally
> that would mean 'uncontained', and the kernel should handle, or detect it can't, any other
> error...)
>
> We can discuss the right thing to do here when support for parsing these 'arm processor
> errors' is posted.
> (If you think I need to do something different in [0] because of this, please shout!)

For errors which could be recovered or were corrected, do_sea() should not kill the process
or call die() unconditionally. It would be better to detect this situation (rough sketch
below).
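
Something like the following, based on the current do_sea() in arch/arm64/mm/fault.c;
apei_claim_sea() is my reading of [0] (the exact name and semantics may differ), and
extending the claim to kernel context is only the idea, not tested code:

/* arch/arm64/mm/fault.c -- sketch only, not a real patch */
static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
{
	const struct fault_info *inf = esr_to_fault_info(esr);

	/*
	 * If APEI claims the error (e.g. GHES says it was corrected,
	 * or memory_failure() has already been queued), don't kill
	 * the task or die(): just return and continue.
	 */
	if (apei_claim_sea(regs) == 0)
		return 0;

	arm64_notify_die(inf->name, regs, inf->sig, inf->code,
			 (void __user *)addr, esr);
	return 0;
}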

>
>
>> [ 387.740609] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>> [ 387.748837] {1}[Hardware Error]: event severity: recoverable
>> [ 387.754470] {1}[Hardware Error]: Error 0, type: recoverable
>
>> [ 387.760103] {1}[Hardware Error]: section_type: ARM processor error
>
> et voila! Case 2. Linux doesn't handle these 'ARM processor error' things, because it
> doesn't know what they are. It just prints them out.
>
>
>> [ 387.766425] {1}[Hardware Error]: MIDR: 0x00000000481fd010
>> [ 387.771972] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081080000
>> [ 387.780628] {1}[Hardware Error]: error affinity level: 0
>> [ 387.786088] {1}[Hardware Error]: running state: 0x1
>> [ 387.791115] {1}[Hardware Error]: Power State Coordination Interface state: 0
>> [ 387.798301] {1}[Hardware Error]: Error info structure 0:
>> [ 387.803761] {1}[Hardware Error]: num errors: 1
>> [ 387.808356] {1}[Hardware Error]: error_type: 0, cache error
>> [ 387.814160] {1}[Hardware Error]: error_info: 0x0000000024400017
>> [ 387.820311] {1}[Hardware Error]: transaction type: Instruction
>> [ 387.826461] {1}[Hardware Error]: operation type: Generic error (type cannot be determined)
>> [ 387.835031] {1}[Hardware Error]: cache level: 1
>
>> [ 387.839878] {1}[Hardware Error]: the error has been corrected
>
> As this is corrected, I think the bug is a deadlock somewhere in do_task_dead().
>
>
>> [ 387.845942] {1}[Hardware Error]: physical fault address: 0x00000027caf50770
>
> (and your firmware gives you the physical address, excellent, the kernel can do something
> with this!)
>
>
>> [ 388.021037] Call trace:
>> [ 388.023475] lsu_inj_ue+0x58/0x70 [l1l2_inject]
>> [ 388.029019] error_inject+0x64/0xb0 [l1l2_inject]
>> [ 388.033707] process_one_work+0x158/0x4b8
>> [ 388.037699] worker_thread+0x50/0x498
>> [ 388.041348] kthread+0xfc/0x128
>> [ 388.044480] ret_from_fork+0x10/0x1c
>> [ 388.048042] Code: b2790000 d519f780 f9800020 d5033f9f (58001001)
>> [ 388.054109] ---[ end trace 39d51c21b0e42ba6 ]---
>>
>> core 0 hung up here.
>
> DEBUG_ATOMIC_SLEEP and maybe LOCKDEP should help you pin down where the kernel is getting
> stuck. It looks like a bug in the core code.
>
>
> Thanks,
>
> James
>
> [0] https://lore.kernel.org/linux-acpi/20200228174817.74278-4-james.morse@xxxxxxx/