Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External Abort

From: James Morse
Date: Thu Jan 19 2017 - 13:10:13 EST

Next message: Shuah Khan: "Re: [PATCH 4.9 000/120] 4.9.5-stable review"
Previous message: Cathy Avery: "Re: [PATCH 1/2] scsi: scsi_transport_fc: Provide a lightweight option for Virtual FC Hosts."
In reply to: Baicar, Tyler: "Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External Abort"
Next in thread: Baicar, Tyler: "Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External Abort"

Hi Tyler,

On 18/01/17 23:26, Baicar, Tyler wrote:
> On 1/17/2017 3:31 AM, James Morse wrote:
>> On 12/01/17 18:15, Tyler Baicar wrote:
>>> SEA exceptions are often caused by an uncorrected hardware
>>> error, and are handled when data abort and instruction abort
>>> exception classes have specific values for their Fault Status
>>> Code.
>>> When SEA occurs, before killing the process, go through
>>> the handlers registered in the notification list.
>>> Update fault_info[] with specific SEA faults so that the
>>> new SEA handler is used.
>>> @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>>  	return 1;
>>>  }
>>> +/*
>>> + * This abort handler deals with Synchronous External Abort.
>>> + * It calls notifiers, and then returns "fault".
>>> + */
>>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>> +{
>>> +	struct siginfo info;
>>> +
>>> +	atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
>>> +
>>> +	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>> +		 fault_name(esr), esr, addr);
>>> +
>>> +	info.si_signo = SIGBUS;
>>> +	info.si_errno = 0;
>>> +	info.si_code = 0;
>> Half of the other do_*() functions in this file read the signo and code from the
>> fault_info table.
>>
>>
>>> + info.si_addr = (void __user *)addr;
>> addr here was read from FAR_EL1, but for some of the classes of exception you
>> have listed below this register isn't updated with the faulting address.
>>
>> The ARM-ARM version 'k' in D1.10.5 "Summary of registers on faults taken to an
>> Exception level that is using AArch64" has:
>>> The architecture permits that the FAR_ELx is UNKNOWN for Synchronous External
>>> Aborts other than Synchronous External Aborts on Translation Table Walks. In
>>> this case, the ISS.FnV bit returned in ESR_ELx indicates whether FAR_ELx is
>>> valid.
>> This is a problem if we get 'synchronous external abort' or 'synchronous parity
>> error' while a user space process was running.

> It looks like this would just cause an incorrect address to be printed in the
> above pr_err. Unless I'm missing something, I don't see arm64_notify_die or
> anything that gets called from there using the info.si_addr variable.

I may be misreading something here...

This patch has:
> 	info.si_addr = (void __user *)addr;
> 	arm64_notify_die("", regs, &info, esr);

From arch/arm64/kernel/traps.c:arm64_notify_die():
> 	if (user_mode(regs)) {
> 		current->thread.fault_address = 0;
> 		current->thread.fault_code = err;
> 		force_sig_info(info->si_signo, info, current);
> 	}

So if the SEA interrupted userspace, we put maybe-unknown addr into
force_sig_info() to deliver a signal to user space. User-space then gets a copy
of the info struct containing the maybe-unknown addr.

I think this is an existing bug, but if we are separating the synchronous
external aborts from the generic do_bad handler, we should probably check the
FnV bit. (I think we should still print it out)

> What do you suggest I do here? The firmware should be reporting the physical and
> virtual address information if it is available in the HEST entry that the kernel
> will parse.

It's not just firmware that may trigger this; other SoCs may use it for parity or
ECC errors, and they may not always have a valid address in FAR_EL1.

I think we should check the FnV bit in the esr variable and set info.si_addr to
0 if the addr we have isn't valid:
'For some implementations, the value of si_addr may be inaccurate.' [0]
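
A minimal sketch of that check, written as stand-alone C rather than as a patch
against fault.c. The macro SKETCH_ESR_FNV and the helpers sketch_far_is_valid()
and sketch_sea_si_addr() are names made up for illustration; the only
architectural fact assumed is that FnV is bit 10 of the ISS field in ESR_ELx
for instruction and data aborts:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: FnV ("FAR not Valid") is bit 10 of the ISS field reported in
 * ESR_ELx for instruction and data aborts. The macro name is hypothetical.
 */
#define SKETCH_ESR_FNV	(UINT32_C(1) << 10)

/* True when the ESR value says the fault address register can be trusted. */
static inline bool sketch_far_is_valid(uint32_t esr)
{
	return (esr & SKETCH_ESR_FNV) == 0;
}

/*
 * Pick the value to report as si_addr: the address read from the fault
 * address register when it is valid, otherwise 0 so user space never sees
 * an UNKNOWN value.
 */
static inline uint64_t sketch_sea_si_addr(uint64_t far, uint32_t esr)
{
	return sketch_far_is_valid(esr) ? far : 0;
}
```

In do_sea() this would roughly amount to assigning the helper's result to
info.si_addr, while still passing the raw addr to pr_err() so the value is
printed either way.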

Thanks,

James

[0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
