Re: [PATCH] arm: mm: fault: check ADFSR in case of abort

From: Russell King - ARM Linux
Date: Mon Oct 29 2018 - 12:43:48 EST


On Mon, Oct 29, 2018 at 03:54:36PM +0000, Mark Rutland wrote:
> On Mon, Oct 29, 2018 at 02:20:51PM +0000, Wiebe, Wladislav (Nokia - DE/Ulm) wrote:
> > When running into situations like:
> > "Unhandled fault: synchronous external abort (0x210) at 0xXXX"
> > or
> > "Unhandled prefetch abort: synchronous external abort (0x210) at 0xXXX"
> > it is useful to know the content of ADFSR (Auxiliary Data Fault Status
> > Register) to indicate an ECC double-bit error in L1 or L2 cache.
> >
> > Refer to:
> > Cortex-A15 Technical Reference Manual, Revision: r2p1
> > [6.4.8. Error Correction Code]
> >
> > Signed-off-by: Wladislav Wiebe <wladislav.wiebe@xxxxxxxxx>
> > ---
> > arch/arm/mm/fault.c | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 3232afb6fdc0..5e240deb6ed6 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -547,6 +547,22 @@ hook_fault_code(int nr, int (*fn)(unsigned long, unsigned int, struct pt_regs *)
> >  	fsr_info[nr].name = name;
> >  }
> >
> > +/*
> > + * Check for ECC double-bit errors in Auxiliary Data Fault Status Register
> > + */
> > +static void check_adfsr_for_ecc(void)
> > +{
> > +	u32 adfsr = 0;
> > +
> > +	asm("mrc p15, 0, %0, c5, c1, 0" : "=r" (adfsr));
> > +
> > +	if (adfsr & (BIT(31) | BIT(23))) {
> > +		pr_alert("ADFSR status 0x%x indicates that an L1 or L2 cache\n"
> > +			 "ECC double-bit error occurred at some time.\n",
> > +			 adfsr);
> > +	}
> > +}
> > +
> > /*
> > * Dispatch a data abort to the relevant handler.
> > */
> > @@ -559,6 +575,7 @@ do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> >  	if (!inf->fn(addr, fsr & ~FSR_LNX_PF, regs))
> >  		return;
> >
> > +	check_adfsr_for_ecc();
> >  	pr_alert("Unhandled fault: %s (0x%03x) at 0x%08lx\n",
> >  		inf->name, fsr, addr);
> >  	show_pte(current->mm, addr);
> > @@ -593,6 +610,7 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
> >  	if (!inf->fn(addr, ifsr | FSR_LNX_PF, regs))
> >  		return;
> >
> > +	check_adfsr_for_ecc();
> >  	pr_alert("Unhandled prefetch abort: %s (0x%03x) at 0x%08lx\n",
> >  		inf->name, ifsr, addr);
>
> IIUC at this point the task is preemptible (and interruptible),

It may be preemptible, but isn't necessarily so. It depends on whether
the called FSR-specific function enabled interrupts or not.

So, it would be better to read the ADFSR before calling the FSR-specific
function, to guarantee that the value we read corresponds to _this_
fault.
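
To illustrate the ordering, here is a rough user-space model, not a
tested kernel change. Everything in it is made up for the example:
mock_adfsr and read_adfsr() stand in for the mrc in the patch,
fault_fn drops the pt_regs argument, and clobbering_handler() just
pretends that something else rewrote the register while the handler
ran with interrupts enabled. The only point is that the snapshot is
taken before the handler is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bits tested by the quoted patch: BIT(31) | BIT(23). */
#define ADFSR_ECC_MASK ((1u << 31) | (1u << 23))

/* Hypothetical stand-in for the ADFSR; the kernel would use mrc. */
static uint32_t mock_adfsr;

static uint32_t read_adfsr(void)
{
	return mock_adfsr;
}

/* Simplified handler type: returns 0 if it handled the fault. */
typedef int (*fault_fn)(unsigned long addr, unsigned int fsr);

/*
 * Hypothetical handler that does not handle the fault. Clearing
 * mock_adfsr models the register changing underneath us while the
 * handler ran with interrupts enabled.
 */
static int clobbering_handler(unsigned long addr, unsigned int fsr)
{
	(void)addr;
	(void)fsr;
	mock_adfsr = 0;
	return 1;
}

/*
 * Returns true if the fault was handled. Otherwise stores the ADFSR
 * value that was sampled on entry in *reported and prints the
 * messages.
 */
static bool dispatch_abort(fault_fn fn, unsigned long addr,
			   unsigned int fsr, uint32_t *reported)
{
	/* Snapshot first, so the value belongs to this fault. */
	uint32_t adfsr = read_adfsr();

	assert(fn != NULL && reported != NULL);

	if (!fn(addr, fsr))
		return true;

	if (adfsr & ADFSR_ECC_MASK)
		printf("ADFSR status 0x%x indicates an ECC double-bit error\n",
		       (unsigned int)adfsr);
	printf("Unhandled fault (0x%03x) at 0x%08lx\n", fsr, addr);

	*reported = adfsr;
	return false;
}
```

With the read placed after the fn() call, as in the posted patch, the
model above would report 0 instead of the value that was present when
the abort was taken.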

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up