RE: [EXT] Re: [PATCH] edac: nxp: Add L1 and L2 error detection for A53 and A72 cores
From: Alison Wang
Date: Tue Aug 25 2020 - 23:03:30 EST
Hi, James,
> On 25/08/2020 03:31, Alison Wang wrote:
> >> On 09/07/2020 09:22, Alison Wang wrote:
> >>> Add error detection for A53 and A72 cores. Hardware error injection
> >>> is supported on A53. Software error injection is supported on both.
> >>
> > <snip>
> >>
> >> As we can't safely write to these registers from linux, so I think
> >> this means all the error injection and maybe SMC stuff can disappear.
>
> > I agreed with your opinion that CPUACTLR_EL1 and L2ACTLR can't be written
> in Linux.
>
> Well, we can't do what the TRM tells us we must before writing to that
> register.
[Alison] Right.
>
>
> > So the error injection can't be done in Linux. Do you mean the error
> > injection can only be done in firmware before Linux boots up? If so,
> > the system is running with error injection enabled all the time, it may be not
> a good idea too. Any suggestion?
>
> These registers are expected to have one value, forever. The errata document
> sometimes tells us to to set or clear one of these bits to workaround an issue.
> Because they can only be written to when the system is idle, typically during
> boot, this is firmware's responsibility.
>
> I expect firmware to set the bits in ACTLR_EL3, to prevent lower exception
> levels from touching any of these registers.
>
>
> I don't know how the error injection on A53 or A72 works, so I don't know if
> you can leave it enabled all the time. The bit you are setting is described as
> RES0 by the A53 and A72 TRMs. I suspect I had the wrong TRM open, as my
> 'L1DEIEN' comment seems to be what your CPUACTLR_EL1[6] is called on A35.
> (35, 53? Guess how that happened!)
[Alison] Please check A53 TRM r0p4 from https://developer.arm.com/documentation/ddi0500/j/System-Control/AArch64-register-descriptions/CPU-Auxiliary-Control-Register--EL1?lang=en .
In the CPUACTLR_EL1 bit assignments, you will find the following description.
[6] L1DEIEN
L1 D-cache data RAM error injection enable.
0
Normal behavior, errors are not injected. This is the reset value.
1
Double-bit errors are injected on all writes to the L1 D-cache data RAMs for the first word of each 32-byte region.
>
> A35's error injection says:
> | While this bit is set, double-bit errors are injected on all writes to
> | the L1 D-cache data RAMs for the first word of each 32-byte region.
>
> You certainly can't leave this sort of thing enabled! And you can't change it at
> runtime, so we can't use it.
[Alison] Ok.
>
>
> I think features like this are intended to be used to check the integration, not
> to test the software.
>
>
> After I sent the original comments on this, I found Sascha's version, which has
> these issues resolved:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> nel.org%2Flinux-arm-kernel%2F20200813075721.27981-1-s.hauer%40pengut
> ronix.de%2F&data=02%7C01%7Calison.wang%40nxp.com%7C3dc61602
> 25b24fce068708d848f9557e%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0
> %7C0%7C637339583002064849&sdata=tLf6MHh5IMSBXvASkeaIANLGX
> A0J6F26hpn254a6I6c%3D&reserved=0
>
> I think this version should work on your platform too.
[Alison] I have a look at this patch. This patch doesn't complete all the functions in my patch. It
is just to report errors, but error injection function is all removed.
Best Regards,
Alison Wang