Re: [PATCH 1/2] drivers/edac: Add L1 and L2 error detection for A53, A57 and A72

From: Vijay Balakrishna
Date: Thu Apr 10 2025 - 18:27:26 EST


On 4/10/2025 1:04 PM, Tyler Hicks (Microsoft) wrote:
+ snprintf(msg, MESSAGE_SIZE, "%s %s error(s) on CPU %d",
+ str, fatal ? "fatal" : "correctable", cpu);
+
+ if (fatal)
+ edac_device_handle_ue(edac_ctl, cpu, 0, msg);
+ else
+ edac_device_handle_ce(edac_ctl, cpu, 0, msg);
+
+ write_sysreg_s(0, SYS_CPUMERRSR_EL1);
+ isb();
I think the register writes and barriers should happen much closer to the
register reads, in read_errors(). Looking back at Marc's feedback on v5, I
think his most important piece of feedback was to only clear the register when
the valid bit is set to avoid accidentally clobbering an error that came in
between the register read and write.

By moving the register writes into report_errors() in this v6 series, there's
now a much larger window where new errors could occur between the register
read and the register write. Those new errors would be silently lost/ignored.
Reducing the window to the least number of cycles seems important for accurate
reporting.

Moving clear to report_errors() is wrong actually, it has be cleared from the CPU, from read_errors() which is a SMP call. Let me share new version with changes.

Thanks,
Vijay