Error reports at boot time in Ampere Altra machines since c733ebb7c

From: Aristeu Rozanski
Date: Thu Mar 02 2023 - 15:18:27 EST


Hi Marc,

Since c733ebb7cb67d ("irqchip/gic-v3-its: Reset each ITS's BASERn
register before probe"), Ampere Altra machines are reporting corrected
errors during boot:

[ 0.294334] HEST: Table parsing has been initialized.
[ 0.294397] sdei: SDEIv1.0 (0x0) detected in firmware.
[ 0.299622] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 0.299626] {1}[Hardware Error]: event severity: recoverable
[ 0.299629] {1}[Hardware Error]: Error 0, type: recoverable
[ 0.299633] {1}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f
[ 0.299638] {1}[Hardware Error]: section length: 0x30
[ 0.299645] {1}[Hardware Error]: 00000000: 00000005 ec30000e 00080110 80001001 ......0.........
[ 0.299648] {1}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................
[ 0.299650] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[ 0.299714] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[ 0.299716] {2}[Hardware Error]: event severity: recoverable
[ 0.299717] {2}[Hardware Error]: Error 0, type: recoverable
[ 0.299718] {2}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f
[ 0.299720] {2}[Hardware Error]: section length: 0x30
[ 0.299722] {2}[Hardware Error]: 00000000: 40000005 ec30000e 00080110 80005001 ...@..0......P..
[ 0.299724] {2}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................
[ 0.299726] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[ 0.299912] GHES: APEI firmware first mode is enabled by APEI bit.

Because the errors are reported later in boot, it's hard to pinpoint
exactly what's causing them without decoding the error information,
which I currently don't know how to do.
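As a first step towards decoding, the raw section bytes can at least be
rebuilt from the hex dump. Below is a minimal Python sketch. It assumes
the dump was printed by print_hex_dump() with 4-byte groups on a
little-endian arm64 host. The ASCII column is consistent with that:
"00000005 ec30000e" shows as "......0.". The helper name dump_to_bytes
is hypothetical. The sketch does not interpret the vendor section
(GUID e8ed898d-df16-43cc-8ecc-54f060ef157f), whose layout is unknown here.

```python
import re

# Matches "OFFSET: w0 w1 w2 w3" (8-hex-digit offset and up to four 32-bit words)
# anywhere in a dmesg line, so the "[ ts] {n}[Hardware Error]: " prefix is skipped
# and the trailing ASCII column is not captured.
_DUMP_LINE = re.compile(r"([0-9a-f]{8}):((?: [0-9a-f]{8}){1,4})")


def dump_to_bytes(lines):
    """Rebuild raw bytes from a print_hex_dump()-style dump with 4-byte groups.

    Assumption: the dump was produced on a little-endian host (arm64), so each
    printed 32-bit word is stored in memory least-significant byte first.
    Lines that do not look like hex-dump rows are ignored.
    """
    out = bytearray()
    for line in lines:
        m = _DUMP_LINE.search(line)
        if not m:
            continue
        for word in m.group(2).split():
            # Convert the printed value back to its in-memory byte order.
            out += int(word, 16).to_bytes(4, "little")
    return bytes(out)
```

Applied to both dumps above, this yields 0x30 bytes each (matching
"section length: 0x30"). The two sections differ only in byte 3 (0x00
vs 0x40) and byte 13 (0x10 vs 0x50).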

There are no other problems, apart from the warnings themselves
causing test failures.

Do you know what's going on here?

Thanks

--
Aristeu