Re: repeating [Hardware Error]: Corrected error, no action required.

From: Oleksandr Natalenko
Date: Thu Jun 10 2021 - 04:40:46 EST


Hello.

On Wed, Jun 09, 2021 at 08:27:26PM +0200, Toralf Förster wrote:
> The syslog on a hardened Gentoo system
>
> # uname -a
> Linux mr-fox 5.12.9 #8 SMP Thu Jun 3 17:59:32 CEST 2021 x86_64 AMD Ryzen 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
> mr-fox ~ #
>
> shows entries like these repeating every 5 minutes (always with the same
> address 0x000000031fb566e0):
>
> Jun 9 16:21:24 mr-fox kernel: mce: [Hardware Error]: Machine check events logged
> Jun 9 16:21:24 mr-fox kernel: [Hardware Error]: Corrected error, no action required.
> Jun 9 16:21:24 mr-fox kernel: [Hardware Error]: CPU:0 (19:21:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
> Jun 9 16:21:24 mr-fox kernel: [Hardware Error]: Error Addr: 0x000000031fb566e0
> Jun 9 16:21:24 mr-fox kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x33fa01000a800101
> Jun 9 16:21:24 mr-fox kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
> Jun 9 16:21:24 mr-fox kernel: EDAC MC0: 1 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0xcaed59 offset:0x8e0 grain:64 syndrome:0x100)
> Jun 9 16:21:24 mr-fox kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
>
>
> A hardware memory check by Hetzner didn't find anything.

Did they run memtest in a loop at least 10 times?
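For what it's worth, the CECC flag in the MC17_STATUS line is what marks this as a corrected DRAM ECC error, so a memory test is the relevant check here. The bracketed flags are just individual bits of the status value 0xdc2040000000011b. Below is a small Python sketch that rebuilds that flag string. The bit positions are my reading of the MCI_STATUS_* definitions in the kernel's arch/x86/include/asm/mce.h, and the field order mimics what the AMD MCE decoder (drivers/edac/mce_amd.c) prints. Treat it as an illustration, not a replacement for the kernel decoder.

```python
# Bit positions assumed from the MCI_STATUS_* definitions in the Linux
# kernel's arch/x86/include/asm/mce.h; verify against your kernel version.
MCI_STATUS_OVER = 1 << 62      # overflow: another error arrived before this one was read
MCI_STATUS_UC = 1 << 61        # uncorrected error
MCI_STATUS_MISCV = 1 << 59     # MISC register valid
MCI_STATUS_ADDRV = 1 << 58     # ADDR register valid
MCI_STATUS_PCC = 1 << 57       # processor context corrupt
MCI_STATUS_TCC = 1 << 55       # task context corrupt
MCI_STATUS_SYNDV = 1 << 53     # SYND register valid
MCI_STATUS_DEFERRED = 1 << 44  # deferred (uncorrected) error
MCI_STATUS_POISON = 1 << 43    # access to poisoned data
MCI_STATUS_SCRUB = 1 << 40     # error detected during a scrub


def decode_status(status: int) -> str:
    """Rebuild the flag string printed in brackets after MCx_STATUS."""
    def flag(mask: int, name: str) -> str:
        return name if status & mask else "-"

    # Severity field: UE for uncorrected, "-" for deferred, otherwise CE.
    if status & MCI_STATUS_UC:
        severity = "UE"
    elif status & MCI_STATUS_DEFERRED:
        severity = "-"
    else:
        severity = "CE"

    fields = [
        flag(MCI_STATUS_OVER, "Over"),
        severity,
        flag(MCI_STATUS_MISCV, "MiscV"),
        flag(MCI_STATUS_ADDRV, "AddrV"),
        flag(MCI_STATUS_PCC, "PCC"),
        # The kernel prints TCC only on MCAX-capable banks; always shown here.
        flag(MCI_STATUS_TCC, "TCC"),
        flag(MCI_STATUS_SYNDV, "SyndV"),
    ]
    # Bits 46:45 are decoded together: 2 means corrected ECC, 1 or 3 uncorrected.
    ecc = (status >> 45) & 0x3
    if ecc:
        fields.append("CECC" if ecc == 2 else "UECC")
    fields += [
        flag(MCI_STATUS_DEFERRED, "Deferred"),
        flag(MCI_STATUS_POISON, "Poison"),
        flag(MCI_STATUS_SCRUB, "Scrub"),
    ]
    return "|".join(fields)


if __name__ == "__main__":
    # Prints: Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-
    print(decode_status(0xDC2040000000011B))
```

The output matches the bracketed string in the quoted log. Note that the Over bit is also set, meaning more than one error hit this bank before it was read out.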

> May I ask whether I should worry about this or not?

If the reported page is indeed the same, then probably yes, you should
worry.
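To check whether it really is the same page every time, you can count the EDAC CE lines per page over the whole log. A minimal sketch, assuming the EDAC message format exactly as quoted above and a plain-text syslog file (the path, e.g. /var/log/messages, is an assumption and depends on your syslog daemon):

```python
import re
import sys
from collections import Counter

# Pattern assumed from the EDAC line in the quoted log, e.g.
# "EDAC MC0: 1 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0xcaed59 offset:0x8e0 ...)".
# The wording may differ between kernel versions; adjust if nothing matches.
EDAC_CE = re.compile(
    r"EDAC (MC\d+): (\d+) CE .*?\(csrow:(\d+) channel:(\d+) "
    r"page:(0x[0-9a-f]+) offset:(0x[0-9a-f]+)"
)


def count_ce_per_page(lines):
    """Sum corrected-error counts per (controller, csrow, channel, page)."""
    counts = Counter()
    for line in lines:
        m = EDAC_CE.search(line)
        if m:
            mc, n, csrow, channel, page, _offset = m.groups()
            counts[(mc, int(csrow), int(channel), page)] += int(n)
    return counts


if __name__ == "__main__":
    # Usage: python3 ce_pages.py /var/log/messages  (log path is an assumption)
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for (mc, csrow, channel, page), n in count_ce_per_page(log).most_common():
                print(f"{mc} csrow:{csrow} channel:{channel} page:{page}: {n} CE")
```

If a single (csrow, channel, page) entry accounts for nearly all of the CEs, that is the "same page" case above.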

--
Oleksandr Natalenko (post-factum)