On Wed, Apr 12, 2023 at 05:11:26PM +0200, Paul Menzel wrote:
On a Dell PowerEdge R7525 with AMD EPYC 7763 64-Core Processor, Linux
5.15.94 logs the machine check exceptions (MCE) below:
```
[5154053.127240] mce: [Hardware Error]: Machine check events logged
[5154053.133711] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 17: d42040000000011b
[5154053.141948] mce: [Hardware Error]: TSC 0 ADDR b3cbdbbc0 PPIN 2b615bef7f48098 SYND 6bd210000a801002 IPID 9600650f00
```
Build the latest kernel with CONFIG_X86_MCE_INJECT and
CONFIG_EDAC_DECODE_MCE enabled and CONFIG_RAS_CEC *disabled*. Then boot
it on that machine and do the following.
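A quick way to check those three options on the booted kernel could look
like the sketch below. check_mce_config is a hypothetical helper name, and
the default path /boot/config-$(uname -r) is an assumption about where the
distro keeps the kernel config; pass another path if yours differs.

```shell
#!/bin/sh
# Sketch: verify the kernel config options named above.
# check_mce_config is a hypothetical helper; the default config path is an
# assumption (many distros install it as /boot/config-<release>).
check_mce_config() {
    cfg="${1:-/boot/config-$(uname -r)}"
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 2; }

    rc=0
    # Both options must be built in (=y) or built as a module (=m).
    for opt in CONFIG_X86_MCE_INJECT CONFIG_EDAC_DECODE_MCE; do
        if ! grep -Eq "^${opt}=[ym]$" "$cfg"; then
            echo "$opt is not enabled" >&2
            rc=1
        fi
    done

    # CONFIG_RAS_CEC must be off for this procedure.
    if grep -q '^CONFIG_RAS_CEC=y$' "$cfg"; then
        echo "CONFIG_RAS_CEC is enabled, it should be disabled" >&2
        rc=1
    fi
    return $rc
}
```

Call it as "check_mce_config" for the running kernel, or as
"check_mce_config /path/to/.config" for a build tree. If the injector was
built as a module, it probably needs to be loaded (modprobe mce-inject)
before the debugfs files show up.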
The files are in debugfs:
/sys/kernel/debug/mce-inject/
├── addr
├── bank
├── cpu
├── flags
├── ipid
├── misc
├── README
├── status
└── synd
so you go and do
echo 0xd42040000000011b > status
echo 0xb3cbdbbc0 > addr
echo 3 > cpu
echo "sw" > flags
echo 0x6bd210000a801002 > synd
echo 0x9600650f00 > ipid
echo 17 > bank
Remember to keep the bank write last because this one injects the error.
It should dump the decoded error in dmesg.
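If you want to repeat the injection, the same writes can be wrapped in a
small function. This is only a sketch: inject_mce is a hypothetical helper
name, and the optional directory argument is there so the script can be
dry-run against a scratch directory. The values are the ones from the log
above.

```shell
#!/bin/sh
# Sketch: replay the logged MCE through the mce-inject debugfs files.
# inject_mce is a hypothetical helper; the directory argument defaults to
# the debugfs path and exists only to allow a dry run elsewhere.
inject_mce() {
    dir="${1:-/sys/kernel/debug/mce-inject}"

    # Stop early if the interface is missing (module not loaded,
    # debugfs not mounted, or wrong path).
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    # Values copied from the logged error (CPU 3, bank 17).
    echo 0xd42040000000011b > "$dir/status" || return 1
    echo 0xb3cbdbbc0        > "$dir/addr"   || return 1
    echo 3                  > "$dir/cpu"    || return 1
    echo "sw"               > "$dir/flags"  || return 1
    echo 0x6bd210000a801002 > "$dir/synd"   || return 1
    echo 0x9600650f00       > "$dir/ipid"   || return 1

    # The bank write goes last because it triggers the injection.
    echo 17 > "$dir/bank"
}
```

Run "inject_mce" as root on the test machine and then look at dmesg for
the decoded error.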