Re: spurious (?) mce Hardware Error messages in v6.19
From: Bert Karwatzki
Date: Thu Feb 12 2026 - 07:54:16 EST
I couldn't test this patch earlier because I was busy tracking down this:
243b467dea17 Revert "drm/amd: Check if ASPM is enabled from PCIe subsystem"
With that done, I could do some testing on v6.19. The periodic bogus MCE
errors are gone because smca_should_log_poll_error() usually returns false, but
I still get a few error messages, and I'm not sure whether they are real errors.
I monitored smca_should_log_poll_error() like this in v6.19 (these errors do not occur in v6.18):
static bool smca_should_log_poll_error(struct mce *m)
{
	/* Path 0: MCA_STATUS already holds a valid error. */
	if (m->status & MCI_STATUS_VAL) {
		printk(KERN_INFO "%s: 0\n", __func__);
		return true;
	}

	/* Path 1: fall back to the deferred error status register (MCA_DESTAT). */
	m->status = mce_rdmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank));
	if ((m->status & MCI_STATUS_VAL) && (m->status & MCI_STATUS_DEFERRED)) {
		printk(KERN_INFO "%s: 1\n", __func__);
		m->kflags |= MCE_CHECK_DFR_REGS;
		return true;
	}

	/* Path 2: nothing to log. */
	printk(KERN_INFO "%s: 2\n", __func__);
	return false;
}
And I get these error messages (usually just once or twice per boot).
Examples from v6.19:
$ grep -aE "Hardware Error|smca_should_log_poll_error: 1" /var/log/kern.log
2026-02-10T16:15:01.001203+01:00 lisa kernel: [ C0] smca_should_log_poll_error: 1
2026-02-10T16:15:01.001815+01:00 lisa kernel: [T45426] mce: [Hardware Error]: Machine check events logged
2026-02-10T16:15:01.001818+01:00 lisa kernel: [T45426] [Hardware Error]: Deferred error, no action required.
2026-02-10T16:15:01.001819+01:00 lisa kernel: [T45426] [Hardware Error]: CPU:0 (19:50:0) MC14_STATUS[-|-|-|AddrV|PCC|-|-|Deferred|-|-]: 0x8700900800000000
2026-02-10T16:15:01.001821+01:00 lisa kernel: [T45426] [Hardware Error]: Error Addr: 0x01b3877c00000020
2026-02-10T16:15:01.001822+01:00 lisa kernel: [T45426] [Hardware Error]: IPID: 0x000700b040000000
2026-02-10T16:15:01.001831+01:00 lisa kernel: [T45426] [Hardware Error]: L3 Cache Ext. Error Code: 0
2026-02-10T16:15:01.001832+01:00 lisa kernel: [T45426] [Hardware Error]: cache level: RESV, tx: INSN
2026-02-11T14:24:13.358353+01:00 lisa kernel: [ C0] smca_should_log_poll_error: 1
2026-02-11T14:24:13.358832+01:00 lisa kernel: [T310371] mce: [Hardware Error]: Machine check events logged
2026-02-11T14:24:13.361773+01:00 lisa kernel: [T310371] [Hardware Error]: Deferred error, no action required.
2026-02-11T14:24:13.361778+01:00 lisa kernel: [T310371] [Hardware Error]: CPU:0 (19:50:0) MC11_STATUS[-|-|-|AddrV|-|-|SyndV|UECC|Deferred|-|-]: 0x8424b0c8009d011e
2026-02-11T14:24:13.361781+01:00 lisa kernel: [T310371] [Hardware Error]: Error Addr: 0x01f8a43400000020
2026-02-11T14:24:13.361782+01:00 lisa kernel: [T310371] [Hardware Error]: IPID: 0x000700b040000000, Syndrome: 0x0000000000000042
2026-02-11T14:24:13.361787+01:00 lisa kernel: [T310371] [Hardware Error]: L3 Cache Ext. Error Code: 29
2026-02-11T14:24:13.361788+01:00 lisa kernel: [T310371] [Hardware Error]: cache level: L2, tx: RESV, mem-tx: RD
2026-02-12T10:07:28.804529+01:00 lisa kernel: [ C0] smca_should_log_poll_error: 1
2026-02-12T10:07:28.805020+01:00 lisa kernel: [T393396] mce: [Hardware Error]: Machine check events logged
2026-02-12T10:07:28.805028+01:00 lisa kernel: [T393396] [Hardware Error]: Deferred error, no action required.
2026-02-12T10:07:28.805029+01:00 lisa kernel: [T393396] [Hardware Error]: CPU:0 (19:50:0) MC11_STATUS[-|-|-|AddrV|PCC|-|-|Deferred|-|-]: 0x8700900800000000
2026-02-12T10:07:28.805030+01:00 lisa kernel: [T393396] [Hardware Error]: Error Addr: 0x01300a9d00000020
2026-02-12T10:07:28.805031+01:00 lisa kernel: [T393396] [Hardware Error]: IPID: 0x000700b040000000
2026-02-12T10:07:28.805033+01:00 lisa kernel: [T393396] [Hardware Error]: L3 Cache Ext. Error Code: 0
2026-02-12T10:07:28.805034+01:00 lisa kernel: [T393396] [Hardware Error]: cache level: RESV, tx: INSN
Are the "Error Addr" values reported here supposed to be physical memory addresses?
If so, they don't seem to make sense to me given the following output of
"cat /proc/iomem":
Memory in my machine:
# cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
  000a0000-000dffff : PCI Bus 0000:00
  000f0000-000fffff : System ROM
00100000-09bfefff : System RAM
09bff000-0a000fff : Reserved
0a001000-0a1fffff : System RAM
0a200000-0a20efff : ACPI Non-volatile Storage
0a20f000-e6057fff : System RAM
e6058000-e614bfff : Reserved
e614c000-e868afff : System RAM
e868b000-e868bfff : Reserved
e868c000-e9cdefff : System RAM
e9cdf000-eb1fdfff : Reserved
  eb1dd000-eb1e0fff : MSFT0101:00
  eb1e1000-eb1e4fff : MSFT0101:00
eb1fe000-eb25dfff : ACPI Tables
eb25e000-eb555fff : ACPI Non-volatile Storage
eb556000-ed1fefff : Reserved
ed1ff000-edffffff : System RAM
ee000000-efffffff : Reserved
f0000000-fcffffff : PCI Bus 0000:00
  f0000000-f7ffffff : PCI ECAM 0000 [bus 00-7f]
    f0000000-f7ffffff : pnp 00:00
  fc500000-fc9fffff : PCI Bus 0000:08
    fc500000-fc5fffff : 0000:08:00.7
      fc500000-fc5fffff : pcie_mp2_amd
    fc600000-fc6fffff : 0000:08:00.4
      fc600000-fc6fffff : xhci-hcd
    fc700000-fc7fffff : 0000:08:00.3
      fc700000-fc7fffff : xhci-hcd
    fc800000-fc8fffff : 0000:08:00.2
      fc800000-fc8fffff : ccp
    fc900000-fc97ffff : 0000:08:00.0
    fc980000-fc9bffff : 0000:08:00.5
      fc980000-fc9bffff : AMD ACP3x audio
        fc980000-fc990200 : acp_pdm_iomem
    fc9c0000-fc9c7fff : 0000:08:00.6
      fc9c0000-fc9c7fff : ICH HD audio
    fc9c8000-fc9cbfff : 0000:08:00.1
      fc9c8000-fc9cbfff : ICH HD audio
    fc9cc000-fc9cdfff : 0000:08:00.7
    fc9ce000-fc9cffff : 0000:08:00.2
      fc9ce000-fc9cffff : ccp
  fca00000-fccfffff : PCI Bus 0000:01
    fca00000-fcbfffff : PCI Bus 0000:02
      fca00000-fcbfffff : PCI Bus 0000:03
        fca00000-fcafffff : 0000:03:00.0
        fcb00000-fcb1ffff : 0000:03:00.0
        fcb20000-fcb23fff : 0000:03:00.1
          fcb20000-fcb23fff : ICH HD audio
    fcc00000-fcc03fff : 0000:01:00.0
  fcd00000-fcdfffff : PCI Bus 0000:07
    fcd00000-fcd03fff : 0000:07:00.0
      fcd00000-fcd03fff : nvme
  fce00000-fcefffff : PCI Bus 0000:06
    fce00000-fce03fff : 0000:06:00.0
      fce00000-fce03fff : nvme
  fcf00000-fcffffff : PCI Bus 0000:05
    fcf00000-fcf03fff : 0000:05:00.0
    fcf04000-fcf04fff : 0000:05:00.0
      fcf04000-fcf04fff : r8169
fd300000-fd37ffff : amd_iommu
fec00000-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec10000-fec10fff : Reserved
  fec10000-fec10fff : pnp 00:04
fed00000-fed00fff : Reserved
  fed00000-fed003ff : HPET 0
    fed00000-fed003ff : PNP0103:00
fed40000-fed44fff : Reserved
fed80000-fed8ffff : Reserved
  fed81200-fed812ff : AMDI0030:00
  fed81500-fed818ff : AMDI0030:00
    fed81500-fed818ff : AMDI0030:00 AMDI0030:00
fedc0000-fedc0fff : pnp 00:04
fedc4000-fedc9fff : Reserved
  fedc5000-fedc5fff : AMDI0010:03
    fedc5000-fedc5fff : AMDI0010:03 AMDI0010:03
fedcc000-fedcefff : Reserved
fedd5000-fedd5fff : Reserved
fee00000-fee00fff : pnp 00:04
ff000000-ffffffff : pnp 00:04
100000000-3ee2fffff : System RAM
  30d800000-30e3a3d47 : Kernel code
  30e400000-30e81efff : Kernel rodata
  30ea00000-30eb108ff : Kernel data
  30f00e000-30f1fffff : Kernel bss
3ee300000-40fffffff : Reserved
410000000-ffffffffff : PCI Bus 0000:00
  fc00000000-fe0fffffff : PCI Bus 0000:01
    fc00000000-fe0fffffff : PCI Bus 0000:02
      fc00000000-fe0fffffff : PCI Bus 0000:03
        fc00000000-fdffffffff : 0000:03:00.0
        fe00000000-fe0fffffff : 0000:03:00.0
  fe20000000-fe301fffff : PCI Bus 0000:08
    fe20000000-fe2fffffff : 0000:08:00.0
    fe30000000-fe301fffff : 0000:08:00.0
  fe30300000-fe304fffff : PCI Bus 0000:04
    fe30300000-fe303fffff : 0000:04:00.0
      fe30300000-fe303fffff : 0000:04:00.0
    fe30400000-fe30403fff : 0000:04:00.0
    fe30404000-fe30404fff : 0000:04:00.0
CPU (0 of 16):
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 80
model name : AMD Ryzen 7 5800H with Radeon Graphics
stepping : 0
microcode : 0xa50000c
cpu MHz : 4187.420
cache size : 512 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe
popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core
perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx
smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat
npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes
vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso ibpb_no_ret spectre_v2_user tsa vmscape
bogomips : 6388.20
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
PCI devices:
# lspci -tvnn
-[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex [1022:1630]
+-00.2 Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU [1022:1631]
+-01.0 Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
+-01.1-[01-03]----00.0-[02-03]----00.0-[03]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff]
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
+-02.0 Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
+-02.1-[04]----00.0 MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz [14c3:0608]
+-02.2-[05]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
+-02.3-[06]----00.0 Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] [2646:5013]
+-02.4-[07]----00.0 Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] [c0a9:2263]
+-08.0 Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
+-08.1-[08]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1638]
| +-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller [1002:1637]
| +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
| +-00.3 Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 [1022:1639]
| +-00.4 Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 [1022:1639]
| +-00.5 Advanced Micro Devices, Inc. [AMD] Audio Coprocessor [1022:15e2]
| +-00.6 Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller [1022:15e3]
| \-00.7 Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub [1022:15e4]
+-14.0 Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b]
+-14.3 Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e]
+-18.0 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0 [1022:166a]
+-18.1 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1 [1022:166b]
+-18.2 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2 [1022:166c]
+-18.3 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3 [1022:166d]
+-18.4 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4 [1022:166e]
+-18.5 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5 [1022:166f]
+-18.6 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6 [1022:1670]
\-18.7 Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7 [1022:1671]
Bert Karwatzki