Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: David Hildenbrand (Red Hat)
Date: Tue Nov 04 2025 - 04:34:13 EST
On 04.11.25 08:23, Xie Yuanbin wrote:
> Memory bit flips are among the most common hardware errors in the server
> and embedded fields, many hardware components have memory verification
> mechanisms, for example ECC. When an error is detected, some hardware or
> architectures report the information to software (OS/BIOS), for example,
> the MCE (Machine Check Exception) on x86.
>
> Common errors include CE (Correctable Errors) and UE (Uncorrectable
> Errors). When the kernel receives memory error information, if it has the
> memory-failure feature, it can better handle memory errors without reboot.
> For example, kernel can attempt to offline the affected memory by
> migrating it or killing the process. Therefore, this feature is widely
> used in servers and embedded fields.

This is a pretty generic description of MCEs.

I think what we are missing is: who runs 32-bit OSes on MCE-capable
hardware (or VMs?) and needs this to work?

What's the use case?

--
Cheers
David