Re: [PATCH v3 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

From: Borislav Petkov
Date: Fri Apr 23 2021 - 07:57:36 EST


On Fri, Apr 23, 2021 at 02:18:34AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> I don't know exactly. The MCE subsystem seems to have code that extracts
> the linear address, so I wonder whether that could be used as a hint to
> memory_failure() to find the proper virtual address.

See "Table 15-3. Address Mode in IA32_MCi_MISC[8:6]" in the SDM -
apparently it can report all kinds of address types, depending on the hw
incarnation or MCA bank type or whatnot. Tony knows :)

> The situation in question is caused by an action-required MCE, so
> we know which process we should send SIGBUS to. So if we choose
> to send SIGBUS to all, no innocent bystanders would be affected.
> But when the process has multiple virtual addresses associated
> with the error physical address, the process receives multiple
> SIGBUS signals and all but one have a wrong value in si_addr in
> siginfo_t, so that's confusing.

Is that scenario real or hypothetical?

Because I'd expect that if we send it a SIGBUS and poison that page,
all the VAs mapping it will have to cope with the fact that the page
has been poisoned and pulled out from under them.

So from a hw perspective, there won't be any more accesses to the faulty
physical page.

In a perfect world, that is...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette