RE: [RFD PATCH] x86/mce: Make sure to send SIGBUS even after losing the race to poison a page

From: Luck, Tony
Date: Thu Sep 03 2020 - 13:09:57 EST


> Let's see if that logic makes sense: if #MC offlines the page and sends
> SIGBUS but CMCI only offlines the page, isn't it only logical for the
> CMCI to *also* send the SIGBUS too, after having offlined the page?
>
> I.e., both should do the proper and full recovery action. Just sayin...

It made sense, and seemed to explain an issue I was seeing, when I wrote it.
But some stress testing of that patch showed that it introduces some problems
and instability.

Without the patch I can inject 10,000 errors and have every one of them complete
correctly (process gets a SIGBUS with the address of the error). With my patch,
around 0.4% of injections fail to provide the address to the SIGBUS handler. Worse,
the test gets a fatal error every 600-700 injections.
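
For readers unfamiliar with what "a SIGBUS with the address of the error" looks
like from user space, here is a minimal sketch. It is not the injection test
used above. The helper names (install_sigbus_handler, sigbus_has_mce_address)
are made up for illustration, and it assumes Linux with glibc, where a memory
error is reported with si_code BUS_MCEERR_AR or BUS_MCEERR_AO and the faulting
address in si_addr.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* State recorded by the handler. Written in signal context, hence volatile. */
static volatile sig_atomic_t sigbus_seen;
static volatile sig_atomic_t sigbus_code;
/* Storing a pointer from a handler is not strictly portable; fine for a sketch. */
static void *volatile sigbus_addr;

/* SA_SIGINFO-style handler: just records si_code and si_addr. */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	sigbus_code = info->si_code;
	sigbus_addr = info->si_addr;
	sigbus_seen = 1;
}

/* Hypothetical helper: install the handler. Returns 0 on success, -1 on error. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;	/* needed to receive the siginfo_t */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Hypothetical helper: returns 1 if the last SIGBUS was a memory error
 * report that carried an address, 0 otherwise.
 */
int sigbus_has_mce_address(void)
{
	return sigbus_seen &&
	       (sigbus_code == BUS_MCEERR_AR || sigbus_code == BUS_MCEERR_AO) &&
	       sigbus_addr != NULL;
}
```

A test along these lines would call install_sigbus_handler() before touching
the poisoned memory and then check sigbus_has_mce_address() after the signal
arrives. A SIGBUS sent from user space (e.g. with raise()) has a different
si_code, so the check returns 0 for it.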

So, I'm abandoning that patch.

-Tony