[PATCH] x86/mce: Dynamically size space for machine check records

From: Naik, Avadhut
Date: Thu Feb 29 2024 - 13:29:17 EST




On 2/29/2024 11:47, Tony Luck wrote:
> On Thu, Feb 29, 2024 at 09:39:51AM +0100, Borislav Petkov wrote:
>> On Thu, Feb 29, 2024 at 12:42:38AM -0600, Naik, Avadhut wrote:
>>> Somewhat confused here. Weren't we also exploring ways to avoid
>>> duplicate records from being added to the genpool? Has something
>>> changed in that regard?
>>
>> You can always send patches proposing how *you* think this duplicate
>> elimination should look and we can talk. :)
>>
>> I don't think anyone would mind it if done properly but first you'd need
>> a real-life use case. As in, do we log sooo many duplicates such that
>> we'd want to dedup?
>
> There are definitely cases where dedup will not help. If a row fails in a
> DIMM there will be a flood of correctable errors with different addresses
> (depending on number of channels in the interleave schema for a system
> this may be dozens or hundreds of distinct addresses).
>
> Same for other failures in structures like column and rank.
>

Wouldn't having dedup actually increase the time we spend in #MC context?
Each new MCE record would have to be compared against every existing
record in the genpool.
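
To make the cost concrete, here is a rough userspace sketch of what a
naive dedup pass might look like. This is not kernel code: struct rec,
rec_same() and pool_has_dup() are made-up names, the pool is modelled
as a plain linked list, and the set of fields that define a "duplicate"
is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down record; NOT the kernel's struct mce. */
struct rec {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
	unsigned int bank;
	struct rec *next;
};

/* Assumption: two records are duplicates if these four fields match. */
static bool rec_same(const struct rec *a, const struct rec *b)
{
	return a->bank == b->bank && a->status == b->status &&
	       a->addr == b->addr && a->misc == b->misc;
}

/*
 * Linear scan of the pool for a record matching @cand.
 * Returns true on a match; *cmps is set to the number of
 * comparisons performed, which is the extra work per incoming record.
 */
static bool pool_has_dup(const struct rec *head, const struct rec *cand,
			 size_t *cmps)
{
	const struct rec *r;

	assert(cand && cmps);
	*cmps = 0;

	for (r = head; r; r = r->next) {
		(*cmps)++;
		if (rec_same(r, cand))
			return true;
	}
	return false;
}
```

In the row failure case described above, every incoming record carries
a different address, so a scan like this walks the whole pool each time
and still finds no match.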

AFAIK, MCEs cannot be nested. Correct me if I am wrong here.

In a flood situation, like the one described above, that is exactly
what may happen: An MCE coming in while the dedup mechanism is
underway (in #MC context).

--
Thanks,
Avadhut Naik