RE: [PATCH] x86/mce: Add Skylake quirk for patrol scrub reported errors

From: Luck, Tony
Date: Tue Jun 16 2020 - 18:33:17 EST


> Two things: can that error type be detected when #MC gets raised, i.e., in
> do_machine_check() as part of scanning all banks?

If the BIOS option is left in the default setting, uncorrectable errors found
by the patrol scrubber are reported with a machine check. The MSCOD and
MCACOD signatures are the same in that case, but that doesn't matter because
MCi_STATUS.UC==1. So Linux doesn't need to jump through hoops to
"upgrade" the severity.

> If so, then the adjusting needs to happen inside mce_log().

So no, this adjustment only needs to happen when polling the banks from
CMCI or the periodic timer.
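
To make the distinction concrete, here is a rough sketch of the decision
described above. It is not the code from the patch: the function name, the
log_source enum and the two EXAMPLE_SCRUB_* signature values are made up
for illustration, and it assumes that with the non-default BIOS setting the
scrubber errors show up on the polling path with UC==0. Only the
MCi_STATUS bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MCi_STATUS fields. */
#define MCI_STATUS_VAL	(1ULL << 63)	/* register contents are valid */
#define MCI_STATUS_UC	(1ULL << 61)	/* error was not corrected */
#define MCACOD(status)	((uint16_t)((status) & 0xffff))		/* bits 15:0 */
#define MSCOD(status)	((uint16_t)(((status) >> 16) & 0xffff))	/* bits 31:16 */

/* Placeholder signature values for illustration, NOT the real ones. */
#define EXAMPLE_SCRUB_MCACOD	0x1234
#define EXAMPLE_SCRUB_MSCOD	0x5678

/* Hypothetical tag for where the bank was read. */
enum log_source { FROM_MACHINE_CHECK, FROM_POLL };

/*
 * Return true only when a polled (CMCI or timer) record carries the
 * scrubber signature but is not already marked uncorrected.
 */
static bool needs_severity_upgrade(uint64_t status, enum log_source src)
{
	if (!(status & MCI_STATUS_VAL))
		return false;		/* nothing logged in this bank */
	if (src == FROM_MACHINE_CHECK)
		return false;		/* #MC path: UC==1 already says it all */
	if (status & MCI_STATUS_UC)
		return false;		/* already uncorrected, nothing to upgrade */

	return MCACOD(status) == EXAMPLE_SCRUB_MCACOD &&
	       MSCOD(status) == EXAMPLE_SCRUB_MSCOD;
}
```

With these placeholders, a valid polled record with the matching signature
and UC==0 is the only case that returns true; the same record seen from
the machine check handler is left alone.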

> Also, that assignment to the function pointer doesn't make much sense to
> me and I think you should do the vendor/family/model checking straight
> in a function adjust_mce_log() which gets called by whoever...

The point was to avoid the runtime test for CPU model on every error. But
this isn't a performance-critical path, so we can refactor if you think that
looks cleaner.
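
For comparison, the two shapes under discussion look roughly like this.
This is a standalone sketch, not kernel code: struct cpu_id, struct
err_record, the severity enum and all function names are invented for the
example, and the quirk body is reduced to an unconditional upgrade to keep
it short. Family 6, model 0x55 is used as the example match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPU identity and the logged record. */
enum severity { SEV_CORRECTED, SEV_UNCORRECTED };
struct cpu_id { int vendor; int family; int model; };
struct err_record { uint64_t status; enum severity severity; };

#define EXAMPLE_VENDOR_INTEL	1
#define EXAMPLE_QUIRK_FAMILY	6
#define EXAMPLE_QUIRK_MODEL	0x55

static bool cpu_has_quirk(const struct cpu_id *id)
{
	return id->vendor == EXAMPLE_VENDOR_INTEL &&
	       id->family == EXAMPLE_QUIRK_FAMILY &&
	       id->model == EXAMPLE_QUIRK_MODEL;
}

/* Simplified quirk body: a real one would check the signature first. */
static void quirk_adjust(struct err_record *r)
{
	r->severity = SEV_UNCORRECTED;
}

/* Option A: test the model once at init and remember the result. */
static void (*adjust_hook)(struct err_record *r);

static void init_quirks(const struct cpu_id *id)
{
	adjust_hook = cpu_has_quirk(id) ? quirk_adjust : NULL;
}

static void log_error_hook(struct err_record *r)
{
	if (adjust_hook)
		adjust_hook(r);
}

/* Option B: test the model every time an error is logged. */
static void log_error_inline(const struct cpu_id *id, struct err_record *r)
{
	if (cpu_has_quirk(id))
		quirk_adjust(r);
}
```

Both options produce the same result for any given CPU. Option B trades
a few compares per logged error for one less piece of global state, which
is a reasonable trade on a path that is not hot.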

A new set of validation tests is running now to check the effectiveness
of this BIOS + OS change, so it may be a while before an updated version is
posted.

Thanks

-Tony