RE: [PATCH v2 0/2] Update mce_record tracepoint

From: Luck, Tony
Date: Fri Jan 26 2024 - 17:03:31 EST


> But no, that's not the right question to ask.
>
> It is rather: which bits of information are very relevant to an error
> record and which are transient enough so that they cannot be gathered
> from a system by other means or only gathered in a difficult way, and
> should be part of that record.
>
> The PPIN is not transient but you have to go map ->extcpu to the PPIN so
> adding it to the tracepoint is purely a convenience thing. More or less.
>
> The microcode revision thing I still don't buy but it is already there
> so whateva...
>
> So we'd need a rule hammered out and put there in a prominent place so
> that it is clear what goes into struct mce and what not.

My personal evaluation of the value of these two additions to the trace record:

PPIN: Nice to have. People that send stuff to me are terrible about providing surrounding
details. The record already includes CPUID(1).EAX ... so I can at least skip the step of
asking them which CPU family/model/stepping they were using). But PPIN
can be recovered (so long as the submitter kept good records about which system
generated the record).

MICROCODE: Must have. Microcode version can be changed at run time. Going
back to the system to check later may not give the correct answer to what was active
at the time of the error. Especially for an error reported while a microcode update is
waling across the CPUs poking the MSR on each in turn.

-Tony