Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

From: Naik, Avadhut
Date: Tue Jul 09 2024 - 02:27:41 EST




On 6/26/2024 13:18, Borislav Petkov wrote:
> On Wed, Jun 26, 2024 at 12:24:20PM -0500, Naik, Avadhut wrote:
>>
>>
>> On 6/26/2024 06:10, Borislav Petkov wrote:
>>> On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
>>>> AMD's Scalable MCA systems viz. Genoa will include two new registers:
>>>
>>> "viz."?
>>>
>> Right. Will mention Zen4 instead of Genoa.
>
> I still don't know what "viz." means...
>
IIUC, its an abbreviation of a Latin word and is used as a synonym for "namely"
or "that is to say".
Might not be the best choice in this case. Will change it.

>> Yes, I catch your drift. Will reword the commit message to explain that the
>> new syndrome registers are going to be exported through the tracepoint
>> in a dynamic array, as they are vendor-specific, so that usersapce error
>> decoding tools can retrieve the supplemental error information within them.
>
> Again, why?
>
> Why is it important to have them in the tracepoint?
>
Userspace error decoding tools like the rasdaemon gather related hardware error
information through the tracepoints. As such, its important to have these two
registers in the tracepoint so that the tools like rasdaemon can parse them
and output the supplemental error information like FRU Text contained in them.
Yes, the registers are also being outputted thorough the dmesg but printk messages
are not an ABI.
The proper way to export these registers is through the tracepoint.

>>>> Note: Checkpatch warnings/errors are ignored to maintain coding style.
>>>
>>> This goes...
>>>
>>>>
>>>> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
>>>
>>> Yes, you did but now your SOB chain is wrong:
>>>
>>>> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
>>>> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>>>
>>> This tells me Avadhut is the author, Yazen handled it and he's sending it to
>>> me. But nope, he isn't. So it needs another Avadhut SOB underneath.
>>>
>>> Audit all patches pls.
>>>
>> Wasn't aware of this chronology. Thanks for this information!
>
> Well, there's documentation for that which you should've read already, before
> sending patches:
>
> https://kernel.org/doc/html/latest/process/development-process.html
>
> and
>
> https://kernel.org/doc/html/latest/process/submitting-patches.html
>
> especially.
>
>> So, IIUC, the sequence for this patch should be as follows?
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
>
> Yes, now I leave it to you to explain why. Hint: it is in those docs above.
>
Got it. The first SoB entry is of the primary author. The successive SoB's are
from the people handling and transporting the patch.
IOW, the route taken by a patch, as its propagated, to maintainers and eventually
to Linus, should be evident from the SoB chain.

>>
>>>> ---
>>>
>>> ... right under those three "---" as such notes do not belong in the commit
>>> message. Remember that for the future.
>>>
>> Okay. Will move the note here.
>
> Or remove it completely. checkpatch is crap - I know. No need to have it in
> every patch.
>
Okay. Will remove the note altogether.
It's also present in the commit descriptions of other patches in this set.
Will remove from there as well.

>> Had considered this. But struct mce_hw_err *err wouldn't really be used in
>> mce_read_aux() in patch 1. Only struct mce m, which is already available, will
>> be used.
>
> So?
>
>> Hence, deferred the change to this patch where usage of struct mce_hw_err *err
>> is actually introduced in mce_read_aux().
>>
>> Do you prefer having this change in patch 1 instead?
>
> I prefer a patch to contain one logical and complete change only. Because this
> makes review easier. You should try reviewing patches sometimes too and you'll
> know.
>
Understood. Will move this to patch1.

>>> So that vendor data layout - is that ABI too? Or are we free to shuffle the
>>> fields around in the future or even remove some?
>>>
>>> This all needs to be specified somewhere explicitly so that nothing relies on
>>> that layout.
>>>
>>> And I'm not sure that that's enough because when userspace tools start using
>>> them, then they're practically an ABI so you can't change them even if you
>>> wanted to.
>>>
>>> So is libtraceevent or all the other libraries going to parse this as a blob
>>> and it is always going to remain such?
>>>
>>> But then the tools which interpret it need to know its layout and if it
>>> changes, perhaps check kernel version which then becomes RealUgly(tm).
>>>
>>> So you might just as well dump the separate fields one by one, without
>>> a dynamic array.
>>>
>>> Or do a dynamic array but specify that their layout in struct
>>> mce_hw_er.vendor.amd are cast in stone so that we're all clear on what goes
>>> where.
>>>
>>> Questions over questions...
>>>
>> Should we document this where struct mce_hw_err is defined, in
>> arch/x86/include/asm/mce.h? Or do you have any other recommendations?
>
> I don't know. If I knew I wouldn't have questions which you can read again and
> try to answer.
>
IIUC, at least for now, the libtraceevent library parses the entire vendor data
array as a blob. Rather, a pointer to the array in the raw tracepoint record along
with its length is returned by the library's tep_get_field_raw() API.

This very API has been used for implementing support for these registers and FRU
Text in the rasdaemon.

https://github.com/mchehab/rasdaemon/pull/122

Thus, the position of the array within the tracepoint and its length can be changed
in the future.

Its layout however, is a completely different matter. At least for AMD, it shouldn't
be changed. New fields, if any, should be added at the end.

The underlying reason for this is the FRU text feature.

With this set, the first two elements of the vendor data dynamic array are SYND 1/2
registers while the third element is MCA_CONFIG (added through patch 4 of the set).
Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are interpreted
as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.

Thus, we depend on array's layout for accurate FRU Text decoding in the rasdaemon.

Hope this answers some of your questions!

--
Thanks,
Avadhut Naik