RE: [PATCH v2 0/2] Update mce_record tracepoint
From: Luck, Tony
Date: Fri Jan 26 2024 - 15:49:19 EST
> > Is it so very different to add this to a trace record so that rasdaemon
> > can have feature parity with mcelog(8)?
>
> I knew you were gonna say that. When someone decides that it is
> a splendid idea to add more stuff to struct mce then said someone would
> want it in the tracepoint too.
>
> And then we're back to my original question:
>
> "And where does it end? Stick full dmesg in the tracepoint too?"
>
> Where do you draw the line in the sand and say, no more, especially
> static, fields bloating the trace record should be added and from then
> on, you should go collect the info from that box. Something which you're
> supposed to do anyway.
Every patch that adds new code or data structures adds to the kernel
memory footprint. Each should be considered on its merits. The basic
question being:
"Is the new functionality worth the cost?"
Where does it end? It would end if Linus declared:
"Linux is now complete. Stop sending patches".
I.e. it is never going to end.
If somebody posts a patch asking to add the full dmesg to a
tracepoint, I'll stand with you to say: "Not only no, but hell no".
So for Naik's two patches we have:
1) PPIN
Cost = 8 bytes.
Benefit: Embeds a system identifier into the trace record so there
can be no ambiguity about which machine generated this error.
Also definitively indicates which socket on a multi-socket system.
2) MICROCODE
Cost = 4 bytes.
Benefit: Certainty about the microcode version active on the core
at the time the error was detected.
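As a rough sketch of that arithmetic (the struct and helper names below
are made up for illustration and are not the actual mce_record
tracepoint layout; the field widths assume PPIN is carried as a 64-bit
value and the microcode revision as a 32-bit value, and any alignment
padding in the real record is ignored):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two fields under discussion.
 * This is NOT the real TRACE_EVENT(mce_record) definition.
 */
struct mce_extra_fields {
	uint64_t ppin;		/* Protected Processor Inventory Number */
	uint32_t microcode;	/* microcode revision on the reporting core */
};

/* Payload bytes added per trace record, ignoring alignment padding. */
static inline size_t mce_extra_payload_bytes(void)
{
	return sizeof(((struct mce_extra_fields *)0)->ppin) +
	       sizeof(((struct mce_extra_fields *)0)->microcode);
}

/* Compile-time checks matching the per-field costs listed above. */
static_assert(sizeof(uint64_t) == 8, "PPIN costs 8 bytes");
static_assert(sizeof(uint32_t) == 4, "microcode costs 4 bytes");
```

So the two patches together add 12 bytes of payload per record.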
RAS = Reliability, Availability, Serviceability
These changes fall into the serviceability bucket. They make it
easier to diagnose what went wrong.
-Tony