Re: [PATCH v24b] RAS: Add a tracepoint for reporting memory controller events
From: Borislav Petkov
Date: Sat May 19 2012 - 05:27:17 EST
On Fri, May 18, 2012 at 11:12:11PM +0200, Borislav Petkov wrote:
> On Fri, May 18, 2012 at 07:10:42PM +0000, Luck, Tony wrote:
> > > That's why _each_ _driver_ will have its format and the userspace tools
> > > parsing it will know about it!
> >
> > Sounds like a full employment program for parser writers.
> >
> > There are some interesting fields that should be common to all
> > drivers ... so have twenty parsers that can handle:
> >
> > paddr: 0x1234
> > PADDR: 0x1234
> > Paddr = 0x1234
> > Phys = 1234
> > addr: 0x1234
> > Address: 0x000000001234
> >
> > looks like a lot of make-work ... when the OS can standardize in the ABI
> > that there is a 64-bit binary value that is the physical address of the
> > error (and another 64-bit mask saying which, if any, bits are valid).
>
> Makes sense, I gotta say :)
>
> > So we should be looking for the set of always relevant values that
> > can be kept explicitly separate in fields in TP_PROTO, and per-driver
> > specific stuff (grain/syndrome??) bits that will have to go into the
> > "details" string and require some driver specific user-mode parsing to
> > use.
>
> How about we put all the values which are globally valid for all drivers
> in separate fields and leave the "(...)" brackets for details where each
> driver can put its own relevant stuff?
>
> For the record, I very much like this reasoning :).
On second thought, filtering out the globally valid fields for all
drivers might require a lot of careful driver auditing.

What would be better and easier is to add to the tracepoint those single
fields which are relevant to the user (and which are, by inference, more
or less globally valid for all drivers) and leave the rest lumped
together in a single char *.

Which is basically what I've been suggesting for a couple of days now :-)

	TP_PROTO(const unsigned int err_type,
		 const unsigned int mc_index,
		 const char *error_msg,
		 const char *label,
		 const char *location,
		 const char *detail)

and I'm really not sure about err_type - this is an edac-specific define
and it means nothing outside the kernel, so its string representation
could very well be merged with error_msg and we can drop the ( ? : )
ugliness in the tracepoint definition itself.
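
To illustrate the merge, here is a minimal userspace sketch, not the
actual EDAC code. The enum values, err_type_str() and build_error_msg()
are hypothetical names made up for this example; the point is only that
the type-to-string mapping happens once on the driver side, so the
tracepoint receives a finished error_msg.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel-internal error type values. */
enum err_type { ERR_CORRECTED, ERR_UNCORRECTED, ERR_FATAL };

/* Translate the internal value to text once, before tracing. */
const char *err_type_str(enum err_type t)
{
	switch (t) {
	case ERR_CORRECTED:
		return "Corrected";
	case ERR_UNCORRECTED:
		return "Uncorrected";
	case ERR_FATAL:
		return "Fatal";
	}
	return "Unknown";
}

/*
 * Fold the error type into error_msg so that the tracepoint needs
 * neither an err_type argument nor a ( ? : ) chain in its print format.
 */
int build_error_msg(char *buf, size_t len, enum err_type t, const char *msg)
{
	return snprintf(buf, len, "%s error: %s", err_type_str(t), msg);
}
```

With this, build_error_msg(buf, sizeof(buf), ERR_CORRECTED, "memory read")
leaves "Corrected error: memory read" in buf, and only that string would
be handed to the tracepoint.
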
IOW:

	TP_PROTO(const unsigned int mc_index,
		 const char *error_msg,
		 const char *label,
		 const char *location,
		 const char *detail)

Now I get all warm and cosy simply by staring at this :-).
Hmmm.
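
For what it's worth, a rough userspace sketch of how a record with those
five arguments could be rendered, with the driver-specific part kept
inside the "(...)" brackets. struct mc_event, format_mc_event() and the
output layout are assumptions made up for illustration; none of this is
taken from the patch.

```c
#include <stdio.h>

/* Hypothetical userspace mirror of the five proposed arguments. */
struct mc_event {
	unsigned int mc_index;	/* which memory controller */
	const char *error_msg;	/* already includes the error type */
	const char *label;	/* DIMM label */
	const char *location;	/* driver-formatted location */
	const char *detail;	/* driver-specific, opaque to generic tools */
};

/*
 * Render the common fields separately and append the driver-specific
 * detail string in brackets, so a generic parser can stop at the "(".
 */
int format_mc_event(char *buf, size_t len, const struct mc_event *ev)
{
	return snprintf(buf, len, "mc%u: %s on %s at %s (%s)",
			ev->mc_index, ev->error_msg, ev->label,
			ev->location, ev->detail);
}
```

A generic tool would then only need to understand everything up to the
opening bracket, and a driver-aware tool could go on to parse the rest.
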
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551