Re: [PATCH v10 1/3] aerdrv: Trace Event for AER

From: Borislav Petkov
Date: Wed Dec 04 2013 - 15:38:49 EST


On Mon, Dec 02, 2013 at 01:05:16PM +0800, rui wang wrote:
> > + TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
> > + __get_str(dev_name),
> > + __entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
> > + __entry->severity == HW_EVENT_ERR_FATAL ?
> > + "Fatal" : "Uncorrected",
> > + __entry->severity == HW_EVENT_ERR_CORRECTED ?
> > + __print_flags(__entry->status, "|", aer_correctable_errors) :
> > + __print_flags(__entry->status, "|", aer_uncorrectable_errors))
> > +);
>
> This causes inconsistency between dmesg and the trace event output.
> When dmesg says "severity=Corrected", the trace event says
> "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is
> defined in edac.h:
>
> enum hw_event_mc_err_type {
> HW_EVENT_ERR_CORRECTED,
> HW_EVENT_ERR_UNCORRECTED,
> HW_EVENT_ERR_FATAL,
> HW_EVENT_ERR_INFO,
> };
>
> while aer_print_error() uses aer_error_severity_string[] defined as:
>
> static const char *aer_error_severity_string[] = {
> "Uncorrected (Non-Fatal)",
> "Uncorrected (Fatal)",
> "Corrected"
> };
>
> In this case dmesg is correct because info->severity is assigned in
> aer_isr_one_error() using the definitions in include/linux/ras.h:
> #define AER_NONFATAL 0
> #define AER_FATAL 1
> #define AER_CORRECTABLE 2
>
> So which one is the standard? Is there a plan to unify all these names?

Yes, the AER tracepoint above should use the AER_* defines and not the
HW_EVENT_ERR_* ones which are for memory errors.

Wanna send a fix?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.