Re: [PATCH 07/14] mce3: pass mce info to EDAC for decoding

From: Borislav Petkov
Date: Tue Jul 21 2009 - 06:49:45 EST


On Tue, Jul 21, 2009 at 08:51:28AM +0200, Andi Kleen wrote:
> On Tue, Jul 21, 2009 at 12:41:34PM +0900, Hidetoshi Seto wrote:
> > H. Peter Anvin wrote:
> > > If you want modules to change the behavior, you're talking about a
> > > *dynamic* change -- the call will point to different things at different
> > > points in time -- so you need another mechanism, i.e. function pointers.
> >
> > Just FYI, the machine check handler on ia64 has such a function pointer.
> >
> > [arch/ia64/kernel/mca.c]
> > /* Function pointer for extra MCA recovery */
> > int (*ia64_mca_ucmc_extension)
> > 	(void*,struct ia64_sal_os_state*)
> > 	= NULL;
>
> A notifier would be a much more flexible solution. Function pointers
> don't really work well with multiple users, which might well happen
> here.
>
> On the other hand, I have some doubts that it's really a good
> idea to expose fatal MCEs to modules. MCE is a rather critical
> code path (a bit similar to an oops), with the machine
> already somewhat unstable in many cases, and if you allow
> arbitrary modules to hook into that you risk long-term
> instability.
>
> So if a notifier is done, I would recommend limiting
> it to corrected MCEs (machine_check_poll), not fatal ones.

However, the idea is to decode _all_ MCEs so we could look into moving
the decoding bits into the EDAC core or some other more appropriate
place. Ingo?

We could then reroute the non-fatal ones to EDAC for further decoding.
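
To make the notifier idea a bit more concrete, here is a rough userspace
sketch of a chain that several decoders can subscribe to, and which only
forwards non-fatal events. This is not the kernel notifier API and not
the code from this patch set: struct mce_event, struct mce_notifier,
mce_register_decoder(), mce_notify_corrected() and count_event() are all
hypothetical names made up for illustration, and the sketch does no
locking at all, which a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a machine check record. */
struct mce_event {
	unsigned int bank;		/* reporting bank number */
	unsigned long long status;	/* raw status value, opaque here */
	int fatal;			/* nonzero for fatal events */
};

/* One subscriber on the chain (hypothetical, not the kernel's type). */
struct mce_notifier {
	int (*call)(struct mce_notifier *nb, const struct mce_event *ev);
	struct mce_notifier *next;
	int seen;			/* number of events delivered so far */
};

/* Head of the singly linked subscriber list. */
static struct mce_notifier *mce_chain;

/* Add a subscriber at the head of the chain. No locking in this sketch. */
static void mce_register_decoder(struct mce_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = mce_chain;
	mce_chain = nb;
}

/*
 * Deliver a non-fatal event to every subscriber. Fatal events are
 * never handed to the chain. Returns the number of subscribers called.
 */
static int mce_notify_corrected(const struct mce_event *ev)
{
	struct mce_notifier *nb;
	int called = 0;

	if (ev->fatal)
		return 0;

	for (nb = mce_chain; nb != NULL; nb = nb->next) {
		nb->call(nb, ev);
		called++;
	}
	return called;
}

/* Example callback: just counts the events it was given. */
static int count_event(struct mce_notifier *nb, const struct mce_event *ev)
{
	(void)ev;
	nb->seen++;
	return 0;
}
```

With two subscribers registered, say an EDAC decoder and a logger that
both use count_event(), a non-fatal event reaches both of them while a
fatal one reaches neither. A single global function pointer could only
serve one of the two.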

--
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632
