Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Alan Cox
Date: Tue Jul 04 2006 - 05:51:08 EST
On Tue, 2006-07-04 at 11:23 +0200, Andi Kleen wrote:
> Regarding your buzzwords: I don't think mcelog is in any way
> less "manageable" or "consistent" than EDAC.
It's chip-specific rather than generalised, so you need awareness of it.
> > > Hmm, i haven't checked, but my understanding was that the newer
> > > Intel chipsets all forwarded the memory errors as machine
> > > check anyways.
> >
> > Quite a few still in use do not. We also have no idea where the future
>
> New ones? Would surprise me.
All the world is not x86.
> Yes the machine check architecture doesn't try to handle all old systems,
> but then in practice error reporting on old x86 systems doesn't tend
> to work particularly well either.
It's pretty solid on the AMD 32-bit chipsets and some of the older Intel
ones.
> mce code also uses a consistent interface - it's even the same
> code in kernel space for all systems.
For the subset of cases it supports.
> We don't have a generic interface for logging some of the other errors
> (like PCI-E errors), but I don't see EDAC solving that. In some ways
> it's understandable because there is no generic PCI-E error handling
> code at all yet.
EDAC solves that for the PCI bus side. It's only solving the logging
side, not the "ok it exploded, now what" question - although there are
some unrelated IBM patches in that area.
> > The ecc code predates the MCE bits by years. The re-doing occurred
> > rather earlier. Rather more useful would be to get the common interface
>
> Earlier than the x86-64 machine check code?
Linux 1.2 I believe, certainly by 2.0.
> Giving a consistent sysfs interface is a bit harder, but I suppose one
> could change the code to provide pseudo banks for enable/disable too.
> However that would be system specific again, so a default "all on/all off"
> policy might be quite ok.
I think we need the basic consistent sysfs case. Whether that is
provided by the mcelog code in the AMD64 case, by an exported hook
from the MCE interfaces for AMD64, or by duplicating the code in EDAC
isn't so important (avoiding duplication aside, of course).
Alan