Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependenton PCI

From: Andi Kleen
Date: Fri Feb 19 2010 - 10:38:01 EST




EDAC is generic enough to work with different type of memory and memory
controllers, and to provide a consistent interface to describe it on a way
that userspace doesn't need to know what are the error registers at
the hardware, nor how to decode a "magic" error number into something
that has a meaning.

Well the main problem I have with EDAC is that it has far too much information
(e.g. down to ranks/banks and also too much information on
the internal topology of the memory controller, and it can't even
express some current designs).

For me it looks like it was designed by someone starring at a motherboard/DIMM
semantics plan, and I don't think that's the right level to think about
these things.

Going that deep typically
requires very hardware specific information and in some cases
it's not even possible. I also don't think it's useful information
to present (and it's really the opposite of "abstraction")

I also have yet to see a useful use case where you need to look "inside" a DIMM
on the reporting level. The useful level is typically the "FRU" (something
you can replace), with only some very specific extensions for special
use cases.

There's also no generic way to do the necessary enumeration down to
the level EDAC needs. For some cases hardware specific drivers can be written,
but it's always better if the generic case works in a architectural way.

Then it does all the enumeration on the kernel, but there
are no useful facilities to sync that with a user level representation.
And most of the useful advannced & interesting RAS features I'm interested in
need user level support.

I prefer at least for MCE to stay on the architectural level
with only minor extensions for specific use cases.

Now to address these problems you could throw large parts of EDAC
out (do you mean that with 'flexible enough'?) and then add a actual
event interface (working on the later is my plan)

As Boris properly pointed, EDAC has space for improvements, and part of
the perf logic can be used as a start point to give some flash new ideas.

See my analysis several mails up. Which parts of perf do you want
to actually use? I don't see any that's actually directly usable
without major changes.

The main issue I see with MCE is at the interface level. I think if we
all cope together, we can converge into a proper interface that will
be accepted upstream.

Just that we're on the same level, could you spell out in detail
what problems you're seeing with it?

[I'm not claiming there are none, I'm just curious what you think they are]

-Andi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/