Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI

From: Andi Kleen
Date: Fri Feb 19 2010 - 08:21:59 EST


Borislav,

<snip all the perf analysis>

I think you're missing the point - it doesn't have to be
perf. It could just as well be some other tool which _shares_

That was one of my points, but there were others too about
the suitability of the kernel infrastructure and the interfaces.

I don't think the perf syscall interface (nor the internal
implementation) in its current form is good for errors, and I also
see no clear path to make it so (without basically adding another,
different interface).

functionality with perf. See this last mail from Ingo:
http://marc.info/?l=linux-kernel&m=126635420607236 on which you were
also CCed, by the way, where he suggests that we could have a tool
called 'hw' which reuses functionality with perf but concentrates on
error handling and all that RAS functionality in modern CPUs. It should
also have a daemon component etc...

So you would have different interfaces: you don't really
need new syscalls for this (read/write is fine), and perf_counter_open()
in its current form is completely unsuitable for errors
(unless your error reporting registers look like an x86 PMU --
mine don't, at least). And different userland. And the kernel internal
requirements are very different too (see previous email).

Where is the commonality with perf?

On the tool side:

I've been working on such a tool for quite some time already. It's
called mcelog. It uses an older interface today, but at some
point I would expect it to move to other interfaces too
(e.g. the next step for that would be APEI errors).

If you only know mcelog from a few years ago: it's quite
different today than it was, so please look again.

That is, the end result will likely be called something different
(it doesn't make much sense to call something that handles
all kinds of errors "mcelog") and some stuff also needs
to be more generic, but I suspect it'll share quite a few
concepts.

If the only problem is the naming, we can probably work something
out? In principle it could be called "hw", but the name
seems awfully generic, especially for a daemon. I was leaning
more towards something like "errord" or so.

On the topology: I was not trying to replace existing
topology tools (like lscpu, lspci etc.). I don't see any
major problems (apart from some details that don't
deserve a redesign) with them.


year. You are refusing to work with other people on a well designed

Sorry, but from our last discussion on attempting to work towards such
an infrastructure solution I got the same impression as Thomas and Ingo
that you're simply not willing to work together on getting a real thing
done. That's why I stopped bothering - it simply made no sense to me to
waste time in fruitless discussions.

Well, I keep ignoring suggestions to put more stuff into EDAC,
mostly because I think the EDAC design needs to be thrown out
instead of extended. Are you referring to that?

My impression was that you came to the same conclusion (at least
for parts of current EDAC like the events) based on your earlier emails.

The current issue is less in enumeration/topology anyways but more
in event handling I would say. In the end topology/enumeration is
the easier part, and most of it is already working quite well.

I'm trying to do things step by step: for short-term problems,
extending current interfaces where possible, and then in the longer
term moving to new, better interfaces.

-Andi
