Re: Hardware Error Kernel Mini-Summit

From: Andi Kleen
Date: Tue May 18 2010 - 14:10:21 EST


On Tue, May 18, 2010 at 01:50:36PM -0300, Mauro Carvalho Chehab wrote:
> Ok. It should be clear that the main target of the mini-summit is to define
> how the several subsystems will integrate into a hardware-abstracted way
> to report errors from kernel. So, we're looking on the next steps to improve
> what we currently have, and avoid to have more than one different subsystem
> trying to get the same info, eventually using the same registers, but providing
> different interfaces to userspace.

Well, there are different use cases.

mcelog mainly deals in thresholds (including fancy ones like
per-page and per-object thresholds), events, and actions triggered by
thresholds (= more events). All your proposals currently deal with
object counts.

It does per-object counting too, but only incidentally.

I suspect there are use cases for both, although I personally think
that for most people events, thresholds and their actions are the most
useful thing to handle by default. But one size doesn't fit all.
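
To make the threshold idea concrete, here is a minimal sketch in Python.
It is not mcelog's actual code: the class name PageThreshold, its
parameters and the callback signature are all made up for illustration.
It only shows the general shape of "count errors per page, and trigger
an action once a page crosses a limit inside a time window".

```python
from collections import defaultdict


class PageThreshold:
    """Hypothetical per-page threshold: run `action` once a page has
    seen `limit` errors within `window` seconds."""

    def __init__(self, limit, window, action):
        self.limit = limit
        self.window = window
        self.action = action
        # page address -> timestamps of errors still inside the window
        self.seen = defaultdict(list)

    def record(self, page, timestamp):
        """Record one error event; return True if the action fired."""
        # Drop errors that have aged out of the window, then add this one.
        recent = [t for t in self.seen[page] if timestamp - t < self.window]
        recent.append(timestamp)
        if len(recent) >= self.limit:
            # Threshold crossed: reset the page and run the action,
            # which may itself produce further events.
            self.seen[page] = []
            self.action(page, len(recent))
            return True
        self.seen[page] = recent
        return False


fired = []
threshold = PageThreshold(limit=3, window=10.0,
                          action=lambda page, n: fired.append((page, n)))
```

The action here just appends to a list. In a real setup it would
presumably be something like offlining the page or running a trigger
script, which is the part that belongs in user space.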

Anyway, it boils down to this: you need different interfaces for
different things.

For example, there will always be events versus accounting.

You can synthesize accounting from events (that is what mcelog
does today). The other way round does not work so well unfortunately,
or at least would be rather inefficient.
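
As a rough illustration of the first direction, the sketch below folds
a stream of events into per-object counts. It is written in Python and
is not how mcelog implements it; the function name account, the event
tuple layout and the DIMM labels are assumptions chosen for the
example.

```python
from collections import Counter


def account(events):
    """Fold (object_id, kind) error events into per-object counts.

    Hypothetical helper: the event layout is an assumption, not a
    real mcelog or kernel record format.
    """
    counts = Counter()
    for object_id, kind in events:
        counts[(object_id, kind)] += 1
    return counts


# Made-up event stream for illustration only.
events = [
    ("DIMM0", "corrected"),
    ("DIMM1", "corrected"),
    ("DIMM0", "corrected"),
    ("DIMM0", "uncorrected"),
]
totals = account(events)
```

Going the other way, the counts alone no longer carry the order or
timing of the individual errors, so a consumer would presumably have to
poll the counters and diff them to approximate an event stream.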

Also, large parts of the actions can only usefully be done in user
space, so you need a user space component.

I am somewhat biased of course, but I think mcelog is doing a reasonably
good job today at being this user space component. It definitely
has areas that could be improved too, but a lot of the basics
are there and doing OK.

In principle mcelog could feed from another API too, but it would
definitely prefer not to have to poll it or to parse printks.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.