Re: [PATCH 0/2] Generic hardware error reporting support

From: huang ying
Date: Sat Nov 20 2010 - 20:06:42 EST


On Sun, Nov 21, 2010 at 8:50 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Nov 20, 2010 at 4:42 PM, huang ying
> <huang.ying.caritas@xxxxxxxxx> wrote:
>>
>> I don't want to hide the information from the MIS people with the
>> tool. I want to show the information to MIS people in a better way.
>
> You really don't understand, do you?

I mean the tool can cook the raw error information from the kernel and
report it in a better way. Yes, you are right that the user space
error daemon is not popular now. But every tool has its beginning,
doesn't it? I know it is impossible for this tool to become popular
among desktop users, because hardware errors are really rare for them.
But it may become popular among server farm administrators, for whom
hardware errors are common and who really care about RAS.

> People won't even _know_ about your tool. It's too f*cking
> specialized. They'll have come from other Unixes, they'll have come
> from older Linux versions, they don't know, they don't care.
>
> They _do_ know about system logs.

I have no objection to reporting hardware errors in the system logs as
well, so these people can get the information too. I just want to add
another, tool-oriented interface, so that some other users (like
cluster administrators) can get their work done better.

> The most common kind of "system admin" is the random end-user. Now,
> admittedly Intel seems to have its head up its arse on the whole
> "regular people care about ECC and random memory corruption", and it
> may be that consumer chips simply won't support the whole magic error
> handling code, but the point remains: we don't want yet another
> obscure error reporting tool that almost nobody knows about.
> Especially for errors that are so rare that you'll never notice if you
> are missing them.

For desktop users, that is true. But for cluster administrators,
hardware errors are really common. An engineer at a local search
engine vendor told me that they have broken DIMMs every day.

Best Regards,
Huang Ying