Re: [RFC/PATCH, 1/4] readX_check() performance evaluation

From: Andi Kleen
Date: Wed Jan 28 2004 - 14:43:43 EST


On Wed, 28 Jan 2004 11:24:05 -0800
David Mosberger <davidm@xxxxxxxxxxxxxxxxx> wrote:

> >>>>> On Wed, 28 Jan 2004 19:52:46 +0100, Andi Kleen <ak@xxxxxxx> said:
>
> >> I find this comment interesting. Can you elaborate what you mean by
> >> "slightly buggy systems"?
>
> Andi> e.g. one bit ECC errors in memory are quite common. And with
> Andi> ECC memory they are not really fatal.
>
> Yet they are a good indicator that something is wrong (not performing
> properly) or may be failing soon. I don't think putting on blinders
> for such problems is a good idea. Though I agree that the question of

Most server-class hardware should log such errors somewhere and let you
read the event log from the firmware. This works even for unhandleable
errors, unlike anything the OS could do.

But when Linux prints them, users will report them to the Linux maintainer
or their distribution vendor: "My Linux is buggy and giving these weird
messages." And neither of those is in any position to do something about it.

I toyed with the idea of printk'ing a disclaimer like
"This is likely not a software bug. Report it to your hardware vendor."
But I doubt this would help much. Even when the message says clearly
that the hardware failed, the user sees a weird message and thinks
it is Linux's fault.

You could enable it with CONFIG_I_HAVE_A_HARDWARE_SUPPORT_CONTRACT_OR_I_WRITE_DRIVERS.
Or just make it a kernel command line option that is off by default.

-Andi