On Wed, Mar 05, 2003 at 01:52:16PM -0800, Jonathan Lundell wrote:
> We've been seeing a curious phenomenon on some PIII/ServerWorks
> CNB30-LE systems.
>
> The systems fail at relatively low temperatures. While the failures
> are not specifically memory related (ECC errors are never a factor),
> we have a memory test that's pretty good at triggering them. Data is
> apparently getting corrupted on the front-side bus.
>
> Here's the curious thing: when we run the same memory test on a
> Windows 2000 system (same hardware; we just swap the disk), we can
> run the ambient temperature up to 60C with no problem at all; the
> test will run for days. (It occurred to us to try Win2K because the
> hardware vendor was using it to test systems at temperature without
> seeing problems.)
>
> Swap in the Linux disk, and at that temperature it'll barely run at
> all. The memory test fails quickly at 40C ambient.
>
> FWIW, CPU cooling is pretty good in this box.
>
> So, the puzzle: what might account for temperature sensitivity, of
> all things, under Linux 2.4.9-31 (RH 7.2), but not Win2K?
Since it doesn't sound like this is a memory error, but a chipset driver
error it could be a Linux driver bug.
You are running a very old kernel, at the least upgrade to the latest
errata (which is currently 2.4.18-26.7. You are running the latest
security updates as well, right?
-Dave
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Fri Mar 07 2003 - 22:00:30 EST