Re: 1.3.95 is not stable

Linus Torvalds (torvalds@cs.helsinki.fi)
Sat, 27 Apr 1996 12:45:02 +0300 (EET DST)


On 26 Apr 1996, Steven L Baur wrote:
>
> I was able to duplicate this crash (with the same 1 bit error in the
> same place) in a variety of motherboard settings with the common
> element being enabled external caching. I'm now running (and have
> about 4 hours of stable uptime) with external caching disabled.
>
> Can these crashes be explained by bad external cache?

_Anything_ can be explained by bad external caches ;-)

But yes, a bad external cache would give exactly the symptoms you see.
Not that it's the _only_ thing that would give those symptoms (a wild
kernel pointer still isn't ruled out, for example), but a one-bit
corruption is one of the more likely things a bad cache will result in.

For example, if you have _one_ bad bit in a direct-mapped external cache
(the bit refuses to become a "one", for example), then you could very well
see the behaviour you have seen. With luck, the kernel code will have
zeros in the one or two places that can be mapped by that particular bad
bit (with a 256kB direct-mapped cache there is likely to be only two
kernel code segment cache lines that would be affected), and depending on
exactly how the kernel happened to be compiled you might never see the
error show up that way.
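A rough sketch of the arithmetic behind "only two kernel code cache lines", assuming a 32-byte line size and a hypothetical 400 kB kernel text starting at physical address 1 MB (neither number comes from the post). In a direct-mapped cache, physical addresses that are 256 kB apart share a cache slot. A bit that refuses to become a one only corrupts data whose stored bit was a one.

```python
# Assumed parameters: 256 kB direct-mapped cache (from the post),
# 32-byte lines (an assumption for illustration).
CACHE_SIZE = 256 * 1024
LINE_SIZE = 32
NUM_LINES = CACHE_SIZE // LINE_SIZE  # number of cache slots


def cache_index(phys_addr):
    """Cache slot a physical address maps to in a direct-mapped cache."""
    return (phys_addr // LINE_SIZE) % NUM_LINES


def aliasing_lines(start, end, bad_index):
    """Start addresses of the lines in [start, end) that map to bad_index."""
    first_line = start // LINE_SIZE
    last_line = (end - 1) // LINE_SIZE
    return [line * LINE_SIZE
            for line in range(first_line, last_line + 1)
            if line % NUM_LINES == bad_index]


def stuck_at_zero(value, bit):
    """Model a cache bit that can never be read back as a one."""
    return value & ~(1 << bit)


# Hypothetical kernel text: 400 kB starting at 1 MB.
KTEXT_START = 0x100000
KTEXT_END = KTEXT_START + 400 * 1024
print([hex(a) for a in aliasing_lines(KTEXT_START, KTEXT_END, 0)])
```

With these assumed numbers, exactly two kernel lines (0x100000 and 0x140000) land on cache slot 0, and a stuck-at-zero bit is harmless wherever the kernel happens to store a zero in that position.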

The one-bit error will show up in user code, obviously, but again it
might not be 100% deterministic (the cache might be ok most of the time
and lose the bit very occasionally). You should have seen occasional
segmentation faults or similar if it's an external cache problem (but
again, it might depend on how the pages get mapped, etc.).
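A minimal sketch of why user-space hits "depend on how the pages get mapped", assuming 4 kB pages and the same 256 kB direct-mapped cache: the cache then splits physical memory into 64 page-sized "colours", and a user page can only touch the bad cache line if its physical frame has the same colour as the bad offset. The frame numbers and bad offset below are made up for illustration.

```python
PAGE_SIZE = 4096          # assumed i386 page size
CACHE_SIZE = 256 * 1024   # 256 kB direct-mapped cache
NUM_COLOURS = CACHE_SIZE // PAGE_SIZE  # 64 page-sized cache regions


def page_colour(pfn):
    """Which page-sized region of the cache a physical frame maps onto."""
    return pfn % NUM_COLOURS


def bad_colour(bad_phys_offset):
    """Colour of the page-sized cache region containing the bad bit."""
    return (bad_phys_offset % CACHE_SIZE) // PAGE_SIZE


def exposed_frames(pfns, bad_phys_offset):
    """Frames (hypothetical) whose contents can pass through the bad line."""
    colour = bad_colour(bad_phys_offset)
    return [pfn for pfn in pfns if page_colour(pfn) == colour]


print(exposed_frames([0, 1, 64, 65, 128], 0x140010))
```

Only frames 0, 64 and 128 are exposed here, so whether a given program ever sees the flipped bit depends on which physical frames it happens to be given.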

I'd obviously love for this problem, too, to be explained by hardware, but
please make as sure of it as you humanly can first. The fact that the
kernel stays up for you with the external cache disabled might be due to
a timing thing too - maybe a network code bug just happens to trigger
under certain circumstances, and disabling the cache changes the timings
enough that the circumstances won't happen.

Linus