Re: memory corruption under heavy load?

Colten Edwards (edwards@panasync.canuck.ca)
Mon, 1 Apr 1996 13:24:47 -0600 (CST)


On Mon, 1 Apr 1996, Marek Michalkiewicz wrote:

> Trying to stress-test some new hardware, I did "make -j" on the kernel
> sources. It runs for a while, starts swapping a lot, load average goes
> up to 30 or so, then cc1 gets fatal signal 11. There is still plenty
> of free swap space at this point (20MB total, about 10MB free).
Been there, done that.

>
> Before everyone will tell me: "it's a hardware problem, read the signal
> 11 FAQ, Linux has no bugs", read on...
>
> This never happens on the same machine under 1.2.13, 1.3.45 and 1.3.58
> (it runs happily until it runs out of swap space). I can reproduce it
> every time I try "make -j" on any large source package, under 1.3.80
> and 1.3.74. Normal compiles work fine under 1.3.80, just not "make -j"
> so I don't think it's a hardware problem.
>
> The symptom is usually signal 11, but sometimes also syntax errors in
> perfectly good include files. It looks like memory corruption caused
> by some changes between 1.3.58 and 1.3.74. I suspect the new swapping
> or page cache code. I can do a binary search, to determine exactly
> which patchlevel broke things - just tell me if this is necessary.

Been here, had that.
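
For what it's worth, the binary search over patchlevels mentioned above
could look roughly like the sketch below. It is only an illustration:
first_bad_patchlevel and is_bad are made-up names, and is_bad stands in
for the manual step of building and booting 1.3.N and running "make -j"
until it either crashes or runs out of swap. It also assumes that once
the bug appears, every later patchlevel has it too.

```python
def first_bad_patchlevel(good, bad, is_bad):
    """Return the lowest patchlevel in (good, bad] for which is_bad() is true.

    Assumes `good` is known to work, `bad` is known to fail, and every
    patchlevel after the first failing one also fails.
    `is_bad` is a hypothetical callback: it takes a patchlevel number and
    returns True if that kernel shows the corruption.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # failure reproduced: first bad one is mid or earlier
        else:
            good = mid   # still fine: first bad one is after mid
    return bad
```

Between 1.3.58 (good) and 1.3.74 (bad) there are 16 candidates, so this
takes four test boots instead of up to sixteen.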

>
> Hardware: GigaByte GA-5486AL PCI motherboard with ALI1489 chipset (BTW,
> it seems to work fine with normal IDE driver, specifying ide0=ali14xx
> slows it down 3MB/s -> 1MB/s!), AMD 5x86-160 CPU, WD AC2850 hard drive
> (814MB EIDE), 8MB RAM (two 70ns 4MB PS/2 SIMMs).

My setup: Gigabyte GA 586 PCI board with the Triton chipset, 256K
pipeline-burst cache, Intel 586-100 CPU, a WD 1GB and a Quantum 1.2GB
drive, and 16MB of 60ns RAM.

On my first board I could get rid of the problem by disabling the
external cache. I had the board replaced and the problem went away.
So yes, it can be a hardware problem. BTW, that board probably would
have worked fine with DOS or Windows, just not under Linux. So try
disabling the external cache and slowing down your memory access to
suit the slow SIMMs, and see what happens. If that cures the problem,
then it's probably a hardware problem rather than a Linux problem.

Colten Edwards