Re: File corruption in 2.2.12*

Nicholas R LeRoy (nick.leroy@norland.com)
Fri, 8 Oct 1999 15:32:41 -0500


Hello, all...

On Oct 4, 8:50pm, Andreas Dilger wrote:
> Subject: Re: (Fwd) Re: File corruption in 2.2.12*
> You write:
> > Several people (including Stephen quoted above) have suggested that
> > this looks like a hardware issue. Since I've run memtest86
> > extensively with no errors, I'm inclined to believe that it's *not* a
> > memory problem.
>
> I had a similar problem on my system when I installed my second CPU.
> It worked fine with either CPU installed individually, and I tried
> different memory, etc, but with both CPUs I got "trap 11" or whatever
> errors while compiling. When I disabled the external cache via BIOS
> config, the problem went away. The only bad thing is that I haven't
> been able to find a new cache module for my motherboard..

I just wanted to give you all an update on my system, and thank you
all for your generous help and suggestions.

My original:

> 2.2.12+ikd and 2.2.13pre14+ikd are both exhibiting some *BAD*
> behaviour on my system. I've run memtest86 for umpteen passes, etc..
> I'm pretty sure this one is a real bug.
>
> Here's what I know.
>
> # cd /usr/src/linux
> # make bzImage
> ......
> ......
> weird syntax error from the compiler after some time.
> # less (source file that compiler pukes on)
>
> File has strange corrupt characters in it. Usually, exactly 2.
> Hmmm.... HD transfer problem?
> In this case, the file was include/linux/sched.h line 517
>
> # cp include/linux/sched.h /tmp
> # shutdown -r now
>
> After reboot:
> # diff include/linux/sched.h /tmp
> 517c517
> < extern int do_sigaltstack(const stack_t *ss, stack_t *oss, unsigned long sp);
> ---
> > extern int do_sigaltstack(const stack_v *ss, sDack_t *Mss, unsIgned long sp);
>
> As you can see, exactly two bytes were corrupt in the buffered version
> of the file, and thus copied to the file in /tmp, but the original ON DISK
> version of the file is fine. Something rotten in China.
>
> That's all I know for now. I'll be gone for the weekend, so hopefully
> this is all the info you need. Just wanted to report it as soon as possible.
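
A byte-level comparison shows the damage more precisely than a line
diff does. What follows is only a rough sketch and was not part of the
original report: count_diff_bytes is a hypothetical helper name, and it
assumes the two files are the same length. It relies on the standard
"cmp -l", which prints one line per differing byte (the byte offset,
then the two byte values in octal).

```shell
# Hypothetical helper (not from the original report): count the bytes
# that differ between two files, assuming both have the same length.
count_diff_bytes() {
    # cmp -l prints one line per differing byte; it exits non-zero when
    # the files differ, which is expected here, so that status is ignored.
    { cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Example use against the two copies from the report:
#   cmp -l include/linux/sched.h /tmp/sched.h        # list offsets and values
#   count_diff_bytes include/linux/sched.h /tmp/sched.h
```

Running this on the copy saved before the reboot and on the file as
re-read from disk afterwards gives the exact offsets of the bad bytes,
which makes it easier to see whether repeated failures share a pattern.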

Here's what I've done & learned:

1 - My HD's firmware had known bugs. Seagate emailed me upgraded
firmware, which I installed.
2 - My IDE cable was about 19". I replaced it with an 18" cable.

After these two "fixes", I was still having some problems.

3 - Started pulling SIMMs out. Pulled pair 2 (2x16M) out of bank 2.
System oopsed during boot, several times in a row. Pulled pair 1
(2x32M) out of bank 1 and replaced it with pair 2 in bank 1. Kernel
builds OK 4 times in a row (much better than I could do before).
Hmm... Pair 1 seems suspect. Put pair 1 in bank 2 (total back to 96M),
system seems stable. Started building kernels in a while loop for 5
hours straight. Never a hiccup. Best I can guess is that pair 1 just
wasn't seated properly.
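
For anyone wanting to repeat that kind of soak test, the loop can be
sketched roughly as below. This is an illustration, not the exact loop
used here: repeat_until_failure is a hypothetical helper name, the
"make clean" step is an assumption (something has to force a rebuild on
each pass), and "date +%s" is assumed to be available, as it is with
GNU and BSD date.

```shell
# Hypothetical helper: run a command over and over until it fails or
# until a time limit (in seconds) has passed.
# Usage: repeat_until_failure SECONDS COMMAND [ARGS...]
repeat_until_failure() {
    limit=$1
    shift
    start=$(date +%s)   # assumes date supports +%s (GNU and BSD do)
    runs=0
    while [ $(( $(date +%s) - start )) -lt "$limit" ]; do
        if ! "$@"; then
            echo "failed after $runs successful runs" >&2
            return 1
        fi
        runs=$((runs + 1))
    done
    echo "$runs runs completed without failure"
}

# Example: rebuild the kernel for five hours (18000 seconds).
# The "make clean" is an assumption, added to force a full rebuild.
#   cd /usr/src/linux
#   repeat_until_failure 18000 sh -c 'make clean && make bzImage'
```

Stopping at the first failure keeps the failing compiler output on the
screen, so the offending source file can be checked right away.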

The worst thing is that running memtest86 for *hours* revealed nothing.
As several people indicated, that memtest86 passed proves nothing. I
also tried 'checkit', which revealed just as little. That they *both*
passed gave me some misleading confidence in my memory sub-system.

In any case, I'm trying to find out now if somehow all of these will solve
my original X server lockup problems. I was having that problem with a
completely different motherboard / CPU (same RAM, though), so I sorta
doubt that that problem is solved. I can still hope, though. :-)

That's all for now. I'll let you all know how it goes.

Thanks again for all your help!!

-Nick

-- 
+-------------------------------+--------------------------------------------+
| /`--_   Nicholas R LeRoy      | In a world without fences, Who needs Gates?|
|{     }/ Norland Corporation   |        ---- Experience Linux! ----         |
| \ *  / W6340 Hackbarth Rd     | http://www.linux.org | http://www.ssc.com  |
| |___| Fort Atkinson, WI 53538 +--------------------------------------------+
|      nick.leroy@norland.com   | #include <disclaimer.h>                    |
|http://www3.norland.com/~nleroy| These are my own ideas, not my employer's. |
+----------------------------------------------------------------------------+
