Re: Massive e2fs corruption with 2.2.9/10?

Patrick Mau (patrick@oscar.prima.de)
Sun, 20 Jun 1999 21:43:54 +0200


On Sun, Jun 20, 1999 at 11:36:17AM -0700, Linus Torvalds wrote:
>
>
> On Sun, 20 Jun 1999, Harald Koenig wrote:
> >
> > update: I'm no longer sure that it's a kernel problem in my case.
> > after reducing the CPU clock from 450 to 400 MHz I've now compiled
> > >2*100 kernels using 2.2.10 with no single problem at all.
>
> Hmm.. Interesting.
>
> > so maybe it was just a hardware problem for me -- or it's some weird race
> > which doesn't show up for 400 but only for 450 MHz ?! I can't believe that yet...
>
> Could be. It sounds fairly unlikely, but timing-related bugs are always
> the worst to find. I wonder what it was that pushed the system over the
> edge in the later 2.2.x (we had a _lot_ of this in the 2.0.x timeframe:
> people who had run 1.2.x with no problems and suddenly got problems due to
> 2.0 being better at utilizing the hardware).
>
> But yes, for the time being I will just assume it is hardware-related, and
> just wait for the reports to continue.

Hello all, hello Linus,

I watched that thread with great interrest and now I have to add something
to it. I have two systems: One is an AMD K6-2 350 MHz the other is a new
dual Pentium III 450 Mhz. By the time I installed the dual cpu system I
upgraded my kernel to 2.2.9 and saw the same file corruption.

Then I changed my RAM timings to lower values and everything worked. I
thought it was a RAM issue, but today I took the time and installed my new
RAM's from the dual system into the AMD system and they are working without
problems at high settings.

File corruption occurs when I untar two linux-tarballs and do a diff between
the two. I can see random bit flipping and and excahnged lines of source.

Theese corruptions also happen when I use an UP kernel on the dual cpu machine
and do _not_ happen on the AMD system. I disagree that it's hardware-related,
because both systems are running at 100 Mhz bus speed and were both under
heavy load during these tests.

I think it's a hard-to-trigger timing issue in the kernel, because using 2.2.6
on the dual machine at high ram timings do not show corruptions.

cheers,
Patrick

PS: Sorry this mail sounds confused, but I'm willing to do any tests you want
on both machines.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/