Re: 2.4.22-pre lockups (now decoded oops for pre10)

From: Stephan von Krawczynski
Date: Wed Aug 13 2003 - 17:25:27 EST


On Wed, 13 Aug 2003 20:34:52 +0400
Oleg Drokin <green@xxxxxxxxxxx> wrote:

> You seem to be getting corruptions in at least 2 days for now, though.
> And reiserfs seems to trigger the problem even faster (and may be
> even more faster if you enable CONFIG_REISERFS_CHECK).

well, I have an idea how to find out more about these verify problem. Basically
I would try to patch tar to ouput the differing areas to stdout in hexdump
format or the like. Only I need some time to make this work out. I hope to find
some pattern about this corruption.

> > 10 pre's ... That cannot be the way to find out what is going on.
> > On the other hand:
> > - no UP kernel ever crashed. So we can at least talk about an SMP-race.
>
> There is still huge field to look at.
>
> > - 2.4.20 does not crash
> > - 2.4.21 does crash
>
> diff is 20M in size.
>
> > If we can add "ext3 does not crash" to the list, then I really hope we can
> > use some brain and give good selection of patches between 2.4.20 and 2.4.21
> > that may cause the troubles.
>
> There were not much changes in reiserfs. All those patches can easily be
> reverted just for verification purposes. Let me know when you are ready/want
> to test this variant and I will send you a diff.

Hm, my primary belief is that something _around_ reiserfs has changed
semantics.

> > If possible I can then patch out all of them and retry. So there is much
> > less time spent for testing.
> > I mean, have you looked at the length of this thread already?
>
> Yes, I did.
> Now if only we can get someone to reproduce your problems...

Hm, I believe nobody in fact tried a setup like mine. As I have clear
indication that I can trigger it simply by using an SMP box, installing SuSE
8.2, compiling stock 2.4.22-rc2 kernel exporting some reiserfs to a nfs-client
of your choice and starting copying data with sizes around 100GB back and
forth.

> > > > I can add another week if you want me to, just tell me. The only thing
> > > > I don't want is that any doubts are left after testing ...
> > > It would be interesting to look at fsck results on the fs after some time
> > > of testing.
> > You mean I should do an fsck on sunday?
>
> Yes, whenever you decide you have waited long enough (provided that it won't
> crash) and decide to stop testing, please run fsck on that testing fs.

Ok, will do that.

>
> > > Probably it would be easier for you to make it crash (if there are crash
> > > possibility at all) if you enable JBD debugging.
> > I have never seen this in real life. Is it possible to turn this on when
> > handling >100 GB of data or will some debug output flood the box?
>
> It only enables some more checks, not debug output.

Does this work for ext3, reiserfs or both?

Regards,
Stephan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/