Re: VM in 2.4.7-pre hurts...

From: Linus Torvalds (torvalds@transmeta.com)
Date: Sun Jul 08 2001 - 12:58:20 EST


On Sun, 8 Jul 2001, Rik van Riel wrote:
>
> ... Bingo. You hit the infamous __wait_on_buffer / ___wait_on_page
> bug. I've seen this for quite a while now on our quad xeon test
> machine, with some kernel versions it can be reproduced in minutes,
> with others it won't trigger at all.

Hmm.. That would explain why the "tar" gets stuck, but why does the whole
machine grind to a halt with all other processes being marked runnable?

> I hope there is somebody out there who can RELIABLY trigger
> this bug, so we have a chance of tracking it down.
>
> > tar
> > Trace; c012f2da <__wait_on_buffer+6a/8c>
> > Trace; c01303c9 <bread+45/64>

I wonder if "getblk()" returned a locked not-up-to-date buffer.. That
would explain how the buffer stays locked forever - the "ll_rw_block()"
will not actually submit any IO on a locked buffer, so there won't be any
IO to release it.

And it's interesting to see that this happens for a _inode_ block, not a
data block - which could easily have been dirty and scheduled for a
write-out. So I wonder if there is some race between "write buffer and try
to free it" and "getblk()".

The locking in "try_to_free_buffers()" is rather anal, so I don't see how
this could happen, but..

That still doesn't explain why everybody is busy running. I'd have
expected all the processes to end up waiting for the page or buffer, not
stuck in a live-lock.

                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jul 15 2001 - 21:00:08 EST