Re: 2.2.9 hangs in truncate_inode_pages

Alan Cox (alan@lxorguk.ukuu.org.uk)
Sun, 30 May 1999 19:09:03 +0100 (BST)


> Hmm.. That hasn't really changed as far as I can tell, so this must have
> been a long-time thing that just was exposed by something else. How sure
> are you that it is truncate_inode_pages()? (Or should I ask "what made you
> find it"?)

A couple of server boxes I use for testing kept hanging. So in the end
I put a keyboard/mouse on the hanging boxes and bashed alt-scrolllock and
co a lot once it stopped.

> And in this case, we actually _do_ have a wait-queue we're waiting on, so
> this isn't even one of those "wait a while" cases - we're actually really
> going to sleep. So this one is a red herring.

Ok. Yes because we must either wait or succeed.

> > 2. __wait_on_page says you are supposed to own the page before waiting
> > on it (ie up page->count). I don't see where we do that. We even
> > go back to the start to allow for the pages on the inode changing
> > if we sleep but we don't up the count ourselves.
>
> This may be the real offender - I think you found a real bug, and it looks
> extremely hard to trigger, but it looks like it has been there from the
> very first version.

With 2.2.7 I've never seen it, with 2.2.9 and 2.2.9ac1 I see a hang every 24-36
hours. Both my test boxes hung on the same thing.

> atomic_inc(&page->count);
> ..
> atomic_dec(&page->count);
>
> around that part should do the trick. What is you test load?

On the one box I was running a pile of assorted network and disk hogs
- repeatedly tarring the fs to /dev/null, find / etc

On the other it was basically an idle 6.0 box.

I'll rebuild with this changed and run some more tests

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/