Re: 2.2.9 hangs in truncate_inode_pages

Linus Torvalds (torvalds@transmeta.com)
Sun, 30 May 1999 10:05:09 -0700 (PDT)


On Sun, 30 May 1999, Alan Cox wrote:
>
> I've been chasing a hang in 2.2.9* and it comes down to truncate_inode_pages.

Hmm.. That hasn't really changed as far as I can tell, so this must have
been a long-time thing that just was exposed by something else. How sure
are you that it is truncate_inode_pages()? (Or should I ask "what made you
find it"?)

> Now there are two things that have me wondering here and one cleanup item
>
> 1. wait_on_page uses schedule() and not the SCHED_YIELD stuff so
> there is no guarantee another person locking the page gets run
> if their lock isnt going to be cleared by tq_disk

Even without the SCHED_YIELD, any schedule() call should _eventually_ make
progress. Unless you have real-time processes, and I suspect you don't.

And in this case, we actually _do_ have a wait-queue we're waiting on, so
this isn't even one of those "wait a while" cases - we're actually really
going to sleep. So this one is a red herring.

> 2. __wait_on_page says you are supposed to own the page before waiting
> on it (ie up page->count). I don't see where we do that. We even
> go back to the start to allow for the pages on the inode changing
> if we sleep but we don't up the count ourselves.

This may be the real offender - I think you found a real bug, and it looks
extremely hard to trigger, but it looks like it has been there from the
very first version.

In normal cases, __wait_on_page() is called for a page that we just looked
up with one of the page hash functions. As a result, we normally _do_ have
the page count updated, and everything is fine - and having wait_on_page
increment the counts on its own would be just a waste of time.

truncate_inode_pages() is rather special. It doesn't look pages up, it
only walks through a list.

The "goto repeat" is actually _trying_ to fix exactly the same bug:
because we sleep, we can't rely on the list being the same any more, so we
have to start anew from the beginning. But it misses the fact that you
can't even rely on the _one_ page being there unless you increment the
page count for it.

Doing a

atomic_inc(&page->count);
..
atomic_dec(&page->count);

around that part should do the trick. What is you test load?

> 3. (minor) in truncate_inode_pages we do if(PageLocked(..))
> wait_on_page. If we know the page is locked shouldn't we do
> __wait_on_page()

Yes. Minor optimization.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/