Bill Hawes <whawes@star.net> writes:
> I've found a race condition in mm/filemap.c that could result in a page
> being freed while a task is still on its wait queue, possibly resulting
> in a trashed queue. (Similar to the bug H.J. Lu reported a few days
> ago.) The code is in truncate_inode_pages as follows:
>
> repeat:
> p = &inode->i_pages;
> while ((page = *p) != NULL) {
> unsigned long offset = page->offset;
>
> /* page wholly truncated - free it */
> if (offset >= start) {
> if (PageLocked(page)) {
> wait_on_page(page);
> goto repeat;
> }
>
> Calling wait_on_page without first incrementing page->count violates the
> commented requirements.
It's actually safe. The wait queue for a page is a permanent
structure and survives the freeing of the page from the page cache.
The fact that the page might be removed from the inode list while the
truncate function is stalled does not matter, since if that happens we
go back to the start of the inode list by means of the goto repeat;
jump.
The kernel is full of things like this. clear_inode() is another good
example which goes to great lengths to ensure that the wait list is
preserved even while the inode is being erased. The general rule is
that unless the object is pinned, you are quite entitled to wait on
the object's wait queue, but you can't assume that it's the same
object you started with by the time the wait finishes. The code in
truncate_inode_pages does not assume that the page remains the same
after the wait, so it does satisfy the safety rules.
Cheers,
Stephen.