Race while waiting on page

Bill Hawes (whawes@star.net)
Wed, 04 Jun 1997 12:42:22 -0400


I've found a race condition in mm/filemap.c that could result in a page
being freed while a task is still on its wait queue, possibly resulting
in a trashed queue. (Similar to the bug H.J. Lu reported a few days
ago.) The code is in truncate_inode_pages as follows:

repeat:
p = &inode->i_pages;
while ((page = *p) != NULL) {
unsigned long offset = page->offset;

/* page wholly truncated - free it */
if (offset >= start) {
if (PageLocked(page)) {
wait_on_page(page);
goto repeat;
}

Calling wait_on_page without first incrementing page->count violates the
commented requirements. Here's a situation that could lead to trouble:

Task 1 needs to perform i/o to a page in an inode cache, so it
increments page->count, sets PG_locked, starts the i/o, and sleeps.
Let's say page->count is now 2.

Task 2 calls truncate_inode_pages, finds the page locked, and calls
wait_on_page.

The i/o completes, the page is unlocked, and both tasks are
rescheduled. Due to vagaries in scheduling, task 1 runs first, finishes
what it's doing and decrements page->count. The system is low on
memory, so any of various mechanisms are called to free up a page. Since
page->count is now 1, the page (with task 2 still waiting on it) is
removed from the inode list and freed.

The following simple patch would avoid an Oops in the above situation,
but would result in the page being lost if it's already been kicked out
of the
inode list.

--- mm/filemap.c.old Tue May 20 13:16:23 1997
+++ mm/filemap.c Wed Jun 4 10:28:52 1997
@@ -102,7 +102,10 @@
/* page wholly truncated - free it */
if (offset >= start) {
if (PageLocked(page)) {
+ /* increment count while we're waiting */
+ atomic_inc(&page->count);
wait_on_page(page);
+ atomic_dec(&page->count);
goto repeat;
}
inode->i_nrpages--;

A better solution would be to restructure truncate_inode_pages to first
get rid of the unlocked pages and then handle the locked cases.

-Bill