bug leading to stuck pages

Bill Hawes (whawes@star.net)
Wed, 04 Jun 1997 14:41:23 -0400


I've found a bug that looks like it could lead to a cascading loss of
available memory when the system is already low on memory.

In generic_read(), before starting i/o, page->count is incremented and
both PG_locked and PG_free_after are set:

	atomic_inc(&page->count);
	set_bit(PG_locked, &page->flags);
	set_bit(PG_free_after, &page->flags);

Then a call is made to brw_page, ignoring the return:

	/* IO start */
	brw_page(READ, page, inode->i_dev, nr, inode->i_sb->s_blocksize, 1);
	return 0;

But in brw_page, if it can't allocate a buffer header, the error exit
clears PG_locked, but doesn't decrement page->count:

	bh = create_buffers(page_address(page), size);
	if (!bh) {
		clear_bit(PG_locked, &page->flags);
		wake_up(&page->wait);
		return -ENOMEM;
	}

This leaves the page "stuck" so that it can't be reclaimed later. Under
low-memory conditions a series of calls to generic_read could rapidly
lead to many pages becoming stuck, leaving the system unable to restore
itself to proper operation.
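
To make the failure mode concrete, here is a small userspace model of
the sequence described above. It is only a sketch, not kernel code: the
struct, the flag values, and every model_* name are stand-ins I made up
for illustration, and the alloc_ok parameter simply simulates whether
create_buffers() succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel objects; not the real definitions. */
enum { MODEL_PG_locked = 1, MODEL_PG_free_after = 2 };

struct model_page {
	int count;		/* models page->count */
	unsigned flags;		/* models page->flags */
};

/*
 * Models the brw_page() error exit: when buffer heads cannot be
 * allocated it clears the lock bit and returns -ENOMEM, but leaves
 * the reference count and the free_after bit untouched.
 */
static int model_brw_page(struct model_page *page, bool alloc_ok)
{
	assert(page != NULL);
	if (!alloc_ok) {
		page->flags &= ~(unsigned)MODEL_PG_locked;
		return -ENOMEM;
	}
	return 0;	/* i/o queued; completion would clean up later */
}

/* Models the read path: take a reference, set flags, ignore the result. */
static int model_generic_read(struct model_page *page, bool alloc_ok)
{
	page->count++;
	page->flags |= MODEL_PG_locked | MODEL_PG_free_after;
	(void)model_brw_page(page, alloc_ok);
	return 0;
}

/*
 * Hypothetical helper: issue n failing reads, each on a fresh page that
 * starts with one reference, and count how many end up unlocked but
 * still holding the extra reference.
 */
static int model_stuck_pages(int n)
{
	int stuck = 0;

	for (int i = 0; i < n; i++) {
		struct model_page page = { .count = 1, .flags = 0 };

		model_generic_read(&page, false);
		if (page.count > 1 && !(page.flags & MODEL_PG_locked))
			stuck++;
	}
	return stuck;
}
```

In this model every failed read leaves one page stuck, so
model_stuck_pages(n) returns n; nothing on the error path ever drops
the extra reference.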

I'm not sure what the best way to fix this would be ... maybe have
brw_page do nothing except return the error, and have all callers check
the return codes?
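
A rough sketch of what that suggestion might look like, again as a
userspace model rather than a patch. The fix_* names and flag values
are hypothetical, alloc_ok simulates whether create_buffers()
succeeds, and the real caller would presumably also need the
wake_up(&page->wait) that brw_page's error exit does today.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; not the real kernel definitions. */
enum { FIX_PG_locked = 1, FIX_PG_free_after = 2 };

struct fix_page {
	int count;		/* models page->count */
	unsigned flags;		/* models page->flags */
};

/* Suggested behaviour: on allocation failure touch nothing, just report. */
static int fix_brw_page(struct fix_page *page, bool alloc_ok)
{
	assert(page != NULL);
	return alloc_ok ? 0 : -ENOMEM;
}

/* The caller checks the result and undoes its own setup on failure. */
static int fix_generic_read(struct fix_page *page, bool alloc_ok)
{
	int err;

	page->count++;
	page->flags |= FIX_PG_locked | FIX_PG_free_after;

	err = fix_brw_page(page, alloc_ok);
	if (err) {
		/* Undo exactly what was set up above. */
		page->flags &= ~(unsigned)(FIX_PG_locked | FIX_PG_free_after);
		page->count--;
		/* Real code would also wake anyone sleeping on the page. */
	}
	return err;
}
```

With the undo kept in the same function that did the setup, a failed
read leaves the page exactly as it found it and the error reaches the
caller. This model covers only a single reader of the page.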

There are other problems as well -- if it's possible for two tasks to
post async reads to the same page, then it would be very tricky to
restore the PG_locked flag. Also, using PG_free_after would be a
problem, since only one decrement of the page->count would happen.

-Bill