Re: [RFC][PATCH 1/3] radix priority search tree - objrmap complexity fix

From: Andrea Arcangeli
Date: Thu Apr 01 2004 - 19:17:41 EST


On Thu, Apr 01, 2004 at 05:15:34PM +0200, Andrea Arcangeli wrote:
> > diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids x-ref/mm/page_io.c x/mm/page_io.c
> > --- x-ref/mm/page_io.c 2004-04-01 17:07:10.231289760 +0200
> > +++ x/mm/page_io.c 2004-04-01 17:07:33.182800600 +0200
> > @@ -139,7 +139,7 @@ struct address_space_operations swap_aop
> >
> > /*
> > * A scruffy utility function to read or write an arbitrary swap page
> > - * and wait on the I/O.
> > + * and wait on the I/O. The caller must have a ref on the page.
> > */
> > int rw_swap_page_sync(int rw, swp_entry_t entry, struct page *page)
> > {
> > @@ -151,8 +151,16 @@ int rw_swap_page_sync(int rw, swp_entry_
> > lock_page(page);
> >
> > BUG_ON(page->mapping);
> > - page->mapping = &swapper_space;
> > - page->index = entry.val;
> > + ret = add_to_page_cache(page, &swapper_space, entry.val, GFP_KERNEL);
> > + if (unlikely(ret)) {
> > + unlock_page(page);
> > + return ret;
> > + }
> > + /*
> > + * get one more reference to make page non-exclusive so
> > + * remove_exclusive_swap_page won't mess with it.
> > + */
> > + page_cache_get(page);
> >
> > if (rw == READ) {
> > ret = swap_readpage(NULL, page);
> > @@ -161,7 +169,13 @@ int rw_swap_page_sync(int rw, swp_entry_
> > ret = swap_writepage(page, &swap_wbc);
> > wait_on_page_writeback(page);
> > }
> > - page->mapping = NULL;
> > +
> > + lock_page(page);
> > + remove_from_page_cache(page);
> > + unlock_page(page);
> > + page_cache_release(page);
> > + page_cache_release(page); /* For add_to_page_cache() */
> > +
> > if (ret == 0 && (!PageUptodate(page) || PageError(page)))
> > ret = -EIO;
> > return ret;
>
> incrementally to the above I applied this hardness checks in the
> anon-vma patch, so we're safe against the problem Andrew outlined
> (somebody attempting to do swapin/swapouts of anonymous pages through
> that interface, something that shouldn't happen since we want only the
> VM to deal with userspace mapped pages).
>
> @@ -149,8 +149,14 @@ int rw_swap_page_sync(int rw, swp_entry_
> };
>
> lock_page(page);
> -
> + /*
> + * This library call can be only used to do I/O
> + * on _private_ pages just allocated with alloc_pages().
> + */
> BUG_ON(page->mapping);
> + BUG_ON(PageSwapCache(page));
> + BUG_ON(PageAnon(page));
> + BUG_ON(PageLRU(page));
> ret = add_to_page_cache(page, &swapper_space, entry.val, GFP_KERNEL);
> if (unlikely(ret)) {
> unlock_page(page);


the good thing is that I believe this fix will make it work with the -mm
writeback changes. However this fix now collides with anon-vma since
swapsuspend passes compound pages to rw_swap_page_sync and
add_to_page_cache overwrites page->private and the kernel crashes at the
next page_cache_get() since page->private is now the swap entry and not
a page_t pointer. So I guess I've a good reason now to giveup trying to
add the page to the swapcache, and to just fake the radix tree like I
did in my original fix. That way the page won't be swapcache either so I
don't even need to use get_page to avoid remove_exclusive_swap_page to
mess with it.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/