Re: [PATCH v16 15/22] mm/compaction: do page isolation first in compaction

From: Alexander Duyck
Date: Fri Jul 17 2020 - 12:10:00 EST


On Thu, Jul 16, 2020 at 10:10 PM Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx> wrote:
>
>
> >> @@ -950,6 +951,21 @@ static bool too_many_isolated(pg_data_t *pgdat)
> >> if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
> >> goto isolate_fail;
> >>
> >> + /*
> >> + * Be careful not to clear PageLRU until after we're
> >> + * sure the page is not being freed elsewhere -- the
> >> + * page release code relies on it.
> >> + */
> >> + if (unlikely(!get_page_unless_zero(page)))
> >> + goto isolate_fail;
> >> +
> >> + if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
> >> + goto isolate_fail_put;
> >> +
> >> + /* Try isolate the page */
> >> + if (!TestClearPageLRU(page))
> >> + goto isolate_fail_put;
> >> +
> >> /* If we already hold the lock, we can skip some rechecking */
> >> if (!locked) {
> >> locked = compact_lock_irqsave(&pgdat->lru_lock,
> >
> > Why not do the __isolate_lru_page_prepare before getting the page?
> > That way you can avoid performing an extra atomic operation on non-LRU
> > pages.
> >
>
> This change come from Hugh Dickins as mentioned from commit log:
> >> trylock_page() is not safe to use at this time: its setting PG_locked
> >> can race with the page being freed or allocated ("Bad page"), and can
> >> also erase flags being set by one of those "sole owners" of a freshly
> >> allocated page who use non-atomic __SetPageFlag().
>
> Hi Hugh,
>
> would you like to show more details of the bug?
>
> ...
>
> >> + * sure the page is not being freed elsewhere -- the
> >> + * page release code relies on it.
> >> + */
> >> + if (unlikely(!get_page_unless_zero(page)))
> >> + goto busy;
> >> +
> >> + if (!TestClearPageLRU(page)) {
> >> + /*
> >> + * This page may in other isolation path,
> >> + * but we still hold lru_lock.
> >> + */
> >> + put_page(page);
> >> + goto busy;
> >> + }
> >> +
> >
> > I wonder if it wouldn't make sense to combine these two atomic ops
> > with tests and the put_page into a single inline function? Then it
> > could be possible to just do one check and if succeeds you do the
> > block of code below, otherwise you just fall-through into the -EBUSY
> > case.
> >
>
> Uh, since get_page changes page->_refcount, TestClearPageLRU changes page->flags,
> So I don't know how to combine them, could you make it more clear with code?

Actually it is pretty straight forward. Something like this:
static inline bool get_page_unless_zero_or_nonlru(struct page *page)
{
if (get_page_unless_zero(page)) {
if (TestClearPageLRU(page))
return true;
put_page(page);
}
return false;
}

You can then add comments as necessary. The general idea is you are
having to do this in two different spots anyway so why not combine the
logic? Although it does assume you can change the ordering of the
other test above.