Re: [PATCH] mm: page_alloc: fix memcg accounting leak in speculative cache lookup

From: Matthew Wilcox
Date: Fri Mar 19 2021 - 23:27:30 EST


On Fri, Mar 19, 2021 at 06:52:58PM -0700, Hugh Dickins wrote:
> > + /*
> > + * Drop the base reference from __alloc_pages and free. In
> > + * case there is an outstanding speculative reference, from
> > + * e.g. the page cache, it will put and free the page later.
> > + */
> > + if (likely(put_page_testzero(page))) {
> > free_the_page(page, order);
> > - else if (!PageHead(page))
> > + return;
> > + }
> > +
> > + /*
> > + * The speculative reference will put and free the page.
> > + *
> > + * However, if the speculation was into a higher-order page
> > + * chunk that isn't marked compound, the other side will know
> > + * nothing about our buddy pages and only free the order-0
> > + * page at the start of our chunk! We must split off and free
> > + * the buddy pages here.
> > + *
> > + * The buddy pages aren't individually refcounted, so they
> > + * can't have any pending speculative references themselves.
> > + */
> > + if (!PageHead(page) && order > 0) {
>
> The put_page_testzero() has released our reference to the first
> subpage of page: it's now under the control of the racing speculative
> lookup. So it seems to me unsafe to be checking PageHead(page) here:
> if it was actually a compound page, PageHead might already be cleared
> by now, and we doubly free its tail pages below? I think we need to
> use a "bool compound = PageHead(page)" on entry to __free_pages().
>
> Or alternatively, it's wrong to call __free_pages() on a compound
> page anyway, so we should not check PageHead at all, except in a
> WARN_ON_ONCE(PageCompound(page)) at the start?

Alas ...

$ git grep '__free_pages\>.*compound'
drivers/dma-buf/heaps/system_heap.c: __free_pages(page, compound_order(page));
drivers/dma-buf/heaps/system_heap.c: __free_pages(p, compound_order(p));
drivers/dma-buf/heaps/system_heap.c: __free_pages(page, compound_order(page));
mm/huge_memory.c: __free_pages(zero_page, compound_order(zero_page));
mm/huge_memory.c: __free_pages(zero_page, compound_order(zero_page));
mm/slub.c: __free_pages(page, compound_order(page));

Maybe we should disallow it!

There are a few other places to check:

$ grep -l __GFP_COMP $(git grep -lw __free_pages) | wc -l
24

(assuming the pages are allocated and freed in the same file, which is a
reasonable approximation, but not guaranteed to catch everything. Many
of these 24 will be false positives, of course.)