Re: [PATCHv3 4/5] mm: make compound_head() robust

From: Kirill A. Shutemov
Date: Mon Aug 24 2015 - 06:18:06 EST


On Wed, Aug 19, 2015 at 12:21:45PM +0300, Kirill A. Shutemov wrote:
> Hugh has pointed that compound_head() call can be unsafe in some
> context. There's one example:
>
> CPU0 CPU1
>
> isolate_migratepages_block()
> page_count()
> compound_head()
> !!PageTail() == true
> put_page()
> tail->first_page = NULL
> head = tail->first_page
> alloc_pages(__GFP_COMP)
> prep_compound_page()
> tail->first_page = head
> __SetPageTail(p);
> !!PageTail() == true
> <head == NULL dereferencing>
>
> The race is pure theoretical. I don't it's possible to trigger it in
> practice. But who knows.
>
> We can fix the race by changing how encode PageTail() and compound_head()
> within struct page to be able to update them in one shot.
>
> The patch introduces page->compound_head into third double word block in
> front of compound_dtor and compound_order. That means it shares storage
> space with:
>
> - page->lru.next;
> - page->next;
> - page->rcu_head.next;
> - page->pmd_huge_pte;
>
> That's too long list to be absolutely sure, but looks like nobody uses
> bit 0 of the word. It can be used to encode PageTail(). And if the bit
> set, rest of the word is pointer to head page.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>

If DEFERRED_STRUCT_PAGE_INIT=n, combining this patchset with my page-flags
patches causes oops in SetPageReserved() called from
reserve_bootmem_region().

It happens because we haven't yet initilized the word in struct page and
PageTail() inside SetPageReserved() can give false-positive, which leads
to bogus compound_head() result.

IIUC, we initialize the word only on first allocation of the page. It can
be too late: pfn scanner can see false-positive PageTail() from not yet
allocated pages too.

Here's fixlet for patch to address the issue.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 347724850665..d0e3fca830f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -892,6 +892,8 @@ static void init_reserved_page(unsigned long pfn)
#else
static inline void init_reserved_page(unsigned long pfn)
{
+ /* Avoid false-positive PageTail() */
+ INIT_LIST_HEAD(&pfn_to_page(pfn)->lru);
}
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */

--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/