Re: [PATCH v5 1/2] mm,hwpoison: fix race with hugetlb page allocation

From: HORIGUCHI NAOYA(堀口 直也)
Date: Tue May 25 2021 - 09:08:18 EST


On Tue, May 25, 2021 at 11:09:18AM +0200, Oscar Salvador wrote:
> On Tue, May 25, 2021 at 08:07:07AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > OK, here's the current draft.
> >
> > Thanks,
> > Naoya Horiguchi
> >
> > ---
> > From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> > Date: Tue, 18 May 2021 23:49:18 +0900
> > Subject: [PATCH] mm,hwpoison: fix race with hugetlb page allocation
> >
> > When hugetlb page fault (under overcommitting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> >
> > CPU0: CPU1:
> >
> > gather_surplus_pages()
> > page = alloc_surplus_huge_page()
> > memory_failure_hugetlb()
> > get_hwpoison_page(page)
> > __get_hwpoison_page(page)
> > get_page_unless_zero(page)
> > zero = put_page_testzero(page)
> > VM_BUG_ON_PAGE(!zero, page)
> > enqueue_huge_page(h, page)
> > put_page(page)
> >
> > __get_hwpoison_page() only checks the page refcount before taking an
> > additional one for memory error handling, which is wrong because there's
> > a time window where compound pages have non-zero refcount during
> > initialization. So make __get_hwpoison_page() check page status a bit
> > more for hugetlb pages.
>
> I think that this changelog would benefit from some information about the new
> !PageLRU && !__PageMovable check.

OK, I'll update about this check.

> > static int __get_hwpoison_page(struct page *page)
> > {
> > struct page *head = compound_head(page);
> > + int ret = 0;
> > + bool hugetlb = false;
> > +
> > + ret = get_hwpoison_huge_page(head, &hugetlb);
> > + if (hugetlb)
> > + return ret;
> > +
> > + if (!PageLRU(head) && !__PageMovable(head))
> > + return 0;
>
> This definitely needs a comment hinting the reader why we need to check for this.
> AFAICS, this is to close the race where a page is about to be a hugetlb page soon,
> so we do not go for get_page_unless_zero(), right?

Right, I can't find any other reliable check to show that a page is not
under initialization of hugetlb pages. I'll state why this check is needed.

>
> From soft_offline_page's POV I __guess__ that's fine because we only deal with
> pages we know about.
> But what about memory_failure()? I think memory_failure() is less picky about that,
> so it is okay to not take a refcount on that case?

Actually the coverage of page types for which memory error can be handled
are the same between memory_failure() and soft_offline_page(). One
memory_failure-specific role is to print out logs about error events even if
they can't be successfully handled, which could be affected by this change,
so we might need to adjust how memory_failure() uses the return value of
get_hwpoison_page().

Thanks,
Naoya Horiguchi