Re: [External] Re: [PATCH] mm: hugetlb: fix a race between memory-failure/soft_offline and gather_surplus_pages

From: Oscar Salvador
Date: Wed Apr 21 2021 - 04:21:11 EST

On Wed, Apr 21, 2021 at 04:15:00PM +0800, Muchun Song wrote:
> > The hwpoison side of this looks really suspicious to me. It shouldn't
> > really touch the reference count of hugetlb pages without being very
> > careful (and having hugetlb_lock held). What would happen if the
> > reference count was increased after the page has been enqueed into the
> > pool? This can just blow up later.
> If the page has been enqueued into the pool, then the page can be
> allocated to other users. The page reference count will be reset to
> 1 in the dequeue_huge_page_node_exact(). Then memory-failure
> will free the page because of put_page(). This is wrong. Because
> there is another user.

Note that dequeue_huge_page_node_exact() will not hand over any pages
which are poisoned, so in this case it will not be allocated.
But it is true that we might need hugetlb lock, this needs some more

I will have a look.

Oscar Salvador