Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

From: Michal Hocko
Date: Fri Jan 08 2021 - 03:44:35 EST


On Thu 07-01-21 23:11:22, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Thu 07-01-21 20:59:33, Muchun Song wrote:
> > > On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > [...]
> > > > Right. Can we simply back off in the dissolving path when ref count is
> > > > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> > > > when the all above is true and the page is not being freed?
> > >
> > > The list_empty(&page->lru) may always return false.
> > > The page before freeing is on the active list
> > > (hstate->hugepage_activelist).Then it is on the free list
> > > after freeing. So list_empty(&page->lru) is always false.
> >
> > The point I was trying to make is that the page has to be enqueued when
> > it is dissolved and freed. If the page is not enqueued then something
> > racing. But then I have realized that this is not a great check to
> > detect the race because pages are going to be released to buddy
> > allocator and that will reuse page->lru again. So scratch that and sorry
> > for the detour.
> >
> > But that made me think some more and one way to reliably detect the race
> > should be PageHuge() check in the freeing path. This is what dissolve
> > path does already. PageHuge becomes false during update_and_free_page()
> > while holding the hugetlb_lock. So can we use that?
>
> It may make the thing complex. Apart from freeing it to the
> buddy allocator, free_huge_page also does something else for
> us. If we detect the race in the freeing path, if it is not a HugeTLB
> page, the freeing path just returns. We also should move those
> things to the dissolve path. Right?

Not sure what you mean. Dissolving is a subset of the freeing path. It
effectivelly only implements the update_and_free_page branch (aka free
to buddy). It skips some of the existing steps because it believes it
sees a freed page. But as you have pointed out this is racy and I
strongly suspect it is simply wrong in those assumptions. E.g. hugetlb
cgroup accounting can get wrong right?

The more I think about it the more I think that dissolving path should
simply have a common helper with __free_huge_page that would release
the huge page to the allocator. The only thing that should be specific
to the dissolving path is HWpoison handling. It shouldn't be playing
with accounting and what not. Btw the HWpoison handling is suspicious as
well, a lost race would mean this doesn't happen. But maybe there is
some fixup handled later on...

> But I find a tricky problem to solve. See free_huge_page().
> If we are in non-task context, we should schedule a work
> to free the page. We reuse the page->mapping. If the page
> is already freed by the dissolve path. We should not touch
> the page->mapping. So we need to check PageHuge().
> The check and llist_add() should be protected by
> hugetlb_lock. But we cannot do that. Right? If dissolve
> happens after it is linked to the list. We also should
> remove it from the list (hpage_freelist). It seems to make
> the thing more complex.

I am not sure I follow you here but yes PageHuge under hugetlb_lock
should be the reliable way to check for the race. I am not sure why we
really need to care about mapping or other state.
--
Michal Hocko
SUSE Labs