Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY

From: Mina Almasry
Date: Thu May 20 2021 - 22:06:09 EST


On Thu, May 20, 2021 at 1:31 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> On Thu, May 20, 2021 at 1:00 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >
> > On 5/20/21 12:21 PM, Mina Almasry wrote:
> > > On Thu, May 20, 2021 at 12:18 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> > >>
> > >> On Thu, May 13, 2021 at 5:14 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> > >>>
> > >>> How about this approach?
> > >>> - Keep the check for hugetlbfs_pagecache_present in hugetlb_mcopy_atomic_pte
> > >>> that you added. That will catch the race where the page was added to
> > >>> the cache before entering the routine.
> > >>> - With the above check in place, we only need to worry about the case
> > >>> where copy_huge_page_from_user fails and we must drop locks. In this
> > >>> case we:
> > >>> - Free the page previously allocated.
> > >>> - Allocate a 'temporary' huge page without consuming reserves. I'm
> > >>> thinking of something similar to page migration.
> > >>> - Drop the locks and let the copy_huge_page_from_user be done to the
> > >>> temporary page.
> > >>> - When reentering hugetlb_mcopy_atomic_pte after dropping locks (the
> > >>> *pagep case) we need to once again check
> > >>> hugetlbfs_pagecache_present.
> > >>> - We then try to allocate the huge page which will consume the
> > >>> reserve. If successful, copy contents of temporary page to newly
> > >>> allocated page. Free temporary page.
> > >>>
> > >>> There may be issues with this, and I have not given it deep thought. It
> > >>> does abuse the temporary huge page concept, but perhaps no more than
> > >>> page migration. Things do slow down if the extra page allocation and
> > >>> copy is required, but that would only be the case if copy_huge_page_from_user
> > >>> needs to be done without locks. Not sure, but hoping that is rare.
> > >>
> > >> Just following up this a bit: I've implemented this approach locally,
> > >> and with it it's passing the test as-is. When I hack the code such
> > >> that the copy in hugetlb_mcopy_atomic_pte() always fails, I run into
> > >> this edge case, which causes resv_huge_pages to underflow again (this
> > >> time permemantly):
> > >>
> > >> - hugetlb_no_page() is called on an index and a page is allocated and
> > >> inserted into the cache consuming the reservation.
> > >> - remove_huge_page() is called on this index and the page is removed from cache.
> > >> - hugetlb_mcopy_atomic_pte() is called on this index, we do not find
> > >> the page in the cache and we trigger this code patch and the copy
> > >> fails.
> > >> - The allocations in this code path seem to double consume the
> > >> reservation and resv_huge_pages underflows.
> > >>
> > >> I'm looking at this edge case to understand why a prior
> > >> remove_huge_page() causes my code to underflow resv_huge_pages.
> > >>
> > >
> > > I should also mention, without a prior remove_huge_page() this code
> > > path works fine, so it seems the fact that the reservation is consumed
> > > before causes trouble, but I'm not sure why yet.
> > >
> >
> > Hi Mina,
> >
> > How about quickly posting the code? I may be able to provide more
> > suggestions if I can see the actual code.
>
> Sure thing, attached my patch so far. It's quite messy with prints
> everywhere and VM_BUG_ON() in error paths that I'm not handling yet.
> I've also hacked the code so that the hugetlb_mcopy_atomic_pte() copy
> always fails so I exercise that code path.
>

Of course right after I send out my patch, I figure out what's wrong.
It turns out freeing the allocated page when the copy fails is
extremely complicated (or complicated to figure out why it's broken).
Turns out I need to:

restore_page_on_error()
if (!HPageRestoreReserves(page)) {
hugetlb_unreserve_pages(mapping, idx, idx + 1, 1);
}
put_page(page);

This is because even though we always allocate asking for a page from
the reserves, the page may not come from the reserves if the VM
doesn't have reservation for this index (which is the case if the page
has been allocated by hugetlb_no_page() and then removed with
remove_huge_page()). So, we need to correctly handle both cases.

Sorry for the noise but this was hard to track down. Patch should be
incoming soon unless I run into other issues with a closer look.

> > --
> > Mike Kravetz