Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback
From: Hugh Dickins
Date: Fri Oct 09 2020 - 18:23:26 EST
On Fri, 9 Oct 2020, Mike Kravetz wrote:
> On 10/8/20 10:50 PM, Hugh Dickins wrote:
> >
> > It's a problem I've faced before in tmpfs, keeping a hold on the
> > mapping while page lock is dropped. Quite awkward: igrab() looks as
> > if it's the right thing to use, but turns out to give no protection
> > against umount. Last time around, I ended up with a stop_eviction
> > count in the shmem inode, which shmem_evict_inode() waits on if
> > necessary. Something like that could be done for hugetlbfs too,
> > but I'd prefer to do it without adding extra, if there is a way.
>
> Thanks.
I failed to come up with anything neater than a stop_eviction count
in the hugetlbfs inode.
But that's not unlike a very special purpose rwsem added into the
hugetlbfs inode: and now that you're reconsidering how i_mmap_rwsem
got repurposed, perhaps you will end up with an rwsem of your own in
the hugetlbfs inode.
So I won't distract you with a stop_eviction patch unless you ask:
that's easy, what you're looking into is hard - good luck!
Hugh
> >>
> >> As mentioned above, I hope all this can be removed.
> >
> > If you continue to nest page lock inside i_mmap_rwsem for hugetlbfs,
> > then I think that part of hugetlb_page_mapping_lock_write() has to
> > remain. I'd much prefer that hugetlbfs did not reverse the usual
> > nesting, but accept that you had reasons for doing it that way.
>
> Yes, that is necessary with the change to lock order.
>
> Yesterday I found another issue/case to handle in the hugetlb COW code
> caused by the difference in lock nesting. As a result, I am rethinking
> the decision to change the locking order.
>
> The primary reason for changing the lock order was to make the hugetlb
> page fault code not have to worry about pte pointers changing. The issue
> is that the pte pointer you get (or create) while walking the table
> without the page table lock could go away due to unsharing the PMD. We
> can walk the table again after acquiring the lock and validate and possibly
> retry. However, the backout code in this area which previously had to
> deal with truncation as well, was quite fragile and did not work in all
> corner cases. This was mostly due to lovely huge page reservations.
> I thought that adding more complexity to the backout code was going to
> cause more issues. Changing the locking order eliminated the pte pointer
> race as well as truncation. However, it created new locking issues. :(
>
> In parallel to working the locking issue, I am also exploring enhanced
> backout code to handle all the needed cases.
> --
> Mike Kravetz