Re: [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing

From: Mike Kravetz
Date: Fri Nov 08 2019 - 14:11:00 EST


On 11/7/19 6:04 PM, Davidlohr Bueso wrote:
> On Thu, 07 Nov 2019, Mike Kravetz wrote:
>
>> Note that huge_pmd_share now increments the page count with the semaphore
>> held just in read mode. It is OK to do increments in parallel without
>> synchronization. However, we don't want anyone else changing the count
>> while that check in huge_pmd_unshare is happening. Hence, the need for
>> taking the semaphore in write mode.
>
> This would be a nice addition to the changelog methinks.

Last night I remembered there is one place where we currently take
i_mmap_rwsem in read mode and potentially call huge_pmd_unshare. That
is in try_to_unmap_one. Yes, there is a potential race here today.
But that race is somewhat contained as you need two threads doing some
combination of page migration and page poisoning to race. This change
now allows migration or poisoning to race with page fault. I would
really prefer if we do not open up the race window in this manner.

Getting this right in the try_to_unmap_one case is a bit tricky. I had
code to do this in the past that was part of a bigger hugetlb synchronization
change. All those changes got reverted (commit ddeaab32a89f), but I
believe it is possible to change try_to_unmap_one calling sequences
without introducing other issues.

Bottom line is that more changes are needed in this patch. I'll work
on those changes unless someone else volunteers. It will likely take me
one or two days to come up with and test proposed changes.
--
Mike Kravetz