Re: [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare

From: Peter Xu
Date: Wed Nov 30 2022 - 11:24:40 EST


On Wed, Nov 30, 2022 at 10:46:24AM +0100, David Hildenbrand wrote:
> > huge_pte_offset() is always called with mmap lock held with either read or
> > write. It was assumed to be safe but it's actually not. One race
> > condition can easily trigger by: (1) firstly trigger pmd share on a memory
> > range, (2) do huge_pte_offset() on the range, then at the meantime, (3)
> > another thread unshare the pmd range, and the pgtable page is prone to lost
> > if the other shared process wants to free it completely (by either munmap
> > or exit mm).
>
> So just that I understand correctly:
>
> Two processes, #A and #B, share a page table. Process #A runs two threads,
> #A1 and #A2.
>
> #A1 walks that shared page table (using huge_pte_offset()), for example, to
> resolve a page fault. Concurrently, #A2 triggers unsharing of that page
> table (replacing it by a private page table),

Not yet replacing it, just unsharing.

If the replacement happened we shouldn't trigger a bug either because
huge_pte_offset() will return the private pgtable page instead.

> for example, using munmap().

munmap() may not work because it needs mmap lock, so it'll wait until #A1
completes huge_pte_offset() walks and release mmap lock read.

Many of other things can trigger unshare, though. In the reproducer I used
MADV_DONTNEED.

>
> So #A1 will eventually read/write the shared page table while we're placing
> a private page table. Which would be fine (assuming no unsharing would be
> required by #A1), however, if #B also concurrently drops the reference to
> the shared page table (), the shared page table could essentially get freed
> while #A1 is still walking it.
>
> I suspect, looking at the reproducer, that the page table deconstructor was
> called. Will the page table also actually get freed already? IOW, could #A1
> be reading/writing a freed page?

If with the existing code base, I think it could.

If with RCU lock, it couldn't, but still since the pgtable lock is freed
even if the page is not, we'll still hit weird issues when accessing the
lock.

And with vma lock it should be all safe.

--
Peter Xu