Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

From: Peter Xu
Date: Tue Sep 15 2020 - 12:08:00 EST


On Tue, Sep 15, 2020 at 10:50:40AM -0400, Peter Xu wrote:
> Hi, all,
>
> I've attached another version of the FOLL_PIN-enforced COW patch, just in
> case it is still anywhere close to useful (though, considering the below, I
> now strongly doubt it). I took care of !USERFAULTFD as Leon suggested, and
> also handled the fast GUP path.

Now with the patch attached (for real this time).

>
> However...
>
> On Mon, Sep 14, 2020 at 08:28:51PM -0300, Jason Gunthorpe wrote:
> > Yes, this stuff does pin_user_pages_fast() and MADV_DONTFORK
> > together. It sets FOLL_FORCE and FOLL_WRITE to get an exclusive copy
> > of the page and MADV_DONTFORK was needed to ensure that a future fork
> > doesn't establish a COW that would break the DMA by moving the
> > physical page over to the fork. DMA should stay with the process that
> > called pin_user_pages_fast(). (Is MADV_DONTFORK still needed after
> > recent years' work on GUP etc.? It is a pretty terrible ancient thing.)
>
> ... now I'm even more confused about what has happened.
>
> If FOLL_FORCE|FOLL_WRITE is used, IIUC that guarantees the page will
> trigger COW during GUP if it is shared, so there's no problem on the GUP
> side. What confuses me is why the write bit is not set when the COW is
> triggered.
>
> E.g., in wp_page_copy(), if I'm not mistaken, the write bit is controlled
> only by the line below (apart from the fix patch; but I believe the RDMA
> test has nothing to do with uffd-wp, so the result should be the same
> either way):
>
> entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>
> This means that, as long as the RDMA region has VM_WRITE set (and I can't
> think of any reason why it wouldn't...), the COWed page's PTE should have
> the write bit set. If so, the page should be stable, and I don't understand
> how another COW could trigger at all, or how the code path in the "trial
> COW" patch gets reached.
>
> Or is the VMA missing VM_WRITE for some reason? Sorry, I know next to
> nothing about RDMA, so more information on that side might help too. E.g.,
> does the hardware also walk the process's software page table when doing
> RDMA (or does it use an IOMMU page table, or neither)?
>
> Thanks,
>
> --
> Peter Xu
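For readers unfamiliar with the MADV_DONTFORK part of Jason's description, here is a minimal Linux userspace sketch of what the flag does. A range marked with madvise(MADV_DONTFORK) is simply absent in a forked child, so fork() cannot set up a COW for it. It does not pin anything (pin_user_pages_fast() is kernel-internal). map_buffer() and mapped_in_child() are hypothetical helper names, and using mincore() failing with ENOMEM as the "not mapped in the child" probe is my own choice of technique.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: anonymous private RW buffer, optionally MADV_DONTFORK. */
void *map_buffer(size_t len, bool dontfork)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        abort();
    if (dontfork && madvise(p, len, MADV_DONTFORK) != 0)
        abort();
    return p;
}

/* Hypothetical helper: is [addr, addr + len) mapped in a freshly forked child? */
bool mapped_in_child(void *addr, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        abort();
    if (pid == 0) {
        size_t pg = (size_t)sysconf(_SC_PAGESIZE);
        size_t pages = (len + pg - 1) / pg;
        unsigned char vec[64];            /* this sketch handles up to 64 pages */
        if (pages > sizeof vec)
            _exit(2);
        /* mincore() fails with ENOMEM if part of the range is not mapped. */
        int rc = mincore(addr, len, vec);
        int err = errno;
        _exit(rc == 0 ? 0 : (err == ENOMEM ? 1 : 2));
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        abort();
    int code = WEXITSTATUS(status);
    if (code == 2)
        abort();                          /* unexpected error in the child */
    return code == 0;
}
```

With a normal private buffer the child inherits the mapping (and, after a write, would share COW pages with the parent); with MADV_DONTFORK the child has no mapping at all, so the parent's pages are never write-protected by the fork.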
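To make the write-bit argument above concrete, here is a deliberately simplified userspace model, not kernel code. The toy_* types, flag values and helpers are all hypothetical. toy_write_access() copies on every write to a read-only PTE, ignoring the page-reuse logic of the real do_wp_page(). It mirrors only the shape of maybe_mkwrite(): after a write-triggered COW the new PTE is writable iff the VMA has VM_WRITE, so no second COW follows. A later fork() write-protects the PTE again, and the parent's next write then moves it off the pinned page, which is the case MADV_DONTFORK is meant to avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy flag bits; these are NOT the kernel's real values. */
#define TOY_VM_WRITE   0x1u
#define TOY_PTE_WRITE  0x1u
#define TOY_PTE_DIRTY  0x2u

struct toy_page { int refcount; };
struct toy_vma  { unsigned int vm_flags; };
struct toy_pte  { struct toy_page *page; unsigned int flags; };

/* Same shape as maybe_mkwrite(): only a VM_WRITE vma gets a writable PTE. */
struct toy_pte toy_maybe_mkwrite(struct toy_pte pte, const struct toy_vma *vma)
{
    if (vma->vm_flags & TOY_VM_WRITE)
        pte.flags |= TOY_PTE_WRITE;
    return pte;
}

/* Rough analogue of the copy in wp_page_copy(): new page, dirty, maybe writable. */
void toy_wp_page_copy(struct toy_pte *pte, const struct toy_vma *vma)
{
    struct toy_page *new_page = malloc(sizeof(*new_page));
    if (!new_page)
        abort();
    new_page->refcount = 1;
    pte->page->refcount--;                /* drop our reference to the old page */
    struct toy_pte entry = { new_page, TOY_PTE_DIRTY };
    *pte = toy_maybe_mkwrite(entry, vma);
}

/* A write (or FOLL_WRITE GUP): faults and copies only if the PTE is read-only.
 * Returns true if a COW happened. */
bool toy_write_access(struct toy_pte *pte, const struct toy_vma *vma)
{
    if (pte->flags & TOY_PTE_WRITE)
        return false;
    toy_wp_page_copy(pte, vma);
    return true;
}

/* fork() of a private mapping: share the page and write-protect both PTEs. */
void toy_fork(struct toy_pte *parent, struct toy_pte *child)
{
    parent->flags &= ~TOY_PTE_WRITE;
    *child = *parent;
    child->page->refcount++;
}
```

In this model a VMA without VM_WRITE (reachable only via FOLL_FORCE) leaves the copied PTE read-only, so the next forced write would copy again; that is the scenario asked about in the last paragraph.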

--
Peter Xu