Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

From: Peter Xu
Date: Thu Sep 17 2020 - 15:04:24 EST


On Thu, Sep 17, 2020 at 11:26:01AM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 11:14 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > In my humble opinion, the real solution is still to use MADV_DONTFORK properly
> > so we should never share the DMA pages with others when we know the fact.
>
> Is this all just because somebody does a fork() after doing page pinning?
>
> If so, I feel this should be trivially fixed in copy_one_pte().
> That's where we currently do
>
> /*
> * If it's a COW mapping, write protect it both
> * in the parent and the child
> */
> if (is_cow_mapping(vm_flags) && pte_write(pte)) {
> ptep_set_wrprotect(src_mm, addr, src_pte);
> pte = pte_wrprotect(pte);
> }
>
> and I feel that that is where we could just change the code to do a
> COW event for pinned pages (and *not* mark the parent write protected,
> since the parent page now isn't a COW page).
>
> Because if that's the case that Jason is hitting, then I feel that
> really is the correct fix: make sure that the pinning action is
> meaningful.
>
> As mentioned, I really think the whole (and only) point of page
> pinning is that it should keep the page locked in the page tables. And
> by "locked" I mean exactly that: not just present, but writable.
>
> And then the "we never COW a pinned page" comes not from the COW code
> doing magic, but by it simply never becoming non-writable - because
> the page table entry is locked!

Looks reasonable to me.

The fork() should be slightly slower though, since we'll need to copy the data
for all the DMA buffers for each of the child processes, even if we should be
pretty sure those processes won't use these pages at all. But it seems a good
approach anyway if we care about the potential breakages in the userspace so
the breakage is turned into perf degrades, and if any userspace noticed such
perf degrade on fork() then people will probably remember to use MADV_DONTFORK
properly since afaict MADV_DONTFORK can remove this extra overhead..

Another side effect I can think of is that we'll bring some uncertainty to
fork() starting from when page_maybe_dma_pinned() is used, since it's sometimes
bogus (hpage_pincount_available()==false) so some COWs might be triggered
during fork() even when not necessary if we've got some normal pages with too
many refcounts (over GUP_PIN_COUNTING_BIAS). But assuming that's not a big
deal since it should be extremely rare, or is it?..

--
Peter Xu