Re: [PATCH v2] mm/gup: Allow real explicit breaking of COW

From: Linus Torvalds
Date: Mon Aug 10 2020 - 12:47:47 EST


On Mon, Aug 10, 2020 at 7:57 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> One solution is actually already mentioned in commit 17839856fd58, which is to
> provide an explicit BREAK_COW scemantics for enforced COW. Then we can still
> use FAULT_FLAG_WRITE to identify whether this is a "real write request" or an
> "enfornced COW (read) request".

I think the patch is fine, but during discussions we also discussed
having the flag the other way around, in order to have the default be
"break COW", and the use cases that explicitly know they can handle
the ambiguity would have to say so explicitly with a "don't break COW"
bit.

I don't think this matters in theory, but in practice I think it would
be a good thing as documentation.

Because FAULT_FLAG_BREAK_COW doesn't really tell you anything:
breaking COW is "always safe".

In contrast, a "FAULT_FLAG_DONT_COW" bit could be accompanied with a
note like "I don't care which side I get - I'm not going to keep track
of the page, I just want random data, and it's ok if I get it from
another forked process".

In fact, maybe it shouldn't be called "DONT_COW", but more along the
lines of those semantics of "READ_WRONG_SIDE_COW_OK", so that people
who use the bit have to _think_ about it.

I think comments in general should be there.

Looking at your patch, for example, I go "Hmm" when I see this part:

- if (userfaultfd_pte_wp(vma, *vmf->pte)) {
+ if ((vmf->flags & FAULT_FLAG_WRITE) &&
+ userfaultfd_pte_wp(vma, *vmf->pte)) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
return handle_userfault(vmf, VM_UFFD_WP);
}

and I go "ok, for reads with COW breaking, we'll just silently break
the COW and not do VM_UFFD_WP?"

An explanation of why that is the right thing to do would be good. And
no, I don't mean "UFFD doesn't want a WP fault in this case". I mean
literally why "we do want the silent COW, but UFFD doesn't care about
it".

End result: I think the patch is fine, but the reason we had
discussion about it and the reason it wasn't done initially was that
you get all these kinds of subtle behavior differences, and I think
they need clarifying.

Linus