I just tested it and write() works fine, it uses uaccess afaict as part of the
lib/iov_iter.c code:
generic_perform_write()
-> copy_folio_from_iter_atomic()
-> copy_page_from_iter_atomic()
-> __copy_from_iter()
-> copy_from_user_iter()
-> raw_copy_from_user()
-> copy_user_generic()
-> [uaccess asm]
Ah yes. O_DIRECT is the problematic bit I suspect, which will use GUP.
Ptrace access and friends should also no longer work on these pages, likely
that's tolerable.
Yeah Peter can interject if not, but I'd be _very_ surprised if anybody expects
to be able to ptrace perf counter mappings in another process (it'd be kind of
totally insane to do that anyway since it's a ring buffer that needs special
handing anyway).
If we can't do pfnmap, and we definitely can't do mixedmap (because it's
basically entirely equivalent in every way to just faulting in the pages as
before and requires the same hacks) then I will have to go back to the drawing
board or somehow change the faulting code.
This really sucks.
I'm not quite sure I even understand why we don't allow GUP used _just for
pinning_ on VM_PFNMAP when it is -in effect- already pinned on assumption
whatever mapped it will maintain the lifetime.
What a mess...
Because VM_PFNMAP is dangerous as hell. To get a feeling for that (and also why I
raised my refcounting comment earlier) just read recent:
commit 79a61cc3fc0466ad2b7b89618a6157785f0293b3
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed Sep 11 17:11:23 2024 -0700
mm: avoid leaving partial pfn mappings around in error case
As Jann points out, PFN mappings are special, because unlike normal
memory mappings, there is no lifetime information associated with the
mapping - it is just a raw mapping of PFNs with no reference counting of
a 'struct page'.
I'm _very_ aware of this, having worked extensively on things around this kind
of issue recently (was a little bit involved with the above one too :), and
explicitly zap on error in this patch to ensure we leave no broken code paths.
I agree it's horrible, but we need to have a way of mapping this memory without
having to 'trick' the faulting mechanism to behave correctly.
What's completely "surprising" to me is, if there is no page_mkwrite, but
the VMA is writable, then just map the PTE writable ...
I've had a number of surprises on this journey :)
To make sure I understand what you mean do you mean the whole:
do_wp_page()
-> wp_page_shared()
-> if (page_mkwrite) { ... } else { wp_page_reuse(); }
-> wp_page_reuse()
-> maybe_mkwrite() [hey VM_WRITE is set, so yes make writable!]
Path?
So overall - given all the above and the fact that the alternatives are _even
worse_ (having to spuriously folio lock if that'll even work, totally
unnecessary and semantically incorrect folio-fication or a complete rework of
mapping) - while you might be sicked by this use of the VM_PFNMAP - are you ok
with the patch, all things considered? :)