On Fri, Aug 16, 2024 at 11:30:31AM +0200, David Hildenbrand wrote:
On 14.08.24 15:05, Jason Gunthorpe wrote:
On Fri, Aug 09, 2024 at 07:25:36PM +0200, David Hildenbrand wrote:
That is in general not what we want, and we still have some places that
wrongly hard-code that behavior.
In a MAP_PRIVATE mapping you might have anon pages that we can happily walk.
vm_normal_page() / vm_normal_page_pmd() [and as commented as a TODO,
vm_normal_page_pud()] should be able to identify PFN maps and reject them,
no?
Yep, I think we can also rely on the special bit.
It is more than just relying on the special bit..
VM_PFNMAP/VM_MIXEDMAP should really only be used inside
vm_normal_page() because they are, effectively, support for a limited
emulation of the special bit on arches that don't have it. There are
a bunch of weird rules that are used to try and make that work
properly that have to be followed.
On arches with the special bit they should possibly never be checked
since the special bit does everything you need.
Arguably any place reading those flags outside of vm_normal_page/etc
is suspect.
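To make the emulation rules above concrete, here is a minimal userspace
sketch of the decision vm_normal_page() has to make. It is a toy model,
not kernel code: every toy_* name, the flag values, and the pfn_valid
stand-in are assumptions made up for illustration, and it leaves out the
real function's additional sanity checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are the kernel's real types or values. */
#define TOY_PAGE_SHIFT  12
#define TOY_VM_PFNMAP   0x1u
#define TOY_VM_MIXEDMAP 0x2u
#define TOY_VM_COW      0x4u  /* stands in for "this is a COW mapping" */

struct toy_vma {
    unsigned long start;  /* first virtual address of the mapping */
    unsigned long pgoff;  /* for a linear PFN map: the PFN mapped at start */
    unsigned int flags;
};

struct toy_pte {
    unsigned long pfn;
    bool special;         /* only meaningful when the arch has the bit */
};

/* Pretend only PFNs below this limit are backed by a struct page. */
static bool toy_pfn_valid(unsigned long pfn)
{
    return pfn < 0x1000;
}

/* Returns true if the PTE should be treated as mapping a normal page. */
static bool toy_is_normal_page(bool arch_has_special,
                               const struct toy_vma *vma,
                               unsigned long addr, struct toy_pte pte)
{
    /* With a special bit, the PTE alone answers the question. */
    if (arch_has_special)
        return !pte.special;

    /* Without it, the answer is reconstructed from the VMA flags. */
    if (vma->flags & TOY_VM_MIXEDMAP)
        return toy_pfn_valid(pte.pfn);

    if (vma->flags & TOY_VM_PFNMAP) {
        assert(addr >= vma->start);
        unsigned long off = (addr - vma->start) >> TOY_PAGE_SHIFT;

        /* PFN matches the linear layout: part of the raw PFN map. */
        if (pte.pfn == vma->pgoff + off)
            return false;
        /* Otherwise only a COW mapping can hold a normal (anon) page. */
        return (vma->flags & TOY_VM_COW) != 0;
    }

    return true;
}
```

The point of the sketch is that the flag-based branch needs the VMA, the
address, and the linear-offset convention to get the same answer the
special bit gives from the PTE alone, which is why open-coded flag checks
elsewhere are easy to get wrong.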
IIUC, your opinion matches mine: VM_PFNMAP/VM_MIXEDMAP and pte_special()/...
usage should be limited to vm_normal_page/vm_normal_page_pmd/ ... of course,
GUP-fast is special (one of the reasons for "pte_special()" and friends after
all).
The issue is that GUP, at least, currently doesn't work with pfnmaps, while
there are potentially users who want to be able to work on both page +
!page use cases. Besides access_process_vm(), KVM also uses a similar thing,
and maybe more; these all seem to be valid use cases for referencing the vma
flags for PFNMAP and such, so they can identify "it's pfnmap" or more
generic issues like "permission check error on pgtable".
The whole private mapping thing definitely made it complicated.