TTM huge page-faults WAS: Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit

From: Thomas HellstrÃm (VMware)
Date: Wed Sep 11 2019 - 06:10:29 EST

removing people that are probably not interested from CC
adding dri-devel

On 9/11/19 11:08 AM, Koenig, Christian wrote:
Am 10.09.19 um 21:26 schrieb Thomas HellstrÃm (VMware):
On 9/10/19 6:11 PM, Andy Lutomirski wrote:
On Sep 5, 2019, at 8:24 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx>

On Thu, Sep 05, 2019 at 05:21:24PM +0200, Thomas HellstrÃm (VMware)
On 9/5/19 4:15 PM, Dave Hansen wrote:
Hi Thomas,

Thanks for the second batch of patches! These look much improved
on all
Yes, although the TTM functionality isn't in yet. Hopefully we
won't have to
bother you with those though, since this assumes TTM will be using
the dma
Please take a look at dma_mmap_prepare and dma_mmap_fault in this

they should allow to fault dma api pages in the page fault handler.
this is totally hot off the press and not actually tested for the last
few patches. Note that I've also included your two patches from this
series to handle SEV.
I read that patch, and it seems like youâve built in the assumption
that all pages in the mapping use identical protection or, if not,
that the same fake vma hack that TTM already has is used to fudge
around it. Could it be reworked slightly to avoid this?

I wonder if itâs a mistake to put the encryption bits in vm_page_prot
at all.
From my POW, the encryption bits behave quite similar in behaviour to
the caching mode bits, and they're also in vm_page_prot. They're the
reason TTM needs to modify the page protection in the fault handler in
the first place.

The problem seen in TTM is that we want to be able to change the
vm_page_prot from the fault handler, but it's problematic since we
have the mmap_sem typically only in read mode. Hence the fake vma
hack. From what I can tell it's reasonably well-behaved, since
pte_modify() skips the bits TTM updates, so mprotect() and mremap()
works OK. I think split_huge_pmd may run into trouble, but we don't
support it (yet) with TTM.
Ah! I actually ran into this while implementing huge page support for
TTM and never figured out why that doesn't work. Dropped CPU huge page
support because of this.

By incident, I got slightly sidetracked the other day and started looking at this as well. Got to the point where I figured out all the hairy alignment issues and actually got huge_fault() calls, but never implemented the handler. I think that's definitely something worth having. Not sure it will work for IO memory, though, (split_huge_pmd will just skip non-page-backed memory) but if we only support VM_SHARED (non COW) vmas there's no reason to split the huge pmds anyway. Definitely something we should have IMO.

We could probably get away with a WRITE_ONCE() update of the
vm_page_prot before taking the page table lock since

a) We're locking out all other writers.
b) We can't race with another fault to the same vma since we hold an
address space lock ("buffer object reservation")
c) When we need to update there are no valid page table entries in the
vma, since it only happens directly after mmap(), or after an
unmap_mapping_range() with the same address space lock. When another
reader (for example split_huge_pmd()) sees a valid page table entry,
it also sees the new page protection and things are fine.
Yeah, that's exactly why I always wondered why we need this hack with
the vma copy on the stack.

But that would really be a special case. To solve this properly we'd
probably need an additional lock to protect the vm_flags and
vm_page_prot, taken after mmap_sem and i_mmap_lock.
Well we already have a special lock for this: The reservation object. So
memory barriers etc should be in place and I also think we can just
update the vm_page_prot on the fly.

I agree. This is needed for huge pages. We should make this change, and perhaps add the justification above as a comment.