Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit
From: Koenig, Christian
Date: Wed Sep 11 2019 - 05:08:08 EST
Am 10.09.19 um 21:26 schrieb Thomas HellstrÃm (VMware):
> On 9/10/19 6:11 PM, Andy Lutomirski wrote:
>>> On Sep 5, 2019, at 8:24 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx>
>>>> On Thu, Sep 05, 2019 at 05:21:24PM +0200, Thomas HellstrÃm (VMware)
>>>>> On 9/5/19 4:15 PM, Dave Hansen wrote:
>>>>> Hi Thomas,
>>>>> Thanks for the second batch of patches!Â These look much improved
>>>>> on all
>>>> Yes, although the TTM functionality isn't in yet. Hopefully we
>>>> won't have to
>>>> bother you with those though, since this assumes TTM will be using
>>>> the dma
>>> Please take a look at dma_mmap_prepare and dma_mmap_fault in this
>>> they should allow to fault dma api pages in the page fault handler.Â
>>> this is totally hot off the press and not actually tested for the last
>>> few patches.Â Note that I've also included your two patches from this
>>> series to handle SEV.
>> I read that patch, and it seems like youâve built in the assumption
>> that all pages in the mapping use identical protection or, if not,
>> that the same fake vma hack that TTM already has is used to fudge
>> around it.Â Could it be reworked slightly to avoid this?
>> I wonder if itâs a mistake to put the encryption bits in vm_page_prot
>> at all.
> From my POW, the encryption bits behave quite similar in behaviour to
> the caching mode bits, and they're also in vm_page_prot. They're the
> reason TTM needs to modify the page protection in the fault handler in
> the first place.
> The problem seen in TTM is that we want to be able to change the
> vm_page_prot from the fault handler, but it's problematic since we
> have the mmap_sem typically only in read mode. Hence the fake vma
> hack. From what I can tell it's reasonably well-behaved, since
> pte_modify() skips the bits TTM updates, so mprotect() and mremap()
> works OK. I think split_huge_pmd may run into trouble, but we don't
> support it (yet) with TTM.
Ah! I actually ran into this while implementing huge page support for
TTM and never figured out why that doesn't work. Dropped CPU huge page
support because of this.
> We could probably get away with a WRITE_ONCE() update of the
> vm_page_prot before taking the page table lock since
> a) We're locking out all other writers.
> b) We can't race with another fault to the same vma since we hold an
> address space lock ("buffer object reservation")
> c) When we need to update there are no valid page table entries in the
> vma, since it only happens directly after mmap(), or after an
> unmap_mapping_range() with the same address space lock. When another
> reader (for example split_huge_pmd()) sees a valid page table entry,
> it also sees the new page protection and things are fine.
Yeah, that's exactly why I always wondered why we need this hack with
the vma copy on the stack.
> But that would really be a special case. To solve this properly we'd
> probably need an additional lock to protect the vm_flags and
> vm_page_prot, taken after mmap_sem and i_mmap_lock.
Well we already have a special lock for this: The reservation object. So
memory barriers etc should be in place and I also think we can just
update the vm_page_prot on the fly.