Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages

From: Sean Christopherson
Date: Wed May 26 2021 - 15:47:03 EST


On Fri, May 21, 2021, Kirill A. Shutemov wrote:
> Hi Sean,
>
> The core patch of the approach we've discussed before is below. It
> introduce a new page type with the required semantics.
>
> The full patchset can be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-guest-only
>
> but only the patch below is relevant for TDX. QEMU patch is attached.

Can you post the whole series? The KVM behavior and usage of FOLL_GUEST is very
relevant to TDX.

> CONFIG_HAVE_KVM_PROTECTED_MEMORY has to be changed to what is appropriate
> for TDX and FOLL_GUEST has to be used in hva_to_pfn_slow() when running
> TDX guest.

This behavior in particular is relevant; KVM should provide FOLL_GUEST iff the
access is private or the VM type doesn't differentiate between private and
shared.

> When page get inserted into private sept we must make sure it is
> PageGuest() or SIGBUS otherwise.

More KVM feedback :-)

Ideally, KVM will synchronously exit to userspace with detailed information on
the bad behavior, not do SIGBUS. Hopefully that infrastructure will be in place
sooner than later.

https://lkml.kernel.org/r/YKxJLcg/WomPE422@xxxxxxxxxx

> Inserting PageGuest() into shared is fine, but the page will not be accessible
> from userspace.

Even if it can be functionally fine, I don't think we want to allow KVM to map
PageGuest() as shared memory. The only reason to map memory shared is to share
it with something, e.g. the host, that doesn't have access to private memory, so
I can't envision a use case.

On the KVM side, it's trivially easy to omit FOLL_GUEST for shared memory, while
always passing FOLL_GUEST would require manually zapping. Manual zapping isn't
a big deal, but I do think it can be avoided if userspace must either remap the
hva or define a new KVM memslot (new gpa->hva), both of which will automatically
zap any existing translations.

Aha, thought of a concrete problem. If KVM maps PageGuest() into shared memory,
then KVM must ensure that the page is not mapped private via a different hva/gpa,
and is not mapped _any_ other guest because the TDX-Module's 1:1 PFN:TD+GPA
enforcement only applies to private memory. The explicit "VM_WRITE | VM_SHARED"
requirement below makes me think this wouldn't be prevented.

Oh, and the other nicety is that I think it would avoid having to explicitly
handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all
memory exposed to KVM must be !PageGuest(), then it is also eligible for
copy_{to,from}_user().

> Any feedback is welcome.
>
> -------------------------------8<-------------------------------------------
>
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> Date: Fri, 16 Apr 2021 01:30:48 +0300
> Subject: [PATCH] mm: Introduce guest-only pages
>
> PageGuest() pages are only allowed to be used as guest memory. Userspace
> is not allowed read from or write to such pages.
>
> On page fault, PageGuest() pages produce PROT_NONE page table entries.
> Read or write there will trigger SIGBUS. Access to such pages via
> syscall leads to -EIO.
>
> The new mprotect(2) flag PROT_GUEST translates to VM_GUEST. Any page
> fault to VM_GUEST VMA produces PageGuest() page.
>
> Only shared tmpfs/shmem mappings are supported.

Is limiting this to tmpfs/shmem only for the PoC/RFC, or is it also expected to
be the long-term behavior?

> GUP normally fails on such pages. KVM will use the new FOLL_GUEST flag
> to access them.