Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
From: Sean Christopherson
Date: Tue Apr 20 2021 - 13:14:03 EST

On Tue, Apr 20, 2021, Kirill A. Shutemov wrote:
> On Mon, Apr 19, 2021 at 08:09:13PM +0000, Sean Christopherson wrote:
> > On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > > The critical question is whether we ever need to translate hva->pfn after
> > > the page is added to the guest private memory. I believe we do, but I
> > > never checked. And that's the reason we need to keep hwpoison entries
> > > around, which encode pfn.
> >
> > As proposed in the TDX RFC, KVM would "need" the hva->pfn translation if the
> > guest private EPT entry was zapped, e.g. by NUMA balancing (which will fail on
> > the backend). But in that case, KVM still has the original PFN, the "new"
> > translation becomes a sanity check to make sure that the zapped translation
> > wasn't moved unexpectedly.
> >
> > Regardless, I don't see what that has to do with kvm_pfn_map. At some point,
> > gup() has to fault in the page or look at the host PTE value. For the latter,
> > at least on x86, we can throw info into the PTE itself to tag it as guest-only.
> > No matter what implementation we settle on, I think we've failed if we end up in
> > a situation where the primary MMU has pages it doesn't know are guest-only.
>
> I'm trying to understand whether it's a problem if KVM sees a guest-only PTE
> that belongs to another VM, e.g. when two VMs try to use the same tmpfs file
> as guest memory. We cannot insert the pfn into two TD/SEV guests at once, but
> can it cause other problems? I'm not sure.

For TDX and SNP, "firmware" will prevent assigning the same PFN to multiple VMs.

For SEV and SEV-ES, the PSP (what I'm calling "firmware") will not prevent
assigning the same page to multiple guests. But the failure mode in that case,
assuming the guests have different ASIDs, is limited to corruption of the guest.

On the other hand, for SEV/SEV-ES it's not invalid to assign the same ASID to
multiple guests (there's an in-flight patch to do exactly that[*]), and sharing
PFNs between guests with the same ASID would also be valid. In other words, if
we want to enforce PFN association in the kernel, I think the association should
be per-ASID, not per-KVM guest.
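
To illustrate the per-ASID idea, here is a rough userspace sketch, not actual
KVM code. Everything in it is an assumption made up for illustration: the
table, NR_PFNS, ASID_NONE, pfn_claim(), pfn_release() and pfn_assoc_demo() are
hypothetical names, and a real implementation would need locking and a
scalable lookup structure. The only point is that ownership is keyed on the
ASID, so two guests sharing an ASID can share a PFN, while a guest with a
different ASID is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PFNS   16u  /* hypothetical: tiny PFN space, illustration only */
#define ASID_NONE 0u   /* hypothetical: marks a PFN that nobody owns */

/* Hypothetical per-PFN record: owning ASID plus number of claims on it. */
struct pfn_assoc {
	uint32_t asid;
	uint32_t refs;
};

static struct pfn_assoc pfn_table[NR_PFNS];

/*
 * Claim @pfn on behalf of @asid.  Succeeds if the PFN is unowned or is
 * already owned by the same ASID; fails if a different ASID owns it.
 */
static bool pfn_claim(uint64_t pfn, uint32_t asid)
{
	if (pfn >= NR_PFNS || asid == ASID_NONE)
		return false;
	if (pfn_table[pfn].asid != ASID_NONE && pfn_table[pfn].asid != asid)
		return false;

	pfn_table[pfn].asid = asid;
	pfn_table[pfn].refs++;
	return true;
}

/* Drop one claim; the PFN becomes unowned when the last claim goes away. */
static bool pfn_release(uint64_t pfn, uint32_t asid)
{
	if (pfn >= NR_PFNS || pfn_table[pfn].refs == 0 ||
	    pfn_table[pfn].asid != asid)
		return false;

	if (--pfn_table[pfn].refs == 0)
		pfn_table[pfn].asid = ASID_NONE;
	return true;
}

/* Usage: two guests sharing ASID 5 may share PFN 3, ASID 7 may not. */
void pfn_assoc_demo(void)
{
	assert(pfn_claim(3, 5));    /* first guest, ASID 5 */
	assert(pfn_claim(3, 5));    /* second guest, same ASID: allowed */
	assert(!pfn_claim(3, 7));   /* different ASID: rejected */
	assert(pfn_release(3, 5));
	assert(pfn_release(3, 5));  /* last claim dropped, PFN is free again */
}
```

A per-KVM-guest key would reject the second claim above even though the
hardware considers it valid, which is the reason for keying on the ASID.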

So, I don't think we _need_ to rely on the TDX/SNP behavior, but if leveraging
firmware to handle those checks means avoiding additional complexity in the
kernel, then I think it's worth leaning on firmware even if it means SEV/SEV-ES
don't enjoy the same level of robustness.

[*] https://lkml.kernel.org/r/20210408223214.2582277-1-natet@xxxxxxxxxx