Re: [PATCH v7 5/8] KVM: x86/mmu: Don't pass FOLL_GET to __kvm_follow_pfn

From: David Stevens
Date: Thu Aug 24 2023 - 21:40:08 EST


On Fri, Aug 25, 2023 at 12:15 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Aug 24, 2023, David Stevens wrote:
> > On Wed, Jul 5, 2023 at 7:25 PM Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 04, 2023 at 04:50:50PM +0900, David Stevens wrote:
> > > > @@ -4529,7 +4540,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> > > >
> > > > out_unlock:
> > > > read_unlock(&vcpu->kvm->mmu_lock);
> > > > - kvm_release_pfn_clean(fault->pfn);
> > >
> > > Yet kvm_release_pfn() can still be triggered for the kvm_vcpu_maped gfns.
> > > What if guest uses a non-referenced page(e.g., as a vmcs12)? Although I
> > > believe this is not gonna happen in real world...
> >
> > kvm_vcpu_map still uses gfn_to_pfn, which eventually passes FOLL_GET
> > to __kvm_follow_pfn. So if a guest tries to use a non-refcounted page
> > like that, then kvm_vcpu_map will fail and the guest will probably
> > crash. It won't trigger any bugs in the host, though.
> >
> > It is unfortunate that the guest will be able to use certain types of
> > memory for some purposes but not for others. However, while it is
> > theoretically fixable, it's an unreasonable amount of work for
> > something that, as you say, nobody really cares about in practice [1].
> >
> > [1] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@xxxxxxxxxx/
>
> There are use cases that care, which is why I suggested allow_unsafe_kmap.
> Specifically, AWS manages their pool of guest memory in userspace and maps it all
> via /dev/mem. Without that module param to let userspace opt-in, this series will
> break such setups. It still arguably is a breaking change since it requires
> userspace to opt-in, but allowing such behavior by default is simply not a viable
> option, and I don't have much sympathy since so much of this mess has its origins
> in commit e45adf665a53 ("KVM: Introduce a new guest mapping API").
>
> The use cases that no one cares about (AFAIK) is allowing _untrusted_ userspace
> to back guest RAM with arbitrary memory. In other words, I want KVM to allow
> (by default) mapping device memory into the guest for things like vGPUs, without
> having to do the massive and invasive overhaul needed to safely allow backing guest
> RAM with completely arbitrary memory.

Do you specifically want the allow_unsafe_kmap breaking change? v7 of
this series should have supported everything that is currently
supported by KVM, but you're right that the v8 version of
hva_to_pfn_remapped doesn't support mapping
!kvm_pfn_to_refcounted_page() pages. That could be supported
explicitly with allow_unsafe_kmap as you suggested, or it could be
supported implicitly with this implementation:

@@ -2774,8 +2771,14 @@ static int hva_to_pfn_remapped(struct
vm_area_struct *vma,
* would then underflow the refcount when the caller does the
* required put_page. Don't allow those pages here.
*/
- if (!kvm_try_get_pfn(pfn))
- r = -EFAULT;
+ page = kvm_pfn_to_refcounted_page(pfn);
+ if (page) {
+ if (get_page_unless_zero(page))
+ WARN_ON_ONCE(kvm_follow_refcounted_pfn(foll,
page) != pfn);
+
+ if (!foll->is_refcounted_page && !foll->guarded_by_mmu_notifier)
+ r = -EFAULT;
+ }

-David