Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type

From: Jason Gunthorpe
Date: Tue Mar 15 2022 - 13:18:32 EST


On Tue, Mar 15, 2022 at 12:04:35PM -0400, Matthew Rosato wrote:

> > You can't pin/unpin in this path, there is no real way to handle error
> > and ulimit stuff here, plus it is really slow. I didn't notice any of
> > this in your patches, so what do you mean by 'pin' above?
>
> patch 18 does some symbol_get for gfn_to_page (which will drive hva_to_pfn
> under the covers) and kvm_release_pfn_dirty and uses those symbols for
> pin/unpin.

To be very clear, this is quite wrong.

It does not account for the memory pinned by the guest and a hostile
VM could pin more memory than the KVM process is allowed to - which is
a security hole.

It also uses the wrong kind of pin, DMA pinned pages must be
pin_user_page'd not get_page'd and undone with unpin_user_page() and
not put_page(). This allows the guest to pin ZONE_MOVABALE memory and
other things which cannot be DMA'd to which will break the hypervisor
kernel. See David Hildenbrand's various series on COW bugs for an
overview why this is really security bad.

If you want to do dynamic pinning that is a different thing and
requires more CPU work in the shadowing operations. The modeling would
be similar except that the 1st stage iommu_domain would be this
'shared with KVM' domain people have been talking about - ie the page
table is not set with type 1 map/unmap but follows the KVM page table and
here it would be appropriate to use gfn_to_page/etc to access it.

However, if you do that then you do still have to take care of the
ulimit checks and you must teach kvm to use unpin_user_page/_dirty()
and related to be correct. This looks like a pretty big deal.

My advice is to start with the fully pinned case I described and
consider a KVM approach down the road.

[Also, I'm quite excited by this series you have, I think it shows
exactly how to fix POWER to work within the modern iommu framework,
they have the same basic problem]

Jason