Re: [RFC PATCH 00/42] Sharing KVM TDP to IOMMU

From: Yan Zhao
Date: Mon Dec 04 2023 - 21:21:44 EST


On Mon, Dec 04, 2023 at 11:08:00AM -0400, Jason Gunthorpe wrote:
> On Sat, Dec 02, 2023 at 05:12:11PM +0800, Yan Zhao wrote:
> > In this series, term "exported" is used in place of "shared" to avoid
> > confusion with terminology "shared EPT" in TDX.
> >
> > The framework contains 3 main objects:
> >
> > "KVM TDP FD" object - The interface of KVM to export TDP page tables.
> > With this object, KVM allows external components to
> > access a TDP page table exported by KVM.
>
> I don't know much about the internals of kvm, but why have this extra
> user visible piece? Isn't there only one "TDP" per kvm fd? Why not
> just use the KVM FD as a handle for the TDP?
As explained in a parallel mail, one reason to introduce the KVM TDP FD is
to let KVM know which TDP the user wants to export (share).
Another reason is to wrap the exported TDP together with its exported ops
in a single structure, so that components outside of KVM can query meta
data, request page faults, and register invalidate callbacks through the
exported ops.

struct kvm_tdp_fd {
        /* Public */
        struct file *file;
        const struct kvm_exported_tdp_ops *ops;

        /* private to KVM */
        struct kvm_exported_tdp *priv;
};
KVM only needs to expose this struct kvm_tdp_fd and two symbols,
kvm_tdp_fd_get() and kvm_tdp_fd_put().
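
Just to illustrate (the signatures and the header name below are my guesses
for the sketch, not necessarily what the patches define), a component
outside of KVM, e.g. the IOMMUFD KVM HWPT proxy, would hold the exported
TDP roughly like:

#include <linux/err.h>
#include <linux/kvm_tdp_fd.h>   /* header name assumed for this sketch */

static int hwpt_kvm_bind_tdp(int fd)
{
        /* take a reference on the exported TDP via the KVM symbol */
        struct kvm_tdp_fd *tdp_fd = kvm_tdp_fd_get(fd);

        if (IS_ERR(tdp_fd))
                return PTR_ERR(tdp_fd);

        /*
         * Query meta data and register the hardware TLB invalidate
         * callback through tdp_fd->ops here, then hand the page table
         * info to the IOMMU driver to build the KVM domain.
         */

        kvm_tdp_fd_put(tdp_fd);
        return 0;
}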


>
> > "IOMMUFD KVM HWPT" object - A proxy connecting KVM TDP FD to IOMMU driver.
> > This HWPT has no IOAS associated.
> >
> > "KVM domain" in IOMMU driver - Stage 2 domain in IOMMU driver whose paging
> > structures are managed by KVM.
> > Its hardware TLB invalidation requests are
> > notified from KVM via IOMMUFD KVM HWPT
> > object.
>
> This seems broadly the right direction
>
> > - About device which partially supports IOPF
> >
> > Many devices claiming PCIe PRS capability actually only tolerate IOPF in
> > certain paths (e.g. DMA paths for SVM applications, but not for non-SVM
> > applications or driver data such as ring descriptors). But the PRS
> > capability doesn't include a bit to tell whether a device 100% tolerates
> > IOPF in all DMA paths.
>
> The lack of tolerance for truly DMA pinned guest memory is a
> significant problem for any real deployment, IMHO. I am aware of no
> device that can handle PRI on every single DMA path. :(
DSA actually can handle PRI on all DMA paths, but it requires the driver to
turn on this capability :(
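
For reference, that opt-in today looks roughly like the below in the device
driver (a sketch using the existing generic IOMMU feature API; whether
IOMMU_DEV_FEAT_IOPF has to be enabled separately from IOMMU_DEV_FEAT_SVA
depends on the IOMMU driver):

#include <linux/iommu.h>
#include <linux/pci.h>

static int example_enable_iopf(struct pci_dev *pdev)
{
        int rc;

        /* enable I/O page fault (PRI) handling for this device */
        rc = iommu_dev_enable_feature(&pdev->dev, IOMMU_DEV_FEAT_IOPF);
        if (rc)
                return rc;

        /* then enable SVA, which relies on IOPF being functional */
        rc = iommu_dev_enable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA);
        if (rc)
                iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_IOPF);

        return rc;
}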

> > A simple way is to track an allowed list of devices which are known 100%
> > IOPF-friendly in VFIO. Another option is to extend PCIe spec to allow
> > device reporting whether it fully or partially supports IOPF in the PRS
> > capability.
>
> I think we need something like this.
>
> > - How to map MSI page on arm platform demands discussions.
>
> Yes, the recurring problem :(
>
> Probably the same approach as nesting would work for a hack - map the
> ITS page into the fixed reserved slot and tell the guest not to touch
> it and to identity map it.
Ok.