Re: [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM

From: Matthew Rosato
Date: Tue Mar 15 2022 - 12:39:41 EST


On 3/14/22 5:46 PM, Jason Gunthorpe wrote:
On Mon, Mar 14, 2022 at 03:44:41PM -0400, Matthew Rosato wrote:
+int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
+{
+ struct vfio_device *vdev;
+ struct pci_dev *pdev;
+ int rc;
+
+ rc = kvm_s390_pci_dev_open(zdev);
+ if (rc)
+ return rc;
+
+ pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
+ if (!pdev) {
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ vdev = get_vdev(&pdev->dev);
+ if (!vdev) {
+ pci_dev_put(pdev);
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ zdev->kzdev->nb.notifier_call = kvm_s390_pci_group_notifier;
+
+ /*
+ * At this point, a KVM should already be associated with this device,
+ * so registering the notifier now should immediately trigger the
+ * event. We also want to know if the KVM association is later removed
+ * to ensure proper cleanup happens.
+ */
+ rc = register_notifier(vdev->dev, &zdev->kzdev->nb);
+
+ put_vdev(vdev);
+ pci_dev_put(pdev);
+
+ /* Make sure the registered KVM matches the KVM issuing the ioctl */
+ if (rc || zdev->kzdev->kvm != kvm) {
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ /* Must support KVM-managed IOMMU to proceed */
+ if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+ rc = zpci_iommu_attach_kvm(zdev, kvm);
+ else
+ rc = -EINVAL;
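The comment in the quoted patch relies on the notifier being replayed at registration time. If a KVM is already associated with the group, the callback fires immediately, so `kzdev->kvm` is populated before the `!= kvm` check runs. Below is a minimal userspace model of that ordering. Every name in it (`model_group`, `model_register_notifier`, `model_zpci_start`) is hypothetical and stands in for the VFIO/KVM objects; none of them are the real kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct kvm or VFIO group. */
struct model_kvm {
	int id;
};

struct model_notifier {
	void (*call)(struct model_notifier *nb, struct model_kvm *kvm);
	struct model_kvm *kvm;          /* association recorded by the callback */
};

struct model_group {
	struct model_kvm *kvm;          /* KVM already associated, or NULL */
	struct model_notifier *nb;      /* at most one registered notifier */
};

/* Callback: remember whichever KVM the group reports. */
static void model_record_kvm(struct model_notifier *nb, struct model_kvm *kvm)
{
	nb->kvm = kvm;
}

/* Register, then replay the current association so the caller sees it
 * synchronously: the behaviour the patch comment depends on. */
static int model_register_notifier(struct model_group *g,
				   struct model_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	if (g->nb)
		return -EBUSY;
	g->nb = nb;
	if (g->kvm)
		nb->call(nb, g->kvm);
	return 0;
}

/* Mirrors the patch's check: fail unless registration succeeded and the
 * KVM seen through the notifier is the one issuing the ioctl. */
static int model_zpci_start(struct model_group *g, struct model_notifier *nb,
			    struct model_kvm *caller)
{
	int rc = model_register_notifier(g, nb);

	if (rc || nb->kvm != caller)
		return -ENODEV;
	return 0;
}
```

The model leaves out the unwind path. On failure the real code jumps to `exit_err`, which is not part of the quoted hunk, so its cleanup is not reproduced here.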

This seems like kind of a strange API, shouldn't kvm be getting a
reference on the underlying iommu_domain and then calling into it to
get the mapping table instead of pushing KVM specific logic into the
iommu driver?

It would be nice if all the special kvm stuff could be more isolated in
kvm code.

I'm still a little unclear about why this is so complicated - can't
you get the iommu_domain from the group FD directly in KVM code as
power does?

Yeah, I think I could do something like that, using the vfio group fd as power does.

The reference to the kvm itself inside the iommu driver was only being used for the pin/unpin operations, and those would not be necessary if we switched to having the 1st-layer iommu pin all of guest memory.
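The layering suggested above keeps the IOMMU side unaware of KVM. KVM takes its own reference on the domain and asks it for the translation table. If the 1st-layer domain pins all of guest memory up front, there is no pin/unpin callback that would need a kvm pointer. Below is a minimal userspace model of that ownership split. The names (`model_domain`, `model_domain_get_table`, `model_kvm_zdev_attach`) are hypothetical, not kernel APIs, and the lookup of the domain from the vfio group fd is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IOMMU side (hypothetical): refcounted domain owning a translation
 * table. It has no field or callback that refers to KVM. */
struct model_domain {
	int refcount;
	uint64_t *table;        /* stand-in for the I/O translation table */
};

static struct model_domain *model_domain_get(struct model_domain *d)
{
	d->refcount++;
	return d;
}

static void model_domain_put(struct model_domain *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* The only query KVM needs: where the mapping table lives. */
static uint64_t *model_domain_get_table(struct model_domain *d)
{
	return d->table;
}

/* KVM side (hypothetical): holds its own domain reference for as long
 * as it uses the table, so the domain cannot go away underneath it. */
struct model_kvm_zdev {
	struct model_domain *dom;
	uint64_t *table;
};

static void model_kvm_zdev_attach(struct model_kvm_zdev *kz,
				  struct model_domain *d)
{
	kz->dom = model_domain_get(d);
	kz->table = model_domain_get_table(kz->dom);
}

static void model_kvm_zdev_detach(struct model_kvm_zdev *kz)
{
	model_domain_put(kz->dom);
	kz->dom = NULL;
	kz->table = NULL;
}
```

The design point is the direction of the dependency. KVM-specific state lives only in the KVM-side struct, and the IOMMU side exposes a generic getter.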