RE: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver

From: Liu, Yi L
Date: Wed Jul 03 2019 - 04:25:33 EST


Hi Alex,

Thanks for the comments. Have four inline responses below. And one
of them need your further help. :-)
.
> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: Friday, June 28, 2019 11:08 PM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver
>
> On Mon, 24 Jun 2019 08:20:38 +0000
> "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:
>
> > Hi Alex,
> >
> > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > Sent: Friday, June 21, 2019 11:58 PM
> > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver
> > >
> > > On Fri, 21 Jun 2019 10:23:10 +0000
> > > "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > > > Sent: Friday, June 21, 2019 5:08 AM
> > > > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > > > Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver
> > > > >
> > > > > On Thu, 20 Jun 2019 13:00:34 +0000 "Liu, Yi L"
> > > > > <yi.l.liu@xxxxxxxxx> wrote:
> > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > > > > > Sent: Thursday, June 20, 2019 12:27 PM
> > > > > > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > > > > > Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci
> > > > > > > driver
> > > > > > >
> > > > > > > On Sat, 8 Jun 2019 21:21:11 +0800 Liu Yi L
> > > > > > > <yi.l.liu@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > This patch adds sample driver named vfio-mdev-pci. It is
> > > > > > > > to wrap a PCI device as a mediated device. For a pci
> > > > > > > > device, once bound to vfio-mdev-pci driver, user space
> > > > > > > > access of this device will go through vfio mdev framework.
> > > > > > > > The usage of the device follows mdev management method.
> > > > > > > > e.g. user should create a mdev before exposing the device to user-space.
> > > > [...]
> > > > > >
> > > > > > > However, the patch below just makes the mdev interface
> > > > > > > behave correctly, I can't make it work on my system because
> > > > > > > commit 7bd50f0cd2fd ("vfio/type1: Add domain at(de)taching
> > > > > > > group helpers")
> > > > > >
> > > > > > What error did you encounter. I tested the patch with a device
> > > > > > in a singleton iommu group. I'm also searching a proper
> > > > > > machine with multiple devices in an iommu group and test it.
> > > > >
> > > > > In vfio_iommu_type1, iommu backed mdev devices use the
> > > > > iommu_attach_device() interface, which includes:
> > > > >
> > > > > if (iommu_group_device_count(group) != 1)
> > > > > goto out_unlock;
> > > > >
> > > > > So it's impossible to use with non-singleton groups currently.
> > > >
> > > > Hmmm, I think it is no longer good to use iommu_attach_device()
> > > > for iommu backed mdev devices now. In this flow, the purpose here
> > > > is to attach a device to a domain and no need to check whether the
> > > > device is in a singleton iommu group. I think it would be better
> > > > to use __iommu_attach_device() instead of iommu_attach_device().
> > >
> > > That's a static and unexported, it's intentionally not an exposed
> > > interface. We can't attach devices in the same group to separate
> > > domains allocated through iommu_domain_alloc(), this would violate
> > > the iommu group isolation principles.
> >
> > Go it. :-) Then not good to expose such interface. But to support
> > devices in non-singleton iommu group, we need to have a new interface
> > which doesn't count the devices but attach all the devices.
>
> We have iommu_attach_group(), we just need to track which groups are attached.

yep.

> > > > Also I found a potential mutex lock issue if using iommu_attach_device().
> > > > In vfio_iommu_attach_group(), it uses iommu_group_for_each_dev()
> > > > to loop all the devices in the group. It holds group->mutex. And
> > > > then
> > > vfio_mdev_attach_domain()
> > > > calls iommu_attach_device() which also tries to get group->mutex.
> > > > This would be an issue. If you are fine with it, I may post
> > > > another patch for it. :-)
> > >
> > > Gack, yes, please send a patch.
> >
> > Would do it, may be together with the support of vfio-mdev-pci on
> > devices in non-singleton iommu group.
> >
> > >
> > > > > > > used iommu_attach_device() rather than iommu_attach_group()
> > > > > > > for non-aux mdev iommu_device. Is there a requirement that
> > > > > > > the mdev parent device is in a singleton iommu group?
> > > > > >
> > > > > > I don't think there should have such limitation. Per my
> > > > > > understanding, vfio-mdev-pci should also be able to bind to
> > > > > > devices which shares iommu group with other devices. vfio-pci works well
> for such devices.
> > > > > > And since the two drivers share most of the codes, I think
> > > > > > vfio-mdev-pci should naturally support it as well.
> > > > >
> > > > > Yes, the difference though is that vfio.c knows when devices are
> > > > > in the same group, which mdev vfio.c only knows about the
> > > > > non-iommu backed group, not the group that is actually used for
> > > > > the iommu backing. So we either need to enlighten vfio.c or
> > > > > further abstract those details in vfio_iommu_type1.c.
> > > >
> > > > Not sure if it is necessary to introduce more changes to vfio.c or
> > > > vfio_iommu_type1.c. If it's only for the scenario which two
> > > > devices share an iommu_group, I guess it could be supported by
> > > > using __iommu_attach_device() which has no device counting for the
> > > > group. But maybe I missed something here. It would be great if you
> > > > can elaborate a bit for it. :-)
> > >
> > > We need to use the group semantics, there's a reason
> > > __iommu_attach_device() is not exposed, it's an internal helper. I
> > > think there's no way around that we need to somewhere track the
> > > actual group we're attaching to and have the smarts to re-use it for
> > > other devices in the same group.
> >
> > Hmmm, exposing __iommu_attach_device() is not good, let's forget it.
> > :-)
> >
> > > > > > > If this is a simplification, then vfio-mdev-pci should not
> > > > > > > bind to devices where this is violated since there's no way
> > > > > > > to use the device. Can we support it though?
> > > > > >
> > > > > > yeah, I think we need to support it.

I've already made vfio-mdev-pci driver work for non-singleton iommu
group. e.g. for devices in a single iommu group, I can bind the devices
to eithervfio-pci or vfio-mdev-pci and then passthru them to a VM. And
it will fail if user tries to passthru a vfio-mdev-pci device via vfio-pci
manner "-device vfio-pci,host=01:00.1". In other words, vfio-mdev-pci
device can only passthru via
"-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID". This is what
we expect.

However, I encountered a problem when trying to prevent user from
passthru these devices to different VMs. I've tried in my side, and I
can passthru vfio-pci device and vfio-mdev-pci device to different
VMs. But actually this operation should be failed. If all the devices
are bound to vfio-pci, Qemu will open iommu backed group. So
Qemu can check if a given group has already been used by an
AddressSpace (a.ka. VM) in vfio_get_group() thus to prevent
user from passthru these devices to different VMs if the devices
are in the same iommu backed group. However, here for a
vfio-mdev-pci device, it has a new group and group ID, Qemu
will not be able to detect if the other devices (share iommu group
with vfio-mdev-pci device) are passthru to existing VMs. This is the
major problem for vfio-mdev-pci to support non-singleton group
in my side now. Even all devices are bound to vfio-mdev-pci driver,
Qemu is still unable to check since all the vfio-mdev-pci devices
have a separate mdev group.

To fix it, may need Qemu to do more things. E.g. If it tries to use a
non-singleton iommu backed group, it needs to check if any mdev
group is created and used by an existing VM. Also it needs check if
iommu backed group is passthru to an existing VM when trying to
use a mdev group. For singleton iommu backed group and
aux-domain enabled physical device, still allow to passthru mdev
group to different VMs. To achieve these checks, Qemu may need
to have knowledge whether a group is iommu backed and singleton
or not. Do you think it is good to expose such info to userspace? or
any other idea? :-)

> > > > > >
> > > > > > > If I have two devices in the same group and bind them both
> > > > > > > to vfio-mdev-pci, I end up with three groups, one for each
> > > > > > > mdev device and the original physical device group. vfio.c
> > > > > > > works with the mdev groups and will try to match both groups
> > > > > > > to the container. vfio_iommu_type1.c also works with the
> > > > > > > mdev groups, except for the point where we actually try to
> > > > > > > attach a group to a domain, which is the only window where
> > > > > > > we use the iommu_device rather than the provided group, but
> > > > > > > we don't record that anywhere. Should struct vfio_group
> > > > > > > have a pointer to a reference counted object that tracks the
> > > > > > > actual iommu_group attached, such that we can determine that the group
> is already attached to the domain and not try to attach again?
> > > > > >
> > > > > > Agreed, we need to avoid such duplicated attach. Instead of
> > > > > > adding reference counted object in vfio_group. I'm also
> > > > > > considering the logic
> > > > > > below:
> > > >
> > > > Re-walked the code, I find the duplicated attach will happen on
> > > > the vfio-mdev-pci device as vfio_mdev_attach_domain() only
> > > > attaches the parent devices of iommu backed mdevs instead of all the devices
> within the physical iommu_group.
> > > > While for a vfio-pci device, it will use iommu_attach_group()
> > > > which attaches all the devices within the iommu backed group. The
> > > > same with detach,
> > > > vfio_mdev_detach_domain() detaches selective devices instead of
> > > > all devices
> > > within
> > > > the iommu backed group.
> > >
> > > Yep, that's not good, for the non-aux case we need to follow the
> > > usual group semantics or else we're limited to singleton groups.
> >
> > yep.
> >
> > >
> > > > > > /*
> > > > > > * Do this check in vfio_iommu_type1_attach_group(), after mdev_group
> > > > > > * is initialized.
> > > > > > */
> > > > > > if (vfio_group->mdev_group) {
> > > > > > /*
> > > > > > * vfio_group->mdev_group is true means vfio_group->iommu_group
> > > > > > * is not the actual iommu_group which is going to be attached to
> > > > > > * domain. To avoid duplicate iommu_group attach, needs to check if
> > > > > > * the actual iommu_group. vfio_get_parent_iommu_group() is a
> > > > > > * newly added helper function which returns the actual attach
> > > > > > * iommu_group going to be attached for this mdev group.
> > > > > > */
> > > > > > p_iommu_group = vfio_get_parent_iommu_group(
> > > > > > vfio_group->iommu_group);
> > > > > > list_for_each_entry(d, &iommu->domain_list, next) {
> > > > > > if (find_iommu_group(d, p_iommu_group)) {
> > > > > > mutex_unlock(&iommu->lock);
> > > > > > // skip group attach;
> > > > > > }
> > > > > > }
> > > > >
> > > > > We don't currently create a struct vfio_group for the parent,
> > > > > only for the mdev iommu group. The iommu_attach for an iommu
> > > > > backed mdev doesn't leave any traces of where it is actually
> > > > > attached, we just count on retracing our steps for the detach.
> > > > > That's why I'm thinking we need an object somewhere to track it
> > > > > and it needs to be reference counted so that if both a
> > > > > vfio-mdev-pci device and a vfio-pci device are using it, we leave it in place if
> either one is removed.
> > > >
> > > > Hmmm, here we are talking about tracking in iommu_group level
> > > > though no good idea on where the object should be placed yet.
> > > > However, we may need to tack in device level as I mentioned in
> > > > above paragraph. If not, there may be sequence issue. e.g. if
> > > > vfio-mdev-pci device is attached firstly, then the object will be
> > > > initialized, and when vfio-pci device is attached, we will find the attach should
> be skipped and just inc the ref count.
> > > > But actually it should not be skipped since the vfio-mdev-pci
> > > > attach does not attach all devices within the iommu backed group.
> > >
> > > We can't do that though, the entire group needs to be attached.
> >
> > Agree, may be getting another interface which is similar with
> > iommu_attach_device(), but works for devices which is in non-singleton
> > groups. So the attach for iommu backed mdev will also result in a
> > sound attach to all the devices which share iommu group with the parent device.
>
> iommu_attach_group()...

got it. :-)

> > This is just like vfio-pci devices. For the object for tracking
> > purpose may be as below:
> >
> > struct vfio_iommu_object {
> > struct iommu_group *group;
> > struct kref kref;
> > };
> >
> > And I think it should be per-domain and per-iommu backed group since
> > aux-domain support allows a iommu backed group to be attached to
> > multiple domains. I'm considering if it is ok to have a list in vfio_domain.
> > Before each domain attach, vfio should do a check in the list if the
> > iommu backed group has been attached already. For vfio-pci devices,
> > use its iommu group to do a search in the list. For vfio-mdev-pci
> > devices, use its parent devices iommu group to do a search. Thus avoid duplicate
> attach. Thoughts?
>
> vfio_iommu_type1 already creates a struct vfio_iommu per container, which
> includes a linked list of struct vfio_domain objects, where each vfio_domain has a
> list of struct vfio_group objects. So we need to include the iommu device iommu
> group in that latter list somehow.
> Thanks,

Sure, will try it.

Thanks,
Yi Liu