Re: [PATCH 1/4] iommu: Add iommu_device_group callback and iommu_group sysfs entry

From: Alex Williamson
Date: Thu Dec 01 2011 - 09:34:21 EST


On Thu, 2011-12-01 at 10:33 +0000, David Woodhouse wrote:
> On Wed, 2011-11-30 at 23:48 -0700, Alex Williamson wrote:
> > I'm not sure if we're getting into VM usage with "assigning" terminology
> > above. You're free to architect qemu however you want on POWER to make
> > groups be the assignable unit to a guest. On x86, an individual device
> > is the assignable unit for a guest. Unassigned group devices will still
> > be required to be held by vfio, they'll just be unused. Thanks,
>
> I'm not sure I understand this.
>
> BY DEFINITION, the smallest assignable unit is the group, surely? Isn't
> that the *point* of the groups? That the IOMMU cannot tell the
> difference between the devices in the group?
>
> So in *practice*, surely you cannot assign just *one* device from a
> group? You can assign the whole group, or nothing.
>
> You might *pretend* to assign single devices, and we might try to cope
> with the weirdness that happens when you want *one* device of a group to
> be owned by one VM, another device in the same group to be owned by a
> second VM, and a third device from the same group to be driven by a
> native driver in the host. But why not just assign groups as a whole?
> Surely that makes most sense?

We've got multiple levels when we add qemu and guests into the mix. A
group is the smallest assignable unit for kernel->userspace. In fact,
vfio is constructed so that the user cannot do anything with a group
until all the devices of the group are bound to the vfio bus drivers.
Qemu, as a userspace driver, must therefore take ownership of the entire
group. However, there's no requirement that a userspace driver must
make use of all the devices in the group, so qemu is free to expose
individual devices from the group to the guest. IMHO, it doesn't make
sense to have a default model saying "I know you just wanted the nic,
but it's in the same group as this graphics card, so surprise, you get
both!". Obviously if a user does want to expose multiple devices from a
group to a guest, we support that too.

Splitting groups among multiple VMs, or between a VM and native host
drivers, defeats the purpose of the group. Neither is allowed.
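The same rule can be stated as an invariant. This is again a hypothetical Python sketch, not any real kernel or qemu interface: each device maps to a single owner (a VM name, or "host" for a native driver), and a group may only be handed to an owner if no one else holds any of its members.

```python
from __future__ import annotations

# Hypothetical model for illustration; not a kernel or qemu interface.
# `owner` maps a device name to its current owner: a VM name, or "host"
# for a native host driver. Devices missing from `owner` are unclaimed.


def owners_of(group: list[str], owner: dict[str, str]) -> set[str]:
    """Distinct owners currently holding members of one group."""
    return {owner[d] for d in group if d in owner}


def can_assign(group: list[str], owner: dict[str, str], new_owner: str) -> bool:
    # Refuse any split: no other VM, and no host driver, may hold a member.
    return owners_of(group, owner) <= {new_owner}


def assign(group: list[str], owner: dict[str, str], new_owner: str) -> None:
    if not can_assign(group, owner, new_owner):
        raise PermissionError(f"group already held by {sorted(owners_of(group, owner))}")
    for d in group:  # the whole group moves together, never a single device
        owner[d] = new_owner


group = ["nic", "graphics"]
owner = {"graphics": "host"}
print(can_assign(group, owner, "vm1"))  # False: a host driver holds the graphics card
del owner["graphics"]                   # the host driver releases it
assign(group, owner, "vm1")
print(can_assign(group, owner, "vm2"))  # False: vm1 now owns the whole group
```

Under this model, moving a group from the host to a VM requires the host drivers to release every member first.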

> Btw, did we get a quirk for the Ricoh multi-function devices which all
> need to be in the same group because they do all their DMA from function
> zero? I think we need another similar quirk for a Marvell SATA
> controller which seems to do its AHCI DMA from its IDE function; see
> https://bugzilla.redhat.com/757166

No, as I mentioned, groups are currently for iommu_ops, not dma_ops,
though it would make sense for iommu drivers to use the group info, or
to create common quirk infrastructure, for handling broken devices like
these. Thanks,

Alex
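For the Ricoh and Marvell cases, common quirk infrastructure could be as simple as mapping each device to the function it really issues DMA from and grouping by that ID, since devices the IOMMU sees under the same requester ID cannot be told apart. This Python sketch is hypothetical throughout: the `kind` labels stand in for real vendor/device ID matches, and the Marvell IDE function number (1) is an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple

# Hypothetical sketch of common quirk infrastructure; names, kinds and function
# numbers are illustrative assumptions, not the kernel's actual quirk tables.


class PciDev(NamedTuple):
    bus: int
    slot: int
    func: int
    kind: str  # placeholder label standing in for a vendor/device ID match


# kind -> function number the device really issues DMA from.
DMA_SOURCE_QUIRKS = {
    "ricoh-multifunction": 0,  # all functions DMA from function zero
    "marvell-sata": 1,         # assumed: AHCI DMA tagged with the IDE function
}


def dma_source(dev: PciDev) -> tuple[int, int, int]:
    """Requester ID the IOMMU actually sees for this device."""
    func = DMA_SOURCE_QUIRKS.get(dev.kind, dev.func)
    return (dev.bus, dev.slot, func)


def build_groups(devs: list[PciDev]) -> list[list[PciDev]]:
    # Devices the IOMMU cannot tell apart (same DMA source) share one group.
    groups: dict[tuple[int, int, int], list[PciDev]] = defaultdict(list)
    for d in devs:
        groups[dma_source(d)].append(d)
    return list(groups.values())


devs = [
    PciDev(2, 0, 0, "ricoh-multifunction"),
    PciDev(2, 0, 1, "ricoh-multifunction"),
    PciDev(3, 0, 0, "nic"),
]
print(len(build_groups(devs)))  # 2: both Ricoh functions share one group
```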
