Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3

From: Jason Gunthorpe
Date: Thu Mar 09 2023 - 09:51:28 EST


On Thu, Mar 09, 2023 at 01:42:17PM +0000, Jean-Philippe Brucker wrote:

> Although we can keep the alloc and hardware info separate for each IOMMU
> architecture, we should try to come up with common invalidation methods.

The invalidation language is tightly linked to the actual cache block
and cache tag in the IOMMU HW design. Generality will lose or
obfuscate the specificity that is required for creating real vIOMMUs.

Further, invalidation is a fast path; it is crazy to take a vIOMMU of
real HW receiving a native invalidation request, mangle it into some
obfuscated kernel format and then de-mangle it again in the kernel
driver. IMHO ideally qemu will simply point the invalidation at the
WQE in the SW vIOMMU command queue and invoke the ioctl. (Nicolin, we
should look into this more)
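
To make that concrete, here is a minimal sketch of the VMM side. All
the names, the struct layout and the ioctl number below are made up
for illustration, they are not the uAPI proposed in this series. The
point is only that the guest writes a native 16-byte SMMUv3 command
into its emulated command queue and qemu forwards that entry to the
kernel verbatim, with no decode/re-encode step:

#include <stdint.h>
#include <sys/ioctl.h>

/* Placeholder ioctl number, illustrative only */
#define HYPOTHETICAL_HWPT_INVALIDATE 0

struct hypothetical_hwpt_invalidate {
	uint32_t size;		/* sizeof(this struct) */
	uint32_t hwpt_id;	/* stage-1 HWPT the guest is invalidating */
	uint64_t cmd_uptr;	/* the guest's native CMDQ entry */
	uint32_t cmd_len;	/* 16 bytes for an SMMUv3 command */
	uint32_t __reserved;
};

/* VMM fast path: the guest queued a CMD_TLBI_* in its emulated cmdq */
static int forward_guest_invalidation(int iommufd, uint32_t hwpt_id,
				      const void *guest_cmdq_entry)
{
	struct hypothetical_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = hwpt_id,
		.cmd_uptr = (uint64_t)(uintptr_t)guest_cmdq_entry,
		.cmd_len = 16,
	};

	/* The kernel SMMUv3 driver parses the command, not the VMM */
	return ioctl(iommufd, HYPOTHETICAL_HWPT_INVALIDATE, &cmd);
}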

The purpose of these interfaces is to support high performance, full
functionality vIOMMUs of the real HW; we should not lose sight of
that goal.

We are actually planning to go further and expose direct invalidation
work queues complete with HW doorbells to userspace. This obviously
must be in native HW format.

Nicolin, I think we should tweak the uAPI here so that the
invalidation opaque data has a format tag of its own, instead of
re-using the HWPT tag. I.e. you can have an ARM SMMUv3 invalidate type
tag and also a virtio-iommu invalidate type tag.

This will allow Jean to put the invalidation decoding in the iommu
drivers if we think that is the right direction. Virtio can
standardize it as a "HW format".
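
As a rough illustration of the tagging, again with made-up names
rather than the series' actual uAPI: the invalidation payload carries
its own format enum, so an SMMUv3 vIOMMU and a virtio-iommu device
can push their native encodings through the same entry point and the
matching kernel driver does the decoding.

#include <stdint.h>

/* Sketch only; these identifiers are assumptions, not the real uAPI */
enum hypothetical_invalidate_data_type {
	INVALIDATE_DATA_ARM_SMMUV3,	/* raw SMMUv3 CMDQ entries */
	INVALIDATE_DATA_VIRTIO_IOMMU,	/* virtio-iommu invalidation requests */
};

struct hypothetical_hwpt_invalidate_v2 {
	uint32_t size;
	uint32_t hwpt_id;
	uint32_t data_type;	/* enum hypothetical_invalidate_data_type,
				 * independent of the HWPT's alloc type */
	uint32_t entry_len;	/* size of one native-format entry */
	uint32_t entry_num;	/* number of entries at data_uptr */
	uint32_t __reserved;
	uint64_t data_uptr;	/* opaque entries, decoded by the driver */
};

The point of the separate tag is that the HWPT allocation type and
the invalidation encoding no longer have to match one-to-one, which
is what would let a virtio format share the same path.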

> Ideally I'd like something like this for vhost-iommu:
>
> * slow path through userspace for complex requests like attach-table and
> probe, where the VMM can decode arch-specific information and translate
> them to iommufd and vhost-iommu ioctls to update the configuration.
>
> * fast path within the kernel for performance-critical requests like
> invalidate, page request and response. It would be absurd for the
> vhost-iommu driver to translate generic invalidation requests from the
> guest into arch-specific commands with special opcodes, when the next
> step is calling the IOMMU driver which does that for free.

Someone has to do the conversion. If you don't think virtio should do
it then I'd be OK to add a type tag for virtio-format invalidation and
put it in the IOMMU driver.

But given that virtio overall already has to know *a lot* about how
the HW it is wrapping works, I don't think it is necessarily absurd
for virtio to do the conversion. I'd like to evaluate this in patches,
in context with how much other unique HW code ends up in kernel-side
vhost-iommu.

However, I don't know the rationale for virtio-iommu; it seems like a
strange direction to me. All the iommu drivers have native command
queues. ARM and AMD are both supporting native command queues directly
in the guest, complete with a direct guest MMIO doorbell ring.

If someone wants to optimize this I'd think the way to do it is to use
virtio like techniques to put SW command queue processing in the
kernel iommu driver and continue to use the HW native interface in the
VM.

What benefit comes from replacing the HW native interface with virtio?
Especially on ARM, where the native interface is pretty clean?

> During previous discussions we came up with generic invalidations that
> could fit both Arm and x86 [1][2]. The only difference was the ASID
> (called archid/id in those proposals) which VT-d didn't need. Could we try
> to build on that?

IMHO this was just unioning all the different invalidation types
together. It makes sense for something like virtio, but it is
illogical/obfuscated as a user/kernel interface since it still
requires a userspace HW driver to understand which subset of the
invalidations is used on the actual HW.
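
Purely to illustrate the shape of that objection, a made-up
union-style struct (not the actual layout from [1][2]):

#include <stdint.h>

/*
 * Every arch's fields folded into one "generic" request; userspace
 * still needs a HW-specific driver to know which members its IOMMU
 * actually consumes and how they map to native commands.
 */
struct hypothetical_generic_invalidate {
	uint32_t granularity;	/* domain / pasid / address range */
	uint32_t flags;		/* leaf, archid valid, ... */
	uint64_t archid;	/* ASID/DID-like tag, meaning is per arch */
	uint64_t addr;
	uint64_t size;
};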

Jason