Re: [RFC] /dev/ioasid uAPI proposal

From: David Gibson
Date: Wed Jun 02 2021 - 03:29:34 EST


On Fri, May 28, 2021 at 02:35:38PM -0300, Jason Gunthorpe wrote:
> On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
[snip]
> > With above design /dev/ioasid uAPI is all about I/O address spaces.
> > It doesn't include any device routing information, which is only
> > indirectly registered to the ioasid driver through VFIO uAPI. For
> > example, I/O page fault is always reported to userspace per IOASID,
> > although it's physically reported per device (RID+PASID).
>
> I agree with Jean-Philippe - at the very least erasing this
> information needs a major rationale - but I don't really see why it
> must be erased? The HW reports the originating device, is it just a
> matter of labeling the devices attached to the /dev/ioasid FD so it
> can be reported to userspace?

HW reports the originating device as far as it knows. In many cases
where you have multiple devices in an IOMMU group, it's because
although they're treated as separate devices at the kernel level, they
have the same RID at the HW level. Which means a RID for something in
the right group is the closest you can count on supplying.
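
To make that concrete, here is a rough sketch of what a fault lookup
can and can't tell you. Every name in it (struct sketch_device,
sketch_fault_to_group(), the sample table) is made up for illustration
and is not part of the proposed uAPI or of any existing kernel
interface. It assumes devices sharing a RID always sit in the same
IOMMU group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the proposed uAPI. */
struct sketch_device {
	const char *name;	/* device as the kernel sees it */
	uint16_t rid;		/* requester ID as the IOMMU sees it */
	int group;		/* IOMMU group number */
};

/* Two kernel-level devices share RID 0x0100; a third has its own RID. */
static const struct sketch_device sketch_devs[] = {
	{ "legacy-fn-a", 0x0100, 7 },
	{ "legacy-fn-b", 0x0100, 7 },
	{ "nic",         0x0200, 8 },
};

/*
 * Resolve a faulting RID to an IOMMU group.
 * Returns the group number, or -1 if no device uses that RID.
 * *ambiguous is set to 1 when more than one device shares the RID,
 * i.e. the fault can be attributed to the group but not to a device.
 */
static int sketch_fault_to_group(const struct sketch_device *devs, size_t n,
				 uint16_t rid, int *ambiguous)
{
	int group = -1;
	size_t matches = 0;

	assert(devs != NULL && ambiguous != NULL);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].rid != rid)
			continue;
		/* Assumption: same RID implies same group. */
		group = devs[i].group;
		matches++;
	}

	*ambiguous = matches > 1;
	return group;
}
```

With that table, a fault on RID 0x0100 resolves to group 7 but is
flagged ambiguous, whereas RID 0x0200 resolves to a single device in
group 8. Any per-device label in the fault report would have to
tolerate the first case.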

[snip]
> > However this way significantly
> > violates the philosophy in this /dev/ioasid proposal. It is not one IOASID
> > one address space any more. Device routing information (indirectly
> > marking hidden I/O spaces) has to be carried in iotlb invalidation and
> > page faulting uAPI to help connect vIOMMU with the underlying
> > pIOMMU. This is one design choice to be confirmed with ARM guys.
>
> I'm confused by this rationale.
>
> For a vIOMMU that has IO page tables in the guest the basic
> choices are:
> - Do we have a hypervisor trap to bind the page table or not? (RID
> and PASID may differ here)
> - Do we have a hypervisor trap to invalidate the page tables or not?
>
> If the first is a hypervisor trap then I agree it makes sense to create a
> child IOASID that points to each guest page table and manage it
> directly. This should not require walking guest page tables as it is
> really just informing the HW where the page table lives. HW will walk
> them.
>
> If there are no hypervisor traps (does this exist?) then there is no
> way to involve the hypervisor here and the child IOASID should simply
> be a pointer to the guest's data structure that describes binding. In
> this case that IOASID should claim all PASIDs when bound to a
> RID.

And in that case I think we should call that object something other
than an IOASID, since it represents multiple address spaces.

> Invalidation should be passed up to the IOMMU driver in terms of
> the guest tables information and either the HW or software has to walk
> the guest tables to make sense of it.
>
> Events from the IOMMU to userspace should be tagged with the attached
> device label and the PASID/substream ID. This means there is no issue
> with having an 'all PASID' IOASID.
>
> > Notes:
> > - It might be confusing as IOASID is also used in the kernel (drivers/
> > iommu/ioasid.c) to represent PCI PASID or ARM substream ID. We need to
> > find a better name later to differentiate.
>
> +1 on Jean-Philippe's remarks
>
> > - PPC has not been considered yet as we haven't got time to fully understand
> > its semantics. According to previous discussion there is some generality
> > between PPC window-based scheme and VFIO type1 semantics. Let's
> > first reach consensus on this proposal and then further discuss how to
> > extend it to cover PPC's requirement.
>
> From what I understood PPC is not so bad. Nesting IOASIDs did its
> preload feature and it needed a way to specify/query the IOVA range an
> IOASID will cover.
>
> > - There is a protocol between vfio group and kvm. We need to think about
> > how it will be affected following this proposal.
>
> Ugh, I always stop looking when I reach that boundary. Can anyone
> summarize what is going on there?
>
> Most likely passing the /dev/ioasid into KVM's FD (or vice versa) is the
> right answer. E.g. if ARM needs to get the VMID from KVM and set it to
> ioasid then a KVM "ioctl set_arm_vmid(/dev/ioasid)" call is
> reasonable. Certainly better than the symbol get stuff we have right
> now.
>
> I will read through the detail below in another email
>
> Jason
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
