Re: [RFC] /dev/ioasid uAPI proposal

From: David Gibson
Date: Thu Jun 03 2021 - 02:28:42 EST


On Thu, Jun 03, 2021 at 01:29:58AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Thursday, June 3, 2021 12:09 AM
> >
> > On Wed, Jun 02, 2021 at 01:33:22AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Wednesday, June 2, 2021 1:42 AM
> > > >
> > > > On Tue, Jun 01, 2021 at 08:10:14AM +0000, Tian, Kevin wrote:
> > > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > > > Sent: Saturday, May 29, 2021 1:36 AM
> > > > > >
> > > > > > On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
> > > > > >
> > > > > > > > IOASID nesting can be implemented in two ways: hardware nesting and
> > > > > > > > software nesting. With hardware support the child and parent I/O page
> > > > > > > > tables are walked consecutively by the IOMMU to form a nested translation.
> > > > > > > > When it's implemented in software, the ioasid driver is responsible for
> > > > > > > > merging the two-level mappings into a single-level shadow I/O page table.
> > > > > > > > Software nesting requires both child/parent page tables operated through
> > > > > > > > the dma mapping protocol, so any change in either level can be captured
> > > > > > > > by the kernel to update the corresponding shadow mapping.
> > > > > >
> > > > > > > Why? A SW emulation could do this synchronization during invalidation
> > > > > > > processing if invalidation contained an IOVA range.
> > > > >
> > > > > In this proposal we differentiate between host-managed and user-
> > > > > managed I/O page tables. If host-managed, the user is expected to use
> > > > > map/unmap cmd explicitly upon any change required on the page table.
> > > > > If user-managed, the user first binds its page table to the IOMMU and
> > > > > > then uses invalidation cmd to flush iotlb when necessary (e.g. typically
> > > > > not required when changing a PTE from non-present to present).
> > > > >
> > > > > We expect user to use map+unmap and bind+invalidate respectively
> > > > > instead of mixing them together. Following this policy, map+unmap
> > > > > must be used in both levels for software nesting, so changes in either
> > > > > level are captured timely to synchronize the shadow mapping.
> > > >
> > > > map+unmap or bind+invalidate is a policy of the IOASID itself set when
> > > > it is created. If you put two different types in a tree then each IOASID
> > > > must continue to use its own operation mode.
> > > >
> > > > I don't see a reason to force all IOASIDs in a tree to be consistent??
> > >
> > > only for software nesting. With hardware support the parent uses map
> > > while the child uses bind.
> > >
> > > Yes, the policy is specified per IOASID. But if the policy violates the
> > > requirement in a specific nesting mode, then nesting should fail.
> >
> > I don't get it.
> >
> > If the IOASID is a page table then it is bind/invalidate. SW or not SW
> > doesn't matter at all.
> >
> > > >
> > > > A software emulated two level page table where the leaf level is a
> > > > bound page table in guest memory should continue to use
> > > > bind/invalidate to maintain the guest page table IOASID even though it
> > > > is a SW construct.
> > >
> > > with software nesting the leaf should be a host-managed page table
> > > (or metadata). A bind/invalidate protocol doesn't require the user
> > > to notify the kernel of every page table change.
> >
> > The purpose of invalidate is to inform the implementation that the
> > page table has changed so it can flush the caches. If the page table
> > is changed and invalidation is not issued then the implementation
> > is free to ignore the changes.
> >
> > In this way the SW mode is the same as a HW mode with an infinite
> > cache.
> >
> > The collapsed shadow page table is really just a cache.
> >
>
> OK. One additional thing is that we may need a 'caching_mode'
> thing reported by /dev/ioasid, indicating whether invalidation is
> required when changing non-present to present. For hardware
> nesting it's not reported as the hardware IOMMU will walk the
> guest page table in cases of iotlb miss. For software nesting
> caching_mode is reported so the user must issue invalidation
> upon any change in guest page table so the kernel can update
> the shadow page table timely.

For the first cut, I'd have the API assume that invalidates are
*always* required. Some bypass to avoid them in cases where they're
not needed can be an additional extension.
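
To make the "shadow table is just a cache" model concrete, here is a
toy sketch in Python. It is purely illustrative: the class, method and
field names are invented for this mail and are not part of the proposed
uAPI, and page tables are reduced to dictionaries keyed by page number.
The only point is that the shadow is resynchronised on invalidate and
nowhere else, so a non-present to present change stays invisible until
the user issues one.

```python
class ShadowNesting:
    """Toy model of software nesting (hypothetical names throughout)."""

    def __init__(self, parent):
        self.parent = dict(parent)  # GPA page -> HPA page (host-managed level)
        self.guest = {}             # IOVA page -> GPA page (stands in for the bound guest table)
        self.shadow = {}            # IOVA page -> HPA page (the collapsed cache)

    def invalidate(self, start, npages):
        """Resync shadow entries for IOVA pages [start, start + npages)."""
        for iova in range(start, start + npages):
            gpa = self.guest.get(iova)
            if gpa is not None and gpa in self.parent:
                self.shadow[iova] = self.parent[gpa]
            else:
                # Guest entry gone or not backed by the parent: drop it.
                self.shadow.pop(iova, None)

    def translate(self, iova):
        """Device access: only the shadow is consulted, never the guest table."""
        return self.shadow.get(iova)


def demo():
    n = ShadowNesting(parent={0x10: 0x900})
    n.guest[0x1] = 0x10           # guest flips a PTE from non-present to present
    before = n.translate(0x1)     # shadow has not been told yet
    n.invalidate(0x1, 1)          # user issues the invalidate
    after = n.translate(0x1)
    return before, after
```

With invalidates always required, the same user code works whether the
nesting is done in hardware or in software. A caching_mode style bypass
would only let the user skip the invalidate call in the hardware case.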

> Following this and your other comment with David, we will mark
> host-managed vs. guest-managed explicitly for I/O page table
> of each IOASID. map+unmap or bind+invalidate is decided by
> which owner is specified by the user.
>
> Thanks
> Kevin
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
