Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Gunthorpe
Date: Tue Jun 01 2021 - 13:42:45 EST


On Tue, Jun 01, 2021 at 08:10:14AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Saturday, May 29, 2021 1:36 AM
> >
> > On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
> >
> > > IOASID nesting can be implemented in two ways: hardware nesting and
> > > software nesting. With hardware support the child and parent I/O page
> > > tables are walked consecutively by the IOMMU to form a nested translation.
> > > When it's implemented in software, the ioasid driver is responsible for
> > > merging the two-level mappings into a single-level shadow I/O page table.
> > > Software nesting requires both the child and parent page tables to be
> > > operated through the DMA mapping protocol, so that any change in either
> > > level can be captured by the kernel to update the corresponding shadow
> > > mapping.
> >
> > Why? A SW emulation could do this synchronization during invalidation
> > processing if invalidation contained an IOVA range.
>
> In this proposal we differentiate between host-managed and user-
> managed I/O page tables. If host-managed, the user is expected to use
> the map/unmap cmds explicitly for any change required on the page table.
> If user-managed, the user first binds its page table to the IOMMU and
> then uses the invalidation cmd to flush the IOTLB when necessary (e.g. a
> flush is typically not required when a PTE changes from non-present to
> present).
>
> We expect the user to use map+unmap and bind+invalidate for host-managed
> and user-managed page tables respectively, instead of mixing them.
> Following this policy, map+unmap must be used at both levels for software
> nesting, so that changes in either level are captured in time to
> synchronize the shadow mapping.

Whether an IOASID uses map+unmap or bind+invalidate is a policy of the
IOASID itself, set when it is created. If you put two different types in
a tree then each IOASID must continue to use its own operation mode.

I don't see a reason to force all IOASIDs in a tree to use the same mode.

A software-emulated two-level page table where the leaf level is a
bound page table in guest memory should continue to use
bind/invalidate to maintain the guest page table IOASID, even though it
is a SW construct.
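To illustrate how a SW-emulated nested IOASID could stay on bind/invalidate, here is a toy model under stated assumptions: page tables are flat arrays indexed by page frame number, the guest table (IOVA to GPA) is the bound level, the GPA table (GPA to HPA) is kernel-owned, and the kernel keeps a shadow table (IOVA to HPA) that the IOMMU walks. When an invalidation carries an IOVA range, only that range of the guest table is re-walked and composed with the GPA level. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PAGES 16
#define PFN_INVALID UINT64_MAX

/* Hypothetical toy model: flat arrays indexed by page frame number. */
struct sw_nested {
	uint64_t guest_pt[NR_PAGES];  /* bound guest table: IOVA -> GPA */
	uint64_t gpa_pt[NR_PAGES];    /* kernel-owned table: GPA -> HPA */
	uint64_t shadow_pt[NR_PAGES]; /* walked by the IOMMU: IOVA -> HPA */
};

void sw_nested_init(struct sw_nested *n)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		n->guest_pt[i] = n->gpa_pt[i] = n->shadow_pt[i] = PFN_INVALID;
}

/* Compose both levels for a single IOVA page. */
static uint64_t resolve(const struct sw_nested *n, size_t iova_pfn)
{
	uint64_t gpa_pfn = n->guest_pt[iova_pfn];

	if (gpa_pfn == PFN_INVALID || gpa_pfn >= NR_PAGES)
		return PFN_INVALID;
	return n->gpa_pt[gpa_pfn];
}

/*
 * Invalidate handler for the bound level: because the invalidation
 * carries an IOVA range, only that range of the guest table is re-walked
 * and the matching shadow entries rebuilt.
 */
void sw_nested_invalidate(struct sw_nested *n, size_t iova_pfn, size_t npages)
{
	assert(iova_pfn + npages <= NR_PAGES);
	for (size_t i = iova_pfn; i < iova_pfn + npages; i++)
		n->shadow_pt[i] = resolve(n, i);
}
```

This assumes the guest issues an invalidation for every change it wants the shadow to reflect; a real implementation would walk multi-level tables rather than flat arrays.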

The GPA level should use map/unmap because it is a kernel-owned page
table.

That said, how to efficiently mix map/unmap on the GPA level when there
are SW-nested levels below it looks quite challenging.
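A rough sketch of why the mixed case is hard, using the same kind of hypothetical flat-array model (guest table IOVA to GPA, kernel-owned GPA table GPA to HPA, shadow IOVA to HPA). A map or unmap on the GPA level has to be pushed into every shadow entry that was derived through that GPA page, but the shadow is indexed by IOVA, so without a reverse map the kernel must scan the child's guest table. The names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PAGES 16
#define PFN_INVALID UINT64_MAX

/* Hypothetical toy model: flat arrays indexed by page frame number. */
struct sw_nested {
	uint64_t guest_pt[NR_PAGES];  /* bound guest table: IOVA -> GPA */
	uint64_t gpa_pt[NR_PAGES];    /* kernel-owned table: GPA -> HPA */
	uint64_t shadow_pt[NR_PAGES]; /* walked by the IOMMU: IOVA -> HPA */
};

void sw_nested_init(struct sw_nested *n)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		n->guest_pt[i] = n->gpa_pt[i] = n->shadow_pt[i] = PFN_INVALID;
}

/*
 * Push a GPA-level change into the shadow.  With no GPA -> IOVA reverse
 * map, every guest PTE must be checked; with several SW-nested children
 * under one GPA IOASID, this scan would repeat for each of them.
 */
static void gpa_propagate(struct sw_nested *n, size_t gpa_pfn)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		if (n->guest_pt[i] == gpa_pfn)
			n->shadow_pt[i] = n->gpa_pt[gpa_pfn];
}

/* map on the kernel-owned GPA level */
void gpa_map(struct sw_nested *n, size_t gpa_pfn, uint64_t hpa_pfn)
{
	assert(gpa_pfn < NR_PAGES);
	n->gpa_pt[gpa_pfn] = hpa_pfn;
	gpa_propagate(n, gpa_pfn);
}

/* unmap on the kernel-owned GPA level */
void gpa_unmap(struct sw_nested *n, size_t gpa_pfn)
{
	assert(gpa_pfn < NR_PAGES);
	n->gpa_pt[gpa_pfn] = PFN_INVALID;
	gpa_propagate(n, gpa_pfn);
}
```

The scan costs one pass over the IOVA space per GPA change, per SW-nested child; a GPA-to-IOVA reverse map would avoid it at the cost of extra bookkeeping. The model also reads the live guest table, glossing over entries the guest changed but has not yet invalidated.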

Jason