Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Gunthorpe
Date: Fri May 28 2021 - 16:25:21 EST


On Fri, May 28, 2021 at 10:24:56AM +0800, Jason Wang wrote:
> > IOASID nesting can be implemented in two ways: hardware nesting and
> > software nesting. With hardware support the child and parent I/O page
> > tables are walked consecutively by the IOMMU to form a nested translation.
> > When it's implemented in software, the ioasid driver
>
> Need to explain what "ioasid driver" means.

I think it means "drivers/iommu"

> And if yes, does it allow the device to use a software-specific implementation:
>
> 1) swiotlb or

I think it is necessary to have a 'software page table', which is
required to support all the mdevs we have today.
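
As a rough sketch of what that looks like from userspace - the
IOASID_ALLOC/IOASID_MAP_DMA names and the struct layout below are only
illustrative stand-ins, not settled uAPI - the kernel owns the I/O page
table and userspace just issues map/unmap requests, which is the shape
every mdev path needs:

/*
 * Sketch only: a kernel-managed ("software page table") IOASID as an
 * mdev-style path would use it.
 */
struct ioasid_dma_map {
	__u32	ioasid;		/* which I/O address space */
	__u64	iova;		/* device-visible address */
	__u64	vaddr;		/* user VA providing the backing pages */
	__u64	size;
};

int map_one_region(int ioasid_fd, void *buf)
{
	int sw_ioasid = ioctl(ioasid_fd, IOASID_ALLOC);	/* kernel owns the table */
	struct ioasid_dma_map map = {
		.ioasid	= sw_ioasid,
		.iova	= 0x100000,
		.vaddr	= (unsigned long)buf,
		.size	= 4096,
	};

	/*
	 * For a software page table the kernel pins the user pages and
	 * records iova -> pfn in its own tree; no IOMMU hardware table
	 * has to exist for the mdev driver to look translations up later.
	 */
	return ioctl(ioasid_fd, IOASID_MAP_DMA, &map);
}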

> 2) device specific IOASID implementation

"drivers/iommu" is pluggable, so I guess it can exist? I've never seen
it done before though

Whether we'd want this to drive an on-device translation table is an
interesting question. I don't have an answer.

> > I/O page tables routed through PASID are installed in a per-RID PASID
> > table structure.
>
> I'm not sure this is true for all archs.

It must be true. For security reasons access to a PASID must be
limited by RID.

RID_A assigned to guest A should not be able to access a PASID being
used by RID_B in guest B. Only a per-RID restriction can accomplish
this.
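
To make that concrete, here is a purely illustrative sketch of the
structure (not any real IOMMU driver's layout - real PASID tables are
multi-level in-memory structures reached from the RID's context entry):

struct pasid_entry {
	__u64	pgtbl_root;	/* I/O page table this PASID translates through */
	bool	valid;
};

struct rid_context {
	__u32			rid;		/* requester ID (bus:dev.fn) */
	struct pasid_entry	*pasid_table;	/* this RID's private PASID table */
};

/*
 * A DMA tagged (rid, pasid) can only resolve through the PASID table
 * installed for that RID.  Even if guest B reuses the same PASID number
 * as guest A, the lookup goes through RID_B's table, so it can never
 * reach the page tables behind RID_A's entries.
 */
static struct pasid_entry *pasid_lookup(struct rid_context *rid_ctx, __u32 pasid)
{
	return &rid_ctx->pasid_table[pasid];
}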

> I would like to know the reason for such indirection.
>
> It looks to me that the ioasid fd is sufficient for performing any operation.
>
> Such allocation only works if an ioasid fd can have multiple IOASIDs, which
> does not seem to be the case in what you describe here.

It is the case; read the examples section. One of them had 3 interrelated
IOASID objects inside the same FD.
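
Spelled out, everything hangs off one open /dev/ioasid instance, roughly
like this (IOASID_ALLOC and the gva_ioasid child are illustrative; the
nesting ioctl is the one from the example quoted below):

int ioasid_fd = open("/dev/ioasid", O_RDWR);

/* root object covering the GPA space */
int gpa_ioasid   = ioctl(ioasid_fd, IOASID_ALLOC);
/* two children nested on it, e.g. vIOMMU GIOVA and guest SVA spaces */
int giova_ioasid = ioctl(ioasid_fd, IOASID_CREATE_NESTING, gpa_ioasid);
int gva_ioasid   = ioctl(ioasid_fd, IOASID_CREATE_NESTING, gpa_ioasid);

/* All three are scoped to the single ioasid_fd; each device is then
 * attached (VFIO_ATTACH_IOASID) to whichever object it should translate
 * through.
 */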

> > 5.3. IOASID nesting (software)
> > ++++++++++++++++++++++++++++++
> >
> > Same usage scenario as 5.2, with software-based IOASID nesting
> > available. In this mode it is the kernel, instead of the user, that
> > creates the shadow mapping.
> >
> > The flow before the guest boots is the same as in 5.2, except for one
> > point. Because giova_ioasid is nested on gpa_ioasid, locked accounting
> > is only conducted for gpa_ioasid. So it's not necessary to pre-register
> > virtual memory.
> >
> > To save space we only list the steps after the guest boots (i.e. both
> > dev1/dev2 have been attached to gpa_ioasid before the guest boots):
> >
> > /* After boots */
> > /* Make GIOVA space nested on GPA space */
> > giova_ioasid = ioctl(ioasid_fd, IOASID_CREATE_NESTING,
> >                      gpa_ioasid);
> >
> > /* Attach dev2 to the new address space (child)
> >  * Note dev2 is still attached to gpa_ioasid (parent)
> >  */
> > at_data = { .ioasid = giova_ioasid };
> > ioctl(device_fd2, VFIO_ATTACH_IOASID, &at_data);
>
>
> For vDPA, we need something similar. And in the future, vDPA may allow
> multiple IOASIDs to be attached to a single device. It should work with
> the current design.

What do you imagine multiple IOASIDs being used for in VDPA?
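
Just on the mechanics, I'd expect that to simply repeat the quoted
at_data/attach pattern once per IOASID, something like the strawman
below (the VDPA_ATTACH_IOASID name and the per-virtqueue field are
entirely made up):

struct vdpa_attach_data {
	__u32	ioasid;
	__u32	vq_index;	/* hypothetical: which virtqueue uses this space */
};

void attach_two_spaces(int vdpa_dev_fd, int ioasid_a, int ioasid_b)
{
	struct vdpa_attach_data a = { .ioasid = ioasid_a, .vq_index = 0 };
	struct vdpa_attach_data b = { .ioasid = ioasid_b, .vq_index = 1 };

	ioctl(vdpa_dev_fd, VDPA_ATTACH_IOASID, &a);
	ioctl(vdpa_dev_fd, VDPA_ATTACH_IOASID, &b);
}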

Jason