Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Jason Gunthorpe
Date: Thu Apr 01 2021 - 14:45:14 EST


On Thu, Apr 01, 2021 at 07:04:01AM +0000, Liu, Yi L wrote:

> After reading your reply in https://lore.kernel.org/linux-iommu/20210331123801.GD1463678@xxxxxxxxxx/#t
> So you mean /dev/ioasid FD is per-VM instead of per-ioasid, so above skeleton
> doesn't suit your idea.

You can do it one PASID per FD or multiple PASIDs per FD. Most likely
we will have high numbers of PASIDs in a qemu process, so I assume
the number of FDs will start to be a constraining factor, thus
multiplexing is reasonable.

It doesn't really change anything about the basic flow.

Without digging deeply into it, either seems like a reasonable choice.

> +-----------------------------+-----------------------------------------------+
> | userspace | kernel space |
> +-----------------------------+-----------------------------------------------+
> | ioasid_fd = | /dev/ioasid does below: |
> | open("/dev/ioasid", O_RDWR);| struct ioasid_fd_ctx { |
> | | struct list_head ioasid_list; |
> | | ... |
> | | } ifd_ctx; // ifd_ctx is per ioasid_fd |

Sure, possibly an xarray rather than a list.

> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd, | /dev/ioasid does below: |
> | ALLOC, &ioasid); | struct ioasid_data { |
> | | ioasid_t ioasid; |
> | | struct list_head device_list; |
> | | struct list_head next; |
> | | ... |
> | | } id_data; // id_data is per ioasid |
> | | |
> | | list_add(&id_data.next, |
> | | &ifd_ctx.ioasid_list); |

Yes, this should have a kref in it too
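
A minimal userspace sketch of what a refcounted id_data could look like.
Everything here is an assumption for illustration: the fixed-size pointer
table stands in for an xarray, the plain int counter stands in for a
struct kref, and ioasid_alloc(), ioasid_get() and ioasid_put() are
hypothetical names, not an existing kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_IOASID 64                 /* arbitrary bound for this sketch */

struct ioasid_data {
    unsigned int ioasid;
    int refcount;                     /* stands in for struct kref */
};

struct ioasid_fd_ctx {
    struct ioasid_data *ids[MAX_IOASID];   /* stands in for an xarray */
};

/* Allocate the lowest free ID; the per-FD table holds the first reference. */
static struct ioasid_data *ioasid_alloc(struct ioasid_fd_ctx *ctx)
{
    for (unsigned int i = 0; i < MAX_IOASID; i++) {
        if (ctx->ids[i])
            continue;
        struct ioasid_data *d = calloc(1, sizeof(*d));
        if (!d)
            return NULL;
        d->ioasid = i;
        d->refcount = 1;
        ctx->ids[i] = d;
        return d;
    }
    return NULL;
}

/* Look up an ID by index and take a reference; NULL if it is not allocated. */
static struct ioasid_data *ioasid_get(struct ioasid_fd_ctx *ctx,
                                      unsigned int id)
{
    if (id >= MAX_IOASID || !ctx->ids[id])
        return NULL;
    ctx->ids[id]->refcount++;
    return ctx->ids[id];
}

/* Drop one reference; returns 1 if this was the last one and the ID is free. */
static int ioasid_put(struct ioasid_fd_ctx *ctx, struct ioasid_data *d)
{
    assert(d->refcount > 0);
    if (--d->refcount)
        return 0;
    ctx->ids[d->ioasid] = NULL;       /* slot becomes reusable only now */
    free(d);
    return 1;
}
```

In this model the ID number is recycled only when the last holder drops
its reference, not when the allocating side is done with it.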

> +-----------------------------+-----------------------------------------------+
> | ioctl(device_fd, | VFIO does below: |
> | DEVICE_ALLOW_IOASID, | 1) get ioasid_fd, check if ioasid_fd is valid |
> | ioasid_fd, | 2) check if ioasid is allocated from ioasid_fd|
> | ioasid); | 3) register device/domain info to /dev/ioasid |
> | | tracked in id_data.device_list |
> | | 4) record the ioasid in VFIO's per-device |
> | | ioasid list for future security check |

You would provide a function that does steps 1 and 2; look at eventfd
for instance.

I'm not sure we need to register the device with the ioasid. The device
should increment the kref on the ioasid_data at this point.
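
A rough userspace model of that split, assuming the eventfd pattern meant
here is the eventfd_ctx_fdget() style of helper (hand in an fd, get back
a referenced context or an error). The names ioasid_data_fdget(),
ioasid_data_put() and device_allow_ioasid(), the fd_table array and the
integer refcount are all hypothetical stand-ins, not existing kernel
interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FD     8
#define MAX_IOASID 16

struct ioasid_data {
    int allocated;
    int refcount;                     /* stands in for struct kref */
};

struct ioasid_fd_ctx {
    struct ioasid_data ids[MAX_IOASID];
};

/* Stand-in for the fd table: a slot is NULL unless it is an ioasid FD. */
static struct ioasid_fd_ctx *fd_table[MAX_FD];

/*
 * Hypothetical helper owned by /dev/ioasid: steps 1 and 2 in one call.
 * Returns a referenced ioasid_data, or NULL if fd is not an ioasid FD
 * or the ID was not allocated from it.
 */
static struct ioasid_data *ioasid_data_fdget(int fd, unsigned int ioasid)
{
    struct ioasid_fd_ctx *ctx;

    if (fd < 0 || fd >= MAX_FD || !fd_table[fd])
        return NULL;
    ctx = fd_table[fd];
    if (ioasid >= MAX_IOASID || !ctx->ids[ioasid].allocated)
        return NULL;
    ctx->ids[ioasid].refcount++;
    return &ctx->ids[ioasid];
}

static void ioasid_data_put(struct ioasid_data *d)
{
    assert(d->refcount > 0);
    d->refcount--;
}

/* Device side: it keeps its own record and holds a reference per ID. */
struct device_ctx {
    struct ioasid_data *allowed[MAX_IOASID];
};

static int device_allow_ioasid(struct device_ctx *dev, int ioasid_fd,
                               unsigned int ioasid)
{
    struct ioasid_data *d = ioasid_data_fdget(ioasid_fd, ioasid);

    if (!d)
        return -1;
    if (dev->allowed[ioasid]) {       /* already allowed: drop extra ref */
        ioasid_data_put(d);
        return -1;
    }
    dev->allowed[ioasid] = d;         /* the device now owns this reference */
    return 0;
}
```

Note that the ioasid side keeps no device list in this sketch; the only
thing it learns from the device is one more reference.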

> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd, | /dev/ioasid does below: |
> | BIND_PGTBL, | 1) find ioasid's id_data |
> | pgtbl_data, | 2) loop the id_data.device_list and tell iommu|
> | ioasid); | give ioasid access to the devices |

This seems backwards, DEVICE_ALLOW_IOASID should tell the iommu to
give the ioasid to the device.

Here the ioctl should be about assigning a memory map from the current
mm_struct to the pasid.

> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd, | /dev/ioasid does below: |
> | UNBIND_PGTBL, | 1) find ioasid's id_data |
> | ioasid); | 2) loop the id_data.device_list and tell iommu|
> | | clear ioasid access to the devices |

Also seems backwards. The ioctl here should be 'destroy ioasid' which
wipes out the page table, halts DMA access and parks the PASID until
all users are done.
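
The destroy semantics can be modelled as a small state machine. This is
only a sketch under assumptions: the state names, the users counter and
the functions ioasid_destroy() and ioasid_user_done() are made up for
illustration, and "halting DMA" is reduced to clearing a pointer.

```c
#include <assert.h>
#include <stddef.h>

enum pasid_state { PASID_FREE, PASID_ACTIVE, PASID_PARKED };

struct ioasid_data {
    enum pasid_state state;
    void *pgtbl;                      /* non-NULL while a page table is bound */
    int users;                        /* devices that still hold the PASID */
};

/* Destroy: wipe the page table at once, park the ID if anyone still holds it. */
static void ioasid_destroy(struct ioasid_data *d)
{
    d->pgtbl = NULL;                  /* no translation left, DMA cannot proceed */
    d->state = d->users ? PASID_PARKED : PASID_FREE;
}

/* Called when one device lets go; the last one releases a parked ID. */
static void ioasid_user_done(struct ioasid_data *d)
{
    assert(d->users > 0);
    if (--d->users == 0 && d->state == PASID_PARKED)
        d->state = PASID_FREE;        /* only now may the number be recycled */
}
```

The point of the parked state is that the number cannot be handed out
again while a device might still be tagged with it.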

> +-----------------------------+-----------------------------------------------+
> | ioctl(device_fd, | VFIO does below: |
> | DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's |
> | ioasid_fd, | device ioasid list. |
> | ioasid); | 2) unregister device/domain info from |
> | | /dev/ioasid, clear in id_data.device_list |

This should disconnect the iommu and kref_put the ioasid_data

Remember the layering: only the device_fd knows what the pci_device is
that it is touching. It doesn't make a lot of sense to leak that into
the ioasid world, which should only be dealing with the page table
mapping.
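
A sketch of the disallow path with that layering, again as a userspace
model under assumptions: device_ctx, its fields and
device_disallow_ioasid() are hypothetical, the iommu_attached flag stands
in for real IOMMU programming, and the integer refcount stands in for a
kref.

```c
#include <assert.h>
#include <stddef.h>

/* Owned by /dev/ioasid: deliberately has no device or PCI fields. */
struct ioasid_data {
    int refcount;
};

/* Owned by the device FD: the only place that knows the PCI device. */
struct device_ctx {
    const char *pci_name;
    struct ioasid_data *attached;     /* NULL when nothing is attached */
    int iommu_attached;               /* stand-in for IOMMU programming */
};

static int device_disallow_ioasid(struct device_ctx *dev,
                                  struct ioasid_data *d)
{
    if (!d || dev->attached != d)
        return -1;                    /* not associated with this device */
    dev->iommu_attached = 0;          /* disconnect the IOMMU first */
    dev->attached = NULL;
    assert(d->refcount > 0);
    d->refcount--;                    /* then drop the device's reference */
    return 0;
}
```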

> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd, | /dev/ioasid does below: |
> | FREE, ioasid); | list_del(&id_data.next); |
> +-----------------------------+-----------------------------------------------+

Don't know if we need a free. The sequence above is backwards: the
page table should be set up, the device authorized, the device
de-authorized, then the page table destroyed. The PASID recycles once
everyone is released.

Include a sequence showing how the kvm FD is used to program the
vPASID to pPASID table that ENQCMD uses.

Show how dynamic authorization works based on requests from the
guest's vIOMMU

Jason