On Tue, Jun 01, 2021 at 07:09:21PM +0800, Lu Baolu wrote:
> Hi Jason,
>
> On 2021/5/29 7:36, Jason Gunthorpe wrote:
> > > /*
> > >  * Bind a user-managed I/O page table with the IOMMU
> > >  *
> > >  * Because the user page table is untrusted, IOASID nesting must be
> > >  * enabled for this ioasid so the kernel can enforce its DMA isolation
> > >  * policy through the parent ioasid.
> > >  *
> > >  * The pgtable binding protocol is different from DMA mapping. The
> > >  * latter has the I/O page table constructed by the kernel and updated
> > >  * according to user MAP/UNMAP commands. With pgtable binding the
> > >  * whole page table is created and updated by userspace, thus a
> > >  * different set of commands is required (bind, iotlb invalidation,
> > >  * page fault, etc.).
> > >  *
> > >  * Because the page table is directly walked by the IOMMU, the user
> > >  * must use a format compatible with the underlying hardware. It can
> > >  * check the format information through IOASID_GET_INFO.
> > >  *
> > >  * The page table is bound to the IOMMU according to the routing
> > >  * information of each attached device under the specified IOASID. The
> > >  * routing information (RID and optional PASID) is registered when a
> > >  * device is attached to this IOASID through the VFIO uAPI.
> > >  *
> > >  * Input parameters:
> > >  * - child_ioasid;
> > >  * - address of the user page table;
> > >  * - formats (vendor, address_width, etc.);
> > >  *
> > >  * Return: 0 on success, -errno on failure.
> > >  */
> > > #define IOASID_BIND_PGTABLE	_IO(IOASID_TYPE, IOASID_BASE + 9)
> > > #define IOASID_UNBIND_PGTABLE	_IO(IOASID_TYPE, IOASID_BASE + 10)
> >
> > Also feels backwards, why wouldn't we specify this, and the required
> > page table format, during alloc time?
>
> Thinking of the required page table format, perhaps we should shed more
> light on the page table of an IOASID. So far, an IOASID might represent
> one of the following page tables (might be more):
>
>  1) an IOMMU format page table (a.k.a. iommu_domain)
>  2) a user application CPU page table (SVA for example)
>  3) a KVM EPT (future option)
>  4) a VM guest managed page table (nesting mode)
>
> This version only covers 1) and 4). Do you think we need to support 2),

Isn't (2) the equivalent of using the host-managed pagetable
then doing a giant MAP of all your user address space into it? But
maybe we should identify that case explicitly in case the host can
optimize it.