Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3

From: Jean-Philippe Brucker
Date: Thu Mar 09 2023 - 08:42:48 EST


Hi Nicolin,

On Thu, Mar 09, 2023 at 02:53:38AM -0800, Nicolin Chen wrote:
> Add the following data structures for corresponding ioctls:
> iommu_hwpt_arm_smmuv3 => IOMMUFD_CMD_HWPT_ALLOC
> iommu_hwpt_invalidate_arm_smmuv3 => IOMMUFD_CMD_HWPT_INVALIDATE
>
> Also, add IOMMU_HW_INFO_TYPE_ARM_SMMUV3 and IOMMU_PGTBL_TYPE_ARM_SMMUV3_S1
> to the header and corresponding type/size arrays.
>
> Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>

> +/**
> + * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 specific page table data
> + *
> + * @flags: page table entry attributes
> + * @s2vmid: Virtual machine identifier
> + * @s1ctxptr: Stage-1 context descriptor pointer
> + * @s1cdmax: Number of CDs pointed to by s1ContextPtr
> + * @s1fmt: Stage-1 Format
> + * @s1dss: Default substream
> + */
> +struct iommu_hwpt_arm_smmuv3 {
> +#define IOMMU_SMMUV3_FLAG_S2 (1 << 0) /* if unset, stage-1 */

I don't understand the purpose of this flag, since the structure only
provides stage-1 configuration fields.

> +#define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */

Doesn't this break isolation? The VMID space is global for the SMMU, so
the guest could access cached mappings of another device.

> + __u64 flags;
> + __u32 s2vmid;
> + __u32 __reserved;
> + __u64 s1ctxptr;
> + __u64 s1cdmax;
> + __u64 s1fmt;
> + __u64 s1dss;
> +};
> +


> +/**
> + * struct iommu_hwpt_invalidate_arm_smmuv3 - ARM SMMUv3 cache invalidation info
> + * @flags: boolean attributes of cache invalidation command
> + * @opcode: opcode of cache invalidation command
> + * @ssid: SubStream ID
> + * @granule_size: page/block size of the mapping in bytes
> + * @range: IOVA range to invalidate
> + */
> +struct iommu_hwpt_invalidate_arm_smmuv3 {
> +#define IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF (1 << 0)
> + __u64 flags;
> + __u8 opcode;
> + __u8 padding[3];
> + __u32 asid;
> + __u32 ssid;
> + __u32 granule_size;
> + struct iommu_iova_range range;
> +};

Although we can keep the alloc and hardware info separate for each IOMMU
architecture, we should try to come up with common invalidation methods.

It matters because things like vSVA, or just efficient dynamic mappings,
will require optimal invalidation latency. A paravirtual interface like
vhost-iommu can help with that, as the host kernel will handle guest
invalidations directly instead of doing a round-trip to host userspace
(and we'll likely want the same path for PRI).

Supporting HW page tables for a common PV IOMMU does require some
architecture-specific knowledge, but invalidation messages contain roughly
the same information on all architectures. The PV IOMMU shouldn't need to
carry command opcodes for each possible architecture when one generic
command can do the same job.

Ideally I'd like something like this for vhost-iommu:

* slow path through userspace for complex requests like attach-table and
probe, where the VMM can decode arch-specific information and translate
them to iommufd and vhost-iommu ioctls to update the configuration.

* fast path within the kernel for performance-critical requests like
invalidate, page request and response. It would be absurd for the
vhost-iommu driver to translate generic invalidation requests from the
guest into arch-specific commands with special opcodes, when the next
step is calling the IOMMU driver which does that for free.

During previous discussions we came up with generic invalidations that
could fit both Arm and x86 [1][2]. The only difference was the ASID
(called archid/id in those proposals), which VT-d didn't need. Could we
try to build on that?

[1] https://elixir.bootlin.com/linux/v5.17/source/include/uapi/linux/iommu.h#L161
[2] https://lists.oasis-open.org/archives/virtio-dev/202102/msg00014.html

Thanks,
Jean