[PATCH v3 00/11] cover-letter: iommufd: Add vIOMMU infrastructure (Part-1)

From: Nicolin Chen
Date: Wed Oct 09 2024 - 12:39:36 EST


This series introduces a new vIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a
parent HWPT typically hold one stage-2 IO pagetable and tag it with only
one ID in the cache entries. When sharing one large stage-2 IO pagetable
across physical IOMMU instances, that one ID may not always be available
across all the IOMMU instances. In other word, it's ideal for SW to have
a different container for the stage-2 IO pagetable so it can hold another
ID that's available.

For this "different container", add vIOMMU, an additional layer to hold
extra virtualization information:
_______________________________________________________________________
| iommufd (with vIOMMU) |
| |
| [5] |
| _____________ |
| | | |
| [1] | vIOMMU | [4] [2] |
| ________________ | | _____________ ________ |
| | | | [3] | | | | | |
| | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
| |________________| |_____________| |_____________| |________| |
| | | | | |
|_________|____________________|__________________|_______________|_____|
| | | |
| ______v_____ ______v_____ ___v__
| PFN storage | (paging) | | (nested) | |struct|
|------------>|iommu_domain|<----|iommu_domain|<----|device|
|____________| |____________| |______|

The vIOMMU object should be seen as a slice of a physical IOMMU instance
that is passed to or shared with a VM. That can be some HW/SW resources:
- Security namespace for guest owned ID, e.g. guest-controlled cache tags
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platforms IDs, e.g. RIDs and others
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
- Non-affiliated event reporting

On a multi-IOMMU system, the vIOMMU object must be instanced to the number
of the physical IOMMUs that are passed to (via devices) a guest VM, while
being able to hold the shareable parent HWPT. Each vIOMMU then just needs
to allocate its own individual ID to tag its own cache:
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested0 |--->| viommu0 ------------------
---------------- | | IDx |
----------------------------
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested1 |--->| viommu1 ------------------
---------------- | | IDy |
----------------------------

As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation
only. Later series will add more data structures and their ioctls.

As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT
type for a core-allocated-core-managed vIOMMU object, allowing drivers to
simply hook a default viommu ops for viommu-based invalidation alone. And
add support for driver-specific type of vIOMMU allocation, and implement
that in the ARM SMMUv3 driver for a real world use case.

More vIOMMU-based structs and ioctls will be introduced in the follow-up
series to support vDEVICE, vIRQ (vEVENT) and VQUEUE objects. Although we
repurposed the vIOMMU object from an earlier RFC, just for a referece:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@xxxxxxxxxx/

This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v3
(paring QEMU branch for testing will be provided with the part2 series)

Changelog
v3
* Rebased on top of Jason's nesting v3 series
https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@xxxxxxxxxx/
* Split the series into smaller parts
* Added Jason's Reviewed-by
* Added back viommu->iommu_dev
* Added support for driver-allocated vIOMMU v.s. core-allocated
* Dropped arm_smmu_cache_invalidate_user
* Added an iommufd_test_wait_for_users() in selftest
* Reworked test code to make viommu an individual FIXTURE
* Added missing TEST_LENGTH case for the new ioctl command
v2
https://lore.kernel.org/all/cover.1724776335.git.nicolinc@xxxxxxxxxx/
* Limited vdev_id to one per idev
* Added a rw_sem to protect the vdev_id list
* Reworked driver-level APIs with proper lockings
* Added a new viommu_api file for IOMMUFD_DRIVER config
* Dropped useless iommu_dev point from the viommu structure
* Added missing index numnbers to new types in the uAPI header
* Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
* Reworked mock_viommu_cache_invalidate() using the new iommu helper
* Reordered details of set/unset_vdev_id handlers for proper lockings
v1
https://lore.kernel.org/all/cover.1723061377.git.nicolinc@xxxxxxxxxx/

Thanks!
Nicolin

Nicolin Chen (11):
iommufd: Move struct iommufd_object to public iommufd header
iommufd: Rename _iommufd_object_alloc to iommufd_object_alloc_elm
iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
iommu: Pass in a viommu pointer to domain_alloc_user op
iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
iommufd/selftest: Add refcount to mock_iommu_device
iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
Documentation: userspace-api: iommufd: Update vIOMMU
iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support

drivers/iommu/iommufd/Makefile | 5 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 18 ++++
drivers/iommu/iommufd/iommufd_private.h | 23 ++---
drivers/iommu/iommufd/iommufd_test.h | 2 +
include/linux/iommu.h | 15 +++
include/linux/iommufd.h | 52 +++++++++++
include/uapi/linux/iommufd.h | 54 +++++++++--
tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++
drivers/iommu/amd/iommu.c | 1 +
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 24 +++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +
drivers/iommu/intel/iommu.c | 1 +
drivers/iommu/iommufd/hw_pagetable.c | 27 +++++-
drivers/iommu/iommufd/main.c | 38 ++------
drivers/iommu/iommufd/selftest.c | 79 ++++++++++++++--
drivers/iommu/iommufd/viommu.c | 91 +++++++++++++++++++
drivers/iommu/iommufd/viommu_api.c | 57 ++++++++++++
tools/testing/selftests/iommu/iommufd.c | 84 +++++++++++++++++
Documentation/userspace-api/iommufd.rst | 66 +++++++++++++-
19 files changed, 602 insertions(+), 65 deletions(-)
create mode 100644 drivers/iommu/iommufd/viommu.c
create mode 100644 drivers/iommu/iommufd/viommu_api.c

--
2.43.0