[GIT PULL] Please pull IOMMUFD subsystem changes

From: Jason Gunthorpe
Date: Tue Oct 31 2023 - 09:14:29 EST


Hi Linus,

This PR includes the dirty tracking and the first part of the nested
translation items for iommufd; details are in the tag.

For those following, these series are still progressing:

- User page table invalidation:
https://lore.kernel.org/r/20231020092426.13907-1-yi.l.liu@xxxxxxxxx
https://lore.kernel.org/r/20231020093719.18725-1-yi.l.liu@xxxxxxxxx

- ARM SMMUv3 nested translation:
https://lore.kernel.org/all/cover.1683688960.git.nicolinc@xxxxxxxxxx/

- Draft AMD IOMMU nested translation:
https://lore.kernel.org/all/20230621235508.113949-1-suravee.suthikulpanit@xxxxxxx/

- ARM SMMUv3 Dirty tracking:
https://github.com/jpemartins/linux/commits/smmu-iommufd-v3

There is also a lot of ongoing work to consistently and generically enable
PASID and SVA support in all the IOMMU drivers:

SMMUv3:
https://lore.kernel.org/r/0-v1-e289ca9121be+2be-smmuv3_newapi_p1_jgg@xxxxxxxxxx
https://lore.kernel.org/r/0-v1-afbb86647bbd+5-smmuv3_newapi_p2_jgg@xxxxxxxxxx
AMD:
https://lore.kernel.org/all/20231016104351.5749-1-vasant.hegde@xxxxxxx/
https://lore.kernel.org/all/20231013151652.6008-1-vasant.hegde@xxxxxxx/
Intel:
https://lore.kernel.org/r/20231017032045.114868-1-tina.zhang@xxxxxxxxx

RFC patches for PASID support in iommufd & vfio:
https://lore.kernel.org/all/20230926092651.17041-1-yi.l.liu@xxxxxxxxx/
https://lore.kernel.org/all/20230926093121.18676-1-yi.l.liu@xxxxxxxxx/

IO page faults and events delivered to userspace through iommufd:
https://lore.kernel.org/all/20231026024930.382898-1-baolu.lu@xxxxxxxxxxxxxxx/

RFC patches exploring support for the first Intel Scalable IO Virtualization
(SIOV r1) device have been posted:
https://lore.kernel.org/all/20231009085123.463179-1-yi.l.liu@xxxxxxxxx/

There are also qemu patches implementing iommufd support:
https://lore.kernel.org/all/20231016083223.1519410-1-zhenzhong.duan@xxxxxxxxx/

There are some conflicts with Joerg's main iommu tree; most are of the
append-to-list type. A few notes:

drivers/iommu/iommufd/selftest.c needs an additional hunk that does not show
up as a conflict:

- static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
- {
- 	if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED)
- 		return &mock_blocking_domain;
- 	if (iommu_domain_type == IOMMU_DOMAIN_UNMANAGED)
- 		return mock_domain_alloc_paging(NULL);
- 	return NULL;
- }
-

drivers/iommu/iommufd/selftest.c should be:

@@@ -466,10 -293,8 +450,9 @@@ static const struct iommu_ops mock_ops
  	.owner = THIS_MODULE,
  	.pgsize_bitmap = MOCK_IO_PAGE_SIZE,
  	.hw_info = mock_domain_hw_info,
- 	.domain_alloc = mock_domain_alloc,
+ 	.domain_alloc_paging = mock_domain_alloc_paging,
+ 	.domain_alloc_user = mock_domain_alloc_user,
  	.capable = mock_domain_capable,
- 	.set_platform_dma_ops = mock_domain_set_plaform_dma_ops,

include/linux/iommu.h should be:

- * @domain_alloc: allocate iommu domain
+ * @domain_alloc: allocate and return an iommu domain if success. Otherwise
+ *                NULL is returned. The domain is not fully initialized until
+ *                the caller iommu_domain_alloc() returns.
+ * @domain_alloc_user: Allocate an iommu domain corresponding to the input
+ *                     parameters as defined in include/uapi/linux/iommufd.h.
+ *                     Unlike @domain_alloc, it is called only by IOMMUFD and
+ *                     must fully initialize the new domain before return.
+ *                     Upon success, if the @user_data is valid and the @parent
+ *                     points to a kernel-managed domain, the new domain must be
+ *                     IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be
+ *                     NULL while the @user_data can be optionally provided, the
+ *                     new domain must support __IOMMU_DOMAIN_PAGING.
+ *                     Upon failure, ERR_PTR must be returned.
+ * @domain_alloc_paging: Allocate an iommu_domain that can be used for
+ *                       UNMANAGED, DMA, and DMA_FQ domain types.

The rest were straightforward.

The tag for-linus-iommufd-merged with my merge resolution to your tree
is also available to pull.

Thanks,
Jason

The following changes since commit ce9ecca0238b140b88f43859b211c9fdfd8e5b70:

Linux 6.6-rc2 (2023-09-17 14:40:24 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git tags/for-linus-iommufd

for you to fetch changes up to b2b67c997bf74453f3469d8b54e4859f190943bd:

iommufd: Organize the mock domain alloc functions closer to Joerg's tree (2023-10-30 18:01:56 -0300)

----------------------------------------------------------------
iommufd for 6.7

This branch has three new iommufd capabilities:

- Dirty tracking for DMA. The IOMMUs in AMD/ARM/Intel systems can now
record, in the IOPTEs of the IO page table, whether a DMA has written to
a page. This can be used to generate a record of which memory is dirtied
by DMA activity during a VM migration. A VMM such as qemu combines the
IOMMU dirty bits with the CPU's dirty log to determine which memory to
transfer.

VFIO already has a DMA dirty tracking framework that requires PCI
devices to implement tracking HW internally. The iommufd version
provides an alternative that the VMM can select, if available. The two
are designed to have very similar APIs.

- Userspace-controlled attributes for hardware page tables
(HWPT/iommu_domain). There are currently a few generic attributes for
HWPTs (dirty tracking support, and acting as the parent of a nest). This
is an entry point for the userspace iommu driver to control the HW in
detail.

- Nested translation support for HWPTs. This is a two-stage (2D)
translation scheme, similar to the CPU's, where a DMA goes through a
first stage to determine an intermediate address, which is then
translated through a second stage to a physical address.

As with CPU translation, the first-stage table lives in VM-controlled
memory, while the second stage is owned by the kernel and matches the
VM's guest-to-physical map.

Because every IOMMU describes the S1 IO page table and its associated
settings with its own set of parameters, the userspace IOMMU driver has
to marshal that information into the correct format.

This is 1/3 of the feature: it allows creating the nested translation
and binding it to VFIO devices. The APIs for IOTLB and ATC invalidation
of the stage 1 IO page table, and for forwarding IO faults, are still in
progress.
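The nested translation item above can be pictured with a toy model. This is a simplified userspace C sketch under stated assumptions, not driver code: real IOMMUs use multi-level, hardware-defined page table formats, and the stage-1 table walk is itself translated through stage 2. The one-level table layout and the names `sk_table`, `sk_translate()` and `sk_nested_translate()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy format: one-level tables, 4 KiB pages, 16 entries. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_MASK ((UINT64_C(1) << SK_PAGE_SHIFT) - 1)
#define SK_NR_ENTRIES 16

/* Hypothetical table: entry i maps input page i to output page pfn[i]. */
struct sk_table {
	bool valid[SK_NR_ENTRIES];
	uint64_t pfn[SK_NR_ENTRIES];
};

/* One translation stage: look up the page, keep the offset within it. */
static bool sk_translate(const struct sk_table *t, uint64_t in, uint64_t *out)
{
	uint64_t idx = in >> SK_PAGE_SHIFT;

	if (idx >= SK_NR_ENTRIES || !t->valid[idx])
		return false; /* real hardware would report an IO page fault */
	*out = (t->pfn[idx] << SK_PAGE_SHIFT) | (in & SK_PAGE_MASK);
	return true;
}

/*
 * Nested translation: stage 1 (guest-owned table in VM memory) maps the
 * IOVA to an intermediate (guest physical) address; stage 2 (kernel-owned,
 * the VM's guest-to-physical map) maps that to a host physical address.
 */
static bool sk_nested_translate(const struct sk_table *s1,
				const struct sk_table *s2,
				uint64_t iova, uint64_t *pa)
{
	uint64_t ipa;

	assert(s1 && s2 && pa);
	if (!sk_translate(s1, iova, &ipa))
		return false;
	return sk_translate(s2, ipa, pa);
}
```

With stage 1 mapping IOVA page 1 to guest page 5 and stage 2 mapping guest page 5 to host page 9, IOVA 0x1234 resolves to 0x9234. An address missing from either stage fails, which is the point where real hardware would raise an IO fault.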

The series includes AMD and Intel support for dirty tracking, and Intel
support for nested translation.
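To make the dirty-tracking flow concrete, here is a hedged sketch of how a VMM might combine the two dirty sources for one migration pass. It is plain userspace C, not the iommufd or qemu API: the bitmap layout (one bit per page, 64 pages per word) and the names `merge_dirty()` and `popcount64()` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of set bits in one 64-bit bitmap word (portable, no builtins). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical VMM helper: a page must be resent in this pass if the guest
 * CPU wrote to it (cpu_dirty) OR a device DMA wrote to it (iommu_dirty, as
 * harvested from the IOPTE dirty bits). Bit i of word w stands for page
 * w * 64 + i. Returns the number of pages to transfer.
 */
static size_t merge_dirty(uint64_t *to_send, const uint64_t *iommu_dirty,
			  const uint64_t *cpu_dirty, size_t nwords)
{
	size_t total = 0;

	assert(to_send && iommu_dirty && cpu_dirty);
	for (size_t w = 0; w < nwords; w++) {
		to_send[w] = iommu_dirty[w] | cpu_dirty[w];
		total += popcount64(to_send[w]);
	}
	return total;
}
```

For example, if the IOMMU reports pages 3 and 128 dirty and the CPU log reports pages 3 and 5, the merged bitmap marks pages 3, 5 and 128, so three pages are sent. Taking the union means a page dirtied by both the CPU and a device is transferred only once.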

Along the way are a number of internal items:

- New iommu core items: ops->domain_alloc_user(), ops->set_dirty_tracking(),
ops->read_and_clear_dirty(), IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user()

- UAF fix in iopt_area_split()

- Spelling fixes and some test suite improvements
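The ops->read_and_clear_dirty() item, together with the flag to skip clearing of IOPTE dirty bits, can be illustrated with a userspace model of a flat IO page table. This is a hedged sketch, not the kernel interface: the IOPTE bit positions, the `SK_DIRTY_NO_CLEAR` flag and `sk_read_and_clear_dirty()` are hypothetical, and real drivers walk multi-level tables and need an IOTLB flush after clearing dirty bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IOPTE layout for this sketch: bit 0 present, bit 6 dirty. */
#define SK_PTE_PRESENT (UINT64_C(1) << 0)
#define SK_PTE_DIRTY (UINT64_C(1) << 6)
/* Hypothetical flag modelled on "skip clearing of IOPTE dirty". */
#define SK_DIRTY_NO_CLEAR (1u << 0)

/*
 * Walk ptes[first .. first + count) and set bit (i - first) in bitmap for
 * every present entry whose dirty bit is set. The caller zeroes bitmap.
 * Unless SK_DIRTY_NO_CLEAR is given, the dirty bit is cleared so the next
 * pass only reports new writes. Returns the number of dirty pages found.
 */
static size_t sk_read_and_clear_dirty(uint64_t *ptes, size_t first,
				      size_t count, uint64_t *bitmap,
				      unsigned int flags)
{
	size_t found = 0;

	assert(ptes && bitmap);
	for (size_t i = 0; i < count; i++) {
		uint64_t *pte = &ptes[first + i];

		if (!(*pte & SK_PTE_PRESENT) || !(*pte & SK_PTE_DIRTY))
			continue;
		bitmap[i / 64] |= UINT64_C(1) << (i % 64);
		found++;
		if (!(flags & SK_DIRTY_NO_CLEAR))
			*pte &= ~SK_PTE_DIRTY;
	}
	return found;
}
```

Reading with the no-clear flag reports the dirty pages but leaves their bits set, so a following normal read reports them again and clears them, after which a third read finds nothing. A plausible use for skipping the clear is reading dirty state right before unmapping the range, where clearing (and the IOTLB flush it implies) would be wasted work.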

----------------------------------------------------------------
GuokaiXu (1):
iommufd: Fix spelling errors in comments

Jason Gunthorpe (4):
iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
iommufd/device: Wrap IOMMUFD_OBJ_HWPT_PAGING-only configurations
iommufd: Add iopt_area_alloc()
iommufd: Organize the mock domain alloc functions closer to Joerg's tree

Joao Martins (19):
vfio/iova_bitmap: Export more API symbols
vfio: Move iova_bitmap into iommufd
iommufd/iova_bitmap: Move symbols to IOMMUFD namespace
iommu: Add iommu_domain ops for dirty tracking
iommufd: Add a flag to enforce dirty tracking on attach
iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKING
iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP
iommufd: Add capabilities to IOMMU_GET_HW_INFO
iommufd: Add a flag to skip clearing of IOPTE dirty
iommu/amd: Add domain_alloc_user based domain allocation
iommu/amd: Access/Dirty bit support in IOPTEs
iommu/vt-d: Access/Dirty bit support for SS domains
iommufd/selftest: Expand mock_domain with dev_flags
iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING
iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING
iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP
iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag
iommufd/selftest: Fix page-size check in iommufd_test_dirty()

Koichiro Den (1):
iommufd: Fix missing update of domains_itree after splitting iopt_area

Lu Baolu (6):
iommu: Add IOMMU_DOMAIN_NESTED
iommu/vt-d: Extend dmar_domain to support nested domain
iommu/vt-d: Add helper for nested domain allocation
iommu/vt-d: Add helper to setup pasid nested translation
iommu/vt-d: Add nested domain allocation
iommu/vt-d: Disallow read-only mappings to nest parent domain

Nicolin Chen (10):
iommufd/selftest: Iterate idev_ids in mock_domain's alloc_hwpt test
iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
iommufd: Correct IOMMU_HWPT_ALLOC_NEST_PARENT description
iommufd: Only enforce cache coherency in iommufd_hw_pagetable_alloc
iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
iommufd: Add a nested HW pagetable object
iommu: Add iommu_copy_struct_from_user helper
iommufd/selftest: Add nested domain allocation for mock domain
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs

Yi Liu (11):
iommu: Add new iommu op to create domains owned by userspace
iommufd: Use the domain_alloc_user() op for domain allocation
iommufd: Flow user flags for domain allocation to domain_alloc_user()
iommufd: Support allocating nested parent domain
iommufd/selftest: Add domain_alloc_user() support in iommu mock
iommu/vt-d: Add domain_alloc_user op
iommu: Pass in parent domain with user_data to domain_alloc_user op
iommu/vt-d: Enhance capability check for nested parent domain allocation
iommufd: Add data structure for Intel VT-d stage-1 domain allocation
iommu/vt-d: Make domain attach helpers to be extern
iommu/vt-d: Set the nested domain to a device

drivers/iommu/Kconfig | 4 +
drivers/iommu/amd/Kconfig | 1 +
drivers/iommu/amd/amd_iommu_types.h | 12 +
drivers/iommu/amd/io_pgtable.c | 68 ++++
drivers/iommu/amd/iommu.c | 147 ++++++++-
drivers/iommu/intel/Kconfig | 1 +
drivers/iommu/intel/Makefile | 2 +-
drivers/iommu/intel/iommu.c | 156 +++++++++-
drivers/iommu/intel/iommu.h | 64 +++-
drivers/iommu/intel/nested.c | 117 +++++++
drivers/iommu/intel/pasid.c | 221 +++++++++++++
drivers/iommu/intel/pasid.h | 6 +
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 174 +++++++----
drivers/iommu/iommufd/hw_pagetable.c | 304 ++++++++++++++----
drivers/iommu/iommufd/io_pagetable.c | 200 +++++++++++-
drivers/iommu/iommufd/iommufd_private.h | 84 ++++-
drivers/iommu/iommufd/iommufd_test.h | 39 +++
drivers/{vfio => iommu/iommufd}/iova_bitmap.c | 5 +-
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/pages.c | 2 +
drivers/iommu/iommufd/selftest.c | 328 ++++++++++++++++++--
drivers/iommu/iommufd/vfio_compat.c | 6 +-
drivers/vfio/Makefile | 3 +-
drivers/vfio/pci/mlx5/Kconfig | 1 +
drivers/vfio/pci/mlx5/main.c | 1 +
drivers/vfio/pci/pds/Kconfig | 1 +
drivers/vfio/pci/pds/pci_drv.c | 1 +
drivers/vfio/vfio_main.c | 1 +
include/linux/io-pgtable.h | 4 +
include/linux/iommu.h | 146 ++++++++-
include/linux/iova_bitmap.h | 26 ++
include/uapi/linux/iommufd.h | 180 ++++++++++-
tools/testing/selftests/iommu/iommufd.c | 379 ++++++++++++++++++++++-
tools/testing/selftests/iommu/iommufd_fail_nth.c | 7 +-
tools/testing/selftests/iommu/iommufd_utils.h | 233 +++++++++++++-
36 files changed, 2723 insertions(+), 219 deletions(-)
