Re: [PATCH 00/12] iommu: Remove IOMMU_DEV_FEAT_SVA/_IOPF

From: Baolu Lu
Date: Mon Feb 17 2025 - 22:00:46 EST


On 2/15/25 19:35, Zhangfei Gao wrote:
On Sat, 15 Feb 2025 at 18:09, Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
On 2/15/25 16:11, Zhangfei Gao wrote:
It is not related to multiple devices; it also happens with a single
device when a user page fault triggers.

iopf_queue_remove_device is called.
rcu_assign_pointer(param->fault_param, NULL);

call trace
[ 304.961312] Call trace:
[ 304.961314] show_stack+0x20/0x38 (C)
[ 304.961319] dump_stack_lvl+0xc0/0xd0
[ 304.961324] dump_stack+0x18/0x28
[ 304.961327] iopf_queue_remove_device+0xb0/0x1f0
[ 304.961331] arm_smmu_remove_master_domain+0x204/0x250
[ 304.961336] arm_smmu_attach_commit+0x64/0x100
[ 304.961338] arm_smmu_attach_dev_nested+0x104/0x1a8
[ 304.961340] __iommu_attach_device+0x2c/0x110
[ 304.961343] __iommu_device_set_domain.isra.0+0x78/0xe0
[ 304.961345] __iommu_group_set_domain_internal+0x78/0x160
[ 304.961347] iommu_replace_group_handle+0x9c/0x150
[ 304.961350] iommufd_fault_domain_replace_dev+0x88/0x120
[ 304.961353] iommufd_device_do_replace+0x190/0x3c0
[ 304.961355] iommufd_device_change_pt+0x270/0x688
[ 304.961357] iommufd_device_replace+0x20/0x38
[ 304.961359] vfio_iommufd_physical_attach_ioas+0x30/0x78
[ 304.961363] vfio_df_ioctl_attach_pt+0xa8/0x188
[ 304.961366] vfio_device_fops_unl_ioctl+0x310/0x990


When page fault triggers:

[ 1016.383578] ------------[ cut here ]-----------
[ 1016.388184] WARNING: CPU: 35 PID: 717 at
drivers/iommu/io-pgfault.c:231 iommu_report_device_fault+0x2c8/0x470
It's likely that iopf_queue_add_device() was not called for this device.
iopf_queue_add_device is called, but quickly iopf_queue_remove_device
is called during guest bootup.
Then fault_param is set to NULL.
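
For illustration only, the add/remove/fault ordering described above can be modelled in userspace. This is a sketch under assumptions: the struct and function names below are hypothetical stand-ins, not the kernel's definitions, and the real code uses RCU and locking that are omitted here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct model_fault_param {
	int queued;
};

struct model_dev_param {
	struct model_fault_param *fault_param;
};

/* Models iopf_queue_add_device(): publish a fault_param for the device. */
static void model_add_device(struct model_dev_param *p,
			     struct model_fault_param *fp)
{
	p->fault_param = fp;
}

/* Models iopf_queue_remove_device(): the pointer is cleared, as with
 * rcu_assign_pointer(param->fault_param, NULL) in the kernel. */
static void model_remove_device(struct model_dev_param *p)
{
	p->fault_param = NULL;
}

/* Models the report path: without a fault_param the fault cannot be queued. */
static int model_report_fault(struct model_dev_param *p)
{
	if (!p->fault_param) {
		fprintf(stderr, "model: fault reported with no fault_param\n");
		return -ENODEV;
	}
	p->fault_param->queued++;
	return 0;
}

/* Replays the ordering from the report: add, remove, then a page fault. */
int model_add_remove_then_fault(void)
{
	struct model_fault_param fp = { 0 };
	struct model_dev_param param = { NULL };

	model_add_device(&param, &fp);
	model_remove_device(&param);
	return model_report_fault(&param);
}
```

In this model, a fault that arrives after the remove step fails because fault_param is already NULL, which corresponds to the condition the WARNING above reports.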

arm_smmu_attach_commit
    arm_smmu_remove_master_domain
        // newly added in the first patch
        if (master_domain) {
            if (master_domain->using_iopf)
It seems the above check is incorrect. We only need to disable iopf when
an iopf-capable domain is about to be removed. Will the following
additional change make any difference?

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 28e67a9e3861..9b9ef738d070 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2822,7 +2822,7 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master,
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

 	if (master_domain) {
-		if (master_domain->using_iopf)
+		if (domain->iopf_handler)
 			arm_smmu_disable_iopf(master);
 		kfree(master_domain);
 	}
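
As a rough userspace illustration of the condition proposed in the diff above (disable iopf only when the domain being removed is iopf-capable), here is a minimal sketch. The mock_domain type, mock_handler and mock_should_disable_iopf are hypothetical names used only for this example; only the iopf_handler field name comes from the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock holding only the field the proposed check reads. */
struct mock_domain {
	int (*iopf_handler)(void *fault);
};

/* Placeholder handler so a domain can be marked iopf-capable in the model. */
static int mock_handler(void *fault)
{
	(void)fault;
	return 0;
}

/* Mirrors the proposed condition: true only for an iopf-capable domain. */
bool mock_should_disable_iopf(const struct mock_domain *domain)
{
	return domain->iopf_handler != NULL;
}
```

With this predicate, removing a domain that has no handler leaves iopf untouched, while removing one that has a handler triggers the disable path.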

                arm_smmu_disable_iopf(master); ->
                    iopf_queue_remove_device
            kfree(master_domain);
        }

By comparison, without this patchset only iopf_queue_add_device is
called; iopf_queue_remove_device is not.

Thanks,
baolu