On Wed, Jul 24, 2024 at 05:22:59PM +0800, Kunkun Jiang wrote:
> On 2024/7/24 9:42, Kunkun Jiang wrote:
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >     while (!queue_remove_raw(q, evt)) {
> >         u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);
> >
> >         ret = arm_smmu_handle_evt(smmu, evt);
> >         if (!ret || !__ratelimit(&rs))
> >             continue;
> >
> >         dev_info(smmu->dev, "event 0x%02x received:\n", id);
> >         for (i = 0; i < ARRAY_SIZE(evt); ++i)
> >             dev_info(smmu->dev, "\t0x%016llx\n",
> >                      (unsigned long long)evt[i]);
> >
> >         cond_resched();
> >     }
> >
> > The smmu-v3 driver cannot print event information when "ret" is 0.
> > Unfortunately due to commit 3dfa64aecbaf
> > ("iommu: Make iommu_report_device_fault() return void"), the default
> > return value in arm_smmu_handle_evt() is 0. Maybe a trace should
> > be added here?
>
> Additional explanation. Background introduction:
> 1. A device (VF) is passthrough (VFIO-PCI) to a VM.
> 2. The SMMU has the stall feature.
> 3. Modified guest device driver to generate an event.
>
> This event handling process is as follows:
> arm_smmu_evtq_thread
>     ret = arm_smmu_handle_evt
>         iommu_report_device_fault
>             iopf_param = iopf_get_dev_fault_param(dev);
>             // iopf is not enabled.
>             // No RESUME will be sent!
>             if (WARN_ON(!iopf_param))
>                 return;
>     if (!ret || !__ratelimit(&rs))
>         continue;
>
> In this scenario, the io page-fault capability is not enabled.
> There are two problems here:
> 1. The event information is not printed.
> 2. The entire device (PF level) is stalled, not just the current
>    VF. This affects other normal VFs.

Oh, so that stall is probably also due to b554e396e51c ("iommu: Make
iopf_group_response() return void"). I agree that we need a way to
propagate error handling back to the driver in the case that
'iopf_param' is NULL, otherwise we're making the unexpected fault
considerably more problematic than it needs to be.
Lu -- can we add the -ENODEV return back in the case that
iommu_report_device_fault() doesn't even find a 'iommu_fault_param' for
the device?
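
To make the request concrete, here is a rough userspace model of the
behaviour I have in mind. It is only a sketch, not kernel code: every
mock_* name and struct below is hypothetical and stands in for the real
iommu_report_device_fault() / arm_smmu_handle_evt() paths. The assumption
is that the report path returns -ENODEV when the device has no fault
parameter, and that the caller then terminates the stalled transaction
itself and lets the event be logged.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct mock_fault_param {
	int unused;
};

struct mock_device {
	const char *name;
	struct mock_fault_param *fault_param;	/* NULL when IOPF is not enabled */
	bool stalled;				/* a transaction is stalled */
};

/*
 * Models the proposed behaviour of the report path: return -ENODEV
 * instead of returning silently when no fault parameter exists.
 */
static int mock_report_device_fault(struct mock_device *dev)
{
	if (!dev->fault_param)
		return -ENODEV;

	/* A real handler would queue the fault for the IOPF consumer here. */
	return 0;
}

/* Models the driver terminating the stalled transaction on its own. */
static void mock_resume_abort(struct mock_device *dev)
{
	dev->stalled = false;
}

/*
 * Models the caller side: a stall event arrives, the fault is reported,
 * and on error the driver unstalls the device and logs the event.
 * Returns the number of events logged (0 or 1).
 */
static int mock_handle_stall_event(struct mock_device *dev, unsigned int evt_id)
{
	int ret;

	dev->stalled = true;

	ret = mock_report_device_fault(dev);
	if (!ret)
		return 0;	/* handed to the IOPF consumer, which responds later */

	/* Nobody will respond, so abort the access rather than leave it stalled. */
	mock_resume_abort(dev);
	printf("%s: event 0x%02x received, unhandled (%d)\n",
	       dev->name, evt_id, ret);
	return 1;
}
```

With something along these lines, a VF without IOPF enabled would get its
event printed and its transaction aborted, instead of being left stalled
and taking the other VFs with it.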