Re: [bug report] iommu/arm-smmu-v3: Event cannot be printed in some scenarios

From: Baolu Lu
Date: Mon Jul 29 2024 - 01:32:45 EST


On 2024/7/24 18:24, Will Deacon wrote:
On Wed, Jul 24, 2024 at 05:22:59PM +0800, Kunkun Jiang wrote:
On 2024/7/24 9:42, Kunkun Jiang wrote:
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
        while (!queue_remove_raw(q, evt)) {
                u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);

                ret = arm_smmu_handle_evt(smmu, evt);
                if (!ret || !__ratelimit(&rs))
                        continue;

                dev_info(smmu->dev, "event 0x%02x received:\n", id);
                for (i = 0; i < ARRAY_SIZE(evt); ++i)
                        dev_info(smmu->dev, "\t0x%016llx\n",
                                 (unsigned long long)evt[i]);

                cond_resched();
        }

The smmu-v3 driver cannot print event information when "ret" is 0.
Unfortunately, due to commit 3dfa64aecbaf
("iommu: Make iommu_report_device_fault() return void"), the default
return value in arm_smmu_handle_evt() is 0. Maybe a trace should
be added here?
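
A minimal user-space sketch of the print gate described above, not driver code. The helper name evt_would_be_printed() is hypothetical; handler_ret stands in for the return value of arm_smmu_handle_evt() and ratelimit_ok for the result of __ratelimit(&rs).

```c
#include <stdbool.h>

/*
 * Hypothetical model of the condition in the event loop. It only mirrors
 * the quoted test and does not touch any real SMMU state.
 */
static bool evt_would_be_printed(int handler_ret, bool ratelimit_ok)
{
	/* Mirrors: if (!ret || !__ratelimit(&rs)) continue; */
	if (!handler_ret || !ratelimit_ok)
		return false;

	return true;
}
```

With a handler that returns 0 by default, evt_would_be_printed(0, true) is false, so the event dump is skipped no matter what the rate limiter allows.
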
Additional explanation. Background:
1. A device (VF) is passed through (VFIO-PCI) to a VM.
2. The SMMU has the stall feature.
3. The guest device driver was modified to generate an event.

This event handling process is as follows:
arm_smmu_evtq_thread
    ret = arm_smmu_handle_evt
        iommu_report_device_fault
            iopf_param = iopf_get_dev_fault_param(dev);
            // iopf is not enabled.
            // No RESUME will be sent!
            if (WARN_ON(!iopf_param))
                return;
    if (!ret || !__ratelimit(&rs))
        continue;

In this scenario, the I/O page-fault capability is not enabled.
There are two problems here:
1. The event information is not printed.
2. The entire device (PF level) is stalled, not just the current
VF. This affects other normal VFs.
Oh, so that stall is probably also due to b554e396e51c ("iommu: Make
iopf_group_response() return void"). I agree that we need a way to
propagate error handling back to the driver in the case that
'iopf_param' is NULL, otherwise we're making the unexpected fault
considerably more problematic than it needs to be.

Lu -- can we add the -ENODEV return back in the case that
iommu_report_device_fault() doesn't even find an 'iommu_fault_param' for
the device?
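
A rough user-space sketch of the error propagation being asked for here, not a proposed patch. All mock_* names are hypothetical stand-ins, and what the driver does once it sees the error (abort the stalled transaction and print the event) is an assumption; the thread only establishes that the error needs to reach the driver.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; fault_param is NULL when iopf is off. */
struct mock_dev {
	void *fault_param;
};

/* Hypothetical outcomes the caller can choose between. */
enum mock_action {
	MOCK_FAULT_QUEUED,	/* fault handed to the iopf machinery */
	MOCK_ABORT_AND_PRINT,	/* assumed driver fallback on error */
};

/* Models the report path returning -ENODEV instead of void. */
static int mock_report_device_fault(const struct mock_dev *dev)
{
	if (!dev->fault_param)
		return -ENODEV;

	return 0;
}

/* Models the caller acting on the propagated error. */
static enum mock_action mock_handle_evt(const struct mock_dev *dev)
{
	int ret = mock_report_device_fault(dev);

	if (ret)
		return MOCK_ABORT_AND_PRINT;

	return MOCK_FAULT_QUEUED;
}
```

With a non-zero return the caller can tell "fault queued" apart from "nobody will ever respond", which is what it needs in order to avoid leaving the transaction stalled and to fall through to the event print.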

Yes, of course. Commit b554e396e51c was added to consolidate the
drivers' auto-response code in the core, on the assumption that a driver
only needs to call iommu_report_device_fault() to report an iopf.

Thanks,
baolu