Re: [bug report] iommu/arm-smmu-v3: Event cannot be printed in some scenarios

From: Baolu Lu
Date: Mon Aug 05 2024 - 20:09:42 EST


On 2024/8/5 23:32, Pranjal Shrivastava wrote:
On Mon, Aug 05, 2024 at 01:30:01PM +0100, Will Deacon wrote:
On Mon, Aug 05, 2024 at 08:13:09PM +0800, Kunkun Jiang wrote:
On 2024/8/2 22:38, Pranjal Shrivastava wrote:
Hey,
On Mon, Jul 29, 2024 at 11:02 AM Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
On 2024/7/24 18:24, Will Deacon wrote:
On Wed, Jul 24, 2024 at 05:22:59PM +0800, Kunkun Jiang wrote:
On 2024/7/24 9:42, Kunkun Jiang wrote:
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
	while (!queue_remove_raw(q, evt)) {
		u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);

		ret = arm_smmu_handle_evt(smmu, evt);
		if (!ret || !__ratelimit(&rs))
			continue;

		dev_info(smmu->dev, "event 0x%02x received:\n", id);
		for (i = 0; i < ARRAY_SIZE(evt); ++i)
			dev_info(smmu->dev, "\t0x%016llx\n",
				 (unsigned long long)evt[i]);

		cond_resched();
	}

The smmu-v3 driver cannot print event information when "ret" is 0.
Unfortunately due to commit 3dfa64aecbaf
("iommu: Make iommu_report_device_fault() return void"), the default
return value in arm_smmu_handle_evt() is 0. Maybe a trace should
be added here?
Additional explanation. Background:
1. A device (VF) is passed through (VFIO-PCI) to a VM.
2. The SMMU has the stall feature.
3. The guest device driver is modified to generate an event.

This event handling process is as follows:
arm_smmu_evtq_thread
    ret = arm_smmu_handle_evt
        iommu_report_device_fault
            iopf_param = iopf_get_dev_fault_param(dev);
            // iopf is not enabled.
            // No RESUME will be sent!
            if (WARN_ON(!iopf_param))
                return;
    if (!ret || !__ratelimit(&rs))
        continue;

In this scenario, the io page-fault capability is not enabled.
There are two problems here:
1. The event information is not printed.
2. The entire device (PF level) is stalled, not just the current
VF. This affects other normal VFs.
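
For illustration only, the first problem can be reproduced with a small
userspace model of the loop above. This is a sketch, not the driver code:
struct model_evt, drain_events() and the two handlers are hypothetical
names, the rate limit is left out, and the event ID is assumed to sit in
the low 8 bits of the first word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_EVT_WORDS 4	/* assumed event size for this model */

/* Hypothetical stand-in for one event queue entry. */
struct model_evt {
	uint64_t w[MODEL_EVT_WORDS];
};

/* Stand-in for arm_smmu_handle_evt(): 0 means "handled, do not print". */
typedef int (*model_handle_fn)(const struct model_evt *evt);

/*
 * Models the situation in the report: the fault is handed to a core
 * function that returns void, so the handler reports success even though
 * nobody consumed the fault.
 */
static int handle_evt_always_ok(const struct model_evt *evt)
{
	(void)evt;
	return 0;
}

/* Models a handler that can tell the caller the event was not handled. */
static int handle_evt_unhandled(const struct model_evt *evt)
{
	(void)evt;
	return -1;
}

/* Drain n events, print the ones the handler rejects, return the count printed. */
static size_t drain_events(const struct model_evt *evts, size_t n,
			   model_handle_fn handle)
{
	size_t printed = 0;

	assert(evts != NULL && handle != NULL);

	for (size_t k = 0; k < n; k++) {
		unsigned int id = (unsigned int)(evts[k].w[0] & 0xff);

		/* Same gate as the driver loop: a zero return skips the print. */
		if (!handle(&evts[k]))
			continue;

		printf("event 0x%02x received:\n", id);
		for (size_t i = 0; i < MODEL_EVT_WORDS; i++)
			printf("\t0x%016llx\n", (unsigned long long)evts[k].w[i]);
		printed++;
	}

	return printed;
}
```

With handle_evt_always_ok() nothing is ever printed, however many events
are queued. With handle_evt_unhandled() every event is dumped.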
Oh, so that stall is probably also due to b554e396e51c ("iommu: Make
iopf_group_response() return void"). I agree that we need a way to
propagate error handling back to the driver in the case that
'iopf_param' is NULL, otherwise we're making the unexpected fault
considerably more problematic than it needs to be.

Lu -- can we add the -ENODEV return back in the case that
iommu_report_device_fault() doesn't even find a 'iommu_fault_param' for
the device?
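
As a rough userspace sketch of what that request amounts to (not the
actual patch; struct model_dev and both functions are hypothetical), the
report function returns -ENODEV when no fault parameter exists, and the
caller uses that to avoid leaving the transaction stalled. Who ends up
terminating the stalled transaction is an assumption of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; not a kernel structure. */
struct model_dev {
	bool has_fault_param;	/* stands in for a non-NULL iommu_fault_param */
	bool stalled;		/* a transaction is waiting for a response */
};

/* Report a fault; fail with -ENODEV if nobody is set up to consume it. */
static int model_report_device_fault(struct model_dev *dev)
{
	assert(dev != NULL);

	if (!dev->has_fault_param)
		return -ENODEV;

	/* A consumer exists and is expected to respond later. */
	return 0;
}

/* Handle a stall event; on a reporting failure, do not leave the stall behind. */
static int model_handle_stall_evt(struct model_dev *dev)
{
	int ret;

	assert(dev != NULL);

	dev->stalled = true;
	ret = model_report_device_fault(dev);
	if (ret) {
		/* No consumer will respond, so the stall is ended here (assumption). */
		dev->stalled = false;
	}

	/* A non-zero return also lets the caller print the event. */
	return ret;
}
```

A device without a fault parameter gets -ENODEV back and is not left
stalled. A device with one returns 0 and stays stalled until its consumer
responds.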
Yes, of course. The commit b554e396e51c was added to consolidate the
drivers' auto response code in the core with the assumption that the driver
only needs to call iommu_report_device_fault() for reporting an iopf.

I had a go at taking Jason's diff and implementing the suggestions in
this thread.
Kunkun -- please can you see if this fixes the problem for you?
Okay, I'll test it as soon as I can.
It looks like the diff sent by Pranjal has whitespace mangling, so I
don't think you'll be able to apply it.

Pranjal -- please can you send an unmangled version? If you want to test
out your mail setup, I'm happy to be a guinea pig so you don't spam the
mailing lists!
Ugh, apologies for that, something went wrong with my client.
Kunkun -- Please let me know if this fixes the problem.
Lu -- It looks like the intel->page_response callback doesn't expect a
NULL event, so, for now, I immediately return in that case. LMK what you
think?

That's okay. We had such a check there before the refactoring.

Thanks,
baolu