Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

From: Robin Murphy
Date: Thu Jul 01 2021 - 05:11:11 EST


On 2021-07-01 10:01, Will Deacon wrote:
On Thu, Jul 01, 2021 at 10:29:29AM +0200, Marek Szyprowski wrote:
Hi Robin,

On 30.06.2021 16:01, Robin Murphy wrote:
On 2021-06-30 14:48, Marek Szyprowski wrote:
On 30.06.2021 14:59, Will Deacon wrote:
On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
On 08.06.2021 18:45, Amey Narkhede wrote:
If device registration fails, remove sysfs attribute
and if setting bus callbacks fails, unregister the device
and cleanup the sysfs attribute.

Signed-off-by: Amey Narkhede <ameynarkhede03@xxxxxxxxx>
This patch landed in linux-next some time ago as commit 249c9dc6aa0d
("iommu/arm: Cleanup resources in case of probe error path"). After
bisecting and some manual searching I finally found that it is
responsible for breaking s2idle on DragonBoard 410c. Here is the log
(captured with no_console_suspend):

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan� 1 00:02:13 1970
PM: suspend entry (s2idle)
Filesystems sync: 0.002 seconds
Freezing user space processes ... (elapsed 0.006 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000070
Mem abort info:
�� � ESR = 0x96000006
�� � EC = 0x25: DABT (current EL), IL = 32 bits
�� � SET = 0, FnV = 0
�� � EA = 0, S1PTW = 0
�� � FSC = 0x06: level 2 translation fault
Data abort info:
�� � ISV = 0, ISS = 0x00000006
�� � CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000008ad08000
[0000000000000070] pgd=0800000085c3c003, p4d=0800000085c3c003,
pud=0800000088dcf003, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci
qnoc_msm8916
videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
pc : msm_runtime_suspend+0x1c/0x60 [msm]
lr : msm_pm_suspend+0x18/0x38 [msm]
...
Call trace:
�� �msm_runtime_suspend+0x1c/0x60 [msm]
�� �msm_pm_suspend+0x18/0x38 [msm]
�� �dpm_run_callback+0x84/0x378
I wonder if we're missing a pm_runtime_disable() call on the failure
path?
i.e. something like the diff below...

I've checked and it doesn't fix anything.

What's happened previously? Has an IOMMU actually failed to probe, or
is this a fiddly "code movement unveils latent bug elsewhere" kind of
thing? There doesn't look to be much capable of going wrong in
msm_runtime_suspend() itself, so is the DRM driver also in a broken
half-probed state where it's left its pm_runtime_ops behind without
its drvdata being valid?

I finally had some time to analyze this issue. It turned out that with
this patch, iommu fails to probe for soc:iommu@1f08000 device, while it
worked fine before. This happens because this patch adds a check for the
return value of the bus_set_iommu() in
drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it
probes successfully again. It looks that there are already iommu ops
registered for platform bus, before qcom_iommu probes. On the other
hand, if I remember correctly they are not used during the device
registration, but they are needed for some legacy stuff. I can send a
patch restoring old code flow if you think that this is a right solution.

Yes, let's just revert the qcom_iommu.c changes from that patch for now.
The pm runtime stuff looks dodgy anyway so I think this needs more thought.

Oh, right, blindly returning the -EBUSY from bus_set_iommu() because we're not the first instance to probe is definitely the wrong thing to do as well. It's still not clear why failing makes the DRM driver fall over, but +1 to qcom-iommu needing some deeper consideration.

Robin.