Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

From: Marek Szyprowski
Date: Thu Jul 01 2021 - 04:29:38 EST


Hi Robin,

On 30.06.2021 16:01, Robin Murphy wrote:
> On 2021-06-30 14:48, Marek Szyprowski wrote:
>> On 30.06.2021 14:59, Will Deacon wrote:
>>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>>>> On 08.06.2021 18:45, Amey Narkhede wrote:
>>>>> If device registration fails, remove sysfs attribute
>>>>> and if setting bus callbacks fails, unregister the device
>>>>> and cleanup the sysfs attribute.
>>>>>
>>>>> Signed-off-by: Amey Narkhede <ameynarkhede03@xxxxxxxxx>
>>>> This patch landed in linux-next some time ago as commit 249c9dc6aa0d
>>>> ("iommu/arm: Cleanup resources in case of probe error path"). After
>>>> bisecting and some manual searching I finally found that it is
>>>> responsible for breaking s2idle on DragonBoard 410c. Here is the log
>>>> (captured with no_console_suspend):
>>>>
>>>> # time rtcwake -s10 -mmem
>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
>>>> PM: suspend entry (s2idle)
>>>> Filesystems sync: 0.002 seconds
>>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>>> OOM killer disabled.
>>>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>> 0000000000000070
>>>> Mem abort info:
>>>>      ESR = 0x96000006
>>>>      EC = 0x25: DABT (current EL), IL = 32 bits
>>>>      SET = 0, FnV = 0
>>>>      EA = 0, S1PTW = 0
>>>>      FSC = 0x06: level 2 translation fault
>>>> Data abort info:
>>>>      ISV = 0, ISS = 0x00000006
>>>>      CM = 0, WnR = 0
>>>> user pgtable: 4k pages, 48-bit VAs, pgdp=000000008ad08000
>>>> [0000000000000070] pgd=0800000085c3c003, p4d=0800000085c3c003,
>>>> pud=0800000088dcf003, pmd=0000000000000000
>>>> Internal error: Oops: 96000006 [#1] PREEMPT SMP
>>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>>>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>>>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>>>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>>>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>>>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
>>>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>>>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci
>>>> qnoc_msm8916
>>>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
>>>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
>>>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>> pc : msm_runtime_suspend+0x1c/0x60 [msm]
>>>> lr : msm_pm_suspend+0x18/0x38 [msm]
>>>> ...
>>>> Call trace:
>>>>     msm_runtime_suspend+0x1c/0x60 [msm]
>>>>     msm_pm_suspend+0x18/0x38 [msm]
>>>>     dpm_run_callback+0x84/0x378
>>> I wonder if we're missing a pm_runtime_disable() call on the failure
>>> path?
>>> i.e. something like the diff below...
>>
>> I've checked and it doesn't fix anything.
>
> What's happened previously? Has an IOMMU actually failed to probe, or
> is this a fiddly "code movement unveils latent bug elsewhere" kind of
> thing? There doesn't look to be much capable of going wrong in
> msm_runtime_suspend() itself, so is the DRM driver also in a broken
> half-probed state where it's left its pm_runtime_ops behind without
> its drvdata being valid?
>
I finally had some time to analyze this issue. It turned out that with
this patch, iommu fails to probe for soc:iommu@1f08000 device, while it
worked fine before. This happens because this patch adds a check for the
return value of the bus_set_iommu() in
drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it
probes successfully again. It looks that there are already iommu ops
registered for platform bus, before qcom_iommu probes. On the other
hand, if I remember correctly they are not used during the device
registration, but they are needed for some legacy stuff. I can send a
patch restoring old code flow if you think that this is a right solution.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland