Re: [PATCH v5 0/5] iommu: Retire detach_dev callback
From: Steven Price
Date: Wed Mar 15 2023 - 12:36:19 EST
On 15/03/2023 15:57, Robin Murphy wrote:
> On 2023-03-15 15:49, Steven Price wrote:
>> On 10/01/2023 02:54, Lu Baolu wrote:
>>> Hi folks,
>>>
>>> The iommu core calls the driver's detach_dev domain op callback only
>>> when
>>> a device is finished assigning to user space and
>>> iommu_group_release_dma_owner() is called to return the device to the
>>> kernel, where iommu core wants to set the default domain to the
>>> device but
>>> the driver didn't provide one. The code looks like:
>>>
>>> /*
>>> * New drivers should support default domains and so the
>>> detach_dev() op
>>> * will never be called. Otherwise the NULL domain
>>> represents some
>>> * platform specific behavior.
>>> */
>>> if (!new_domain) {
>>> if (WARN_ON(!group->domain->ops->detach_dev))
>>> return -EINVAL;
>>> __iommu_group_for_each_dev(group, group->domain,
>>>
>>> iommu_group_do_detach_device);
>>> group->domain = NULL;
>>> return 0;
>>> }
>>>
>>> In other words, if the iommu driver provides default domains, the
>>> .detach_dev callback will never be called; Otherwise, the .detach_dev
>>> callback is actually called to return control back to the platform DMA
>>> ops, other than detaching the domain from device.
>>>
>>> This series cleanups this by:
>>>
>>> - If the IOMMU driver provides default domains, remove .detach_dev
>>> callback.
>>> - Adds a new set_platform_dma iommu op. Any IOMMU driver that doesn't
>>> provide default domain should implement set_platform_dma callback
>>> instead.
>>> - Retire .detach_dev callback.
>>>
>>> This series originates from various discussion in the community. Thanks
>>> to Jason, Robin and all others for their ideas.
>>>
>>> The whole series is available on github:
>>> https://github.com/LuBaolu/intel-iommu/commits/iommu-retire-detach_dev-v5
>>>
>>> Change log:
>>> v5:
>>> - Merge some patches to make the series cute. No functionality
>>> changes.
>>> - Check the op directly and WARN_ON the lack of any necessary
>>> callbacks. Get rid of the ret and EINVAL.
>>>
>>> v4:
>>> -
>>> https://lore.kernel.org/linux-iommu/20230104125725.271850-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - Drop the patch which renamed .attach_dev to .set_dev. As Robin said,
>>> "... I also wouldn't say that "attach" loses its meaning in a
>>> context
>>> where an equivalent "detach" operation is only ever implicit in
>>> reattaching to something else...". If we have a better name in the
>>> future, we can do it in other series.
>>> - Adjust the patch of "iommu: Add set_platform_dma_ops iommu ops"
>>> according to Jason's following suggestion " ... This is a bit ugly,
>>> it would be better to make the previous patch call set_platform_dma
>>> if it is set instead of detach_dev and then these patches should
>>> just
>>> rename the driver's fsl_pamu_detach_device to
>>> fsl_pamu_set_platform_dma ..."
>>> - Add a new patch to remove deferred attach check from
>>> __iommu_detach_domain() path. Make it a separate patch as the
>>> prerequisite to remove __iommu_detach_device() helper.
>>> - Rename set_platform_dma to set_platform_dma_ops to make it more
>>> meaningful.
>>>
>>> v3:
>>> -
>>> https://lore.kernel.org/linux-iommu/20221128064648.1934720-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - Setting blocking domain is not conceptually equal to detach_dev.
>>> Dropped all blocking domain related staffs in the previous version.
>>>
>>> v2:
>>> -
>>> https://lore.kernel.org/linux-iommu/20220826123014.52709-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - Replace .detach_dev callback with static block domain ops;
>>> - Rename .attach_dev to .set_dev.
>>>
>>> v1:
>>> -
>>> https://lore.kernel.org/linux-iommu/20220516015759.2952771-1-baolu.lu@xxxxxxxxxxxxxxx/
>>>
>>> Jason Gunthorpe (1):
>>> iommu: Remove deferred attach check from __iommu_detach_device()
>>>
>>> Lu Baolu (4):
>>> iommu: Remove detach_dev callbacks
>>> iommu: Add set_platform_dma_ops iommu ops
>>> iommu: Add set_platform_dma_ops callbacks
>>> iommu: Remove detach_dev callback
>>>
>>> include/linux/iommu.h | 8 ++-
>>> include/trace/events/iommu.h | 7 --
>>> drivers/iommu/amd/iommu.c | 26 -------
>>> drivers/iommu/apple-dart.c | 24 -------
>>> drivers/iommu/arm/arm-smmu/qcom_iommu.c | 23 ------
>>> drivers/iommu/exynos-iommu.c | 1 -
>>> drivers/iommu/fsl_pamu_domain.c | 6 +-
>>> drivers/iommu/iommu-traces.c | 1 -
>>> drivers/iommu/iommu.c | 94 ++++++++++++-------------
>>> drivers/iommu/ipmmu-vmsa.c | 16 -----
>>> drivers/iommu/msm_iommu.c | 6 +-
>>> drivers/iommu/mtk_iommu.c | 9 ---
>>> drivers/iommu/mtk_iommu_v1.c | 4 +-
>>> drivers/iommu/omap-iommu.c | 6 +-
>>> drivers/iommu/rockchip-iommu.c | 1 -
>>> drivers/iommu/s390-iommu.c | 7 +-
>>> drivers/iommu/sprd-iommu.c | 16 -----
>>> drivers/iommu/sun50i-iommu.c | 1 -
>>> drivers/iommu/tegra-gart.c | 6 +-
>>> drivers/iommu/tegra-smmu.c | 5 +-
>>> 20 files changed, 69 insertions(+), 198 deletions(-)
>>>
>>
>> I hit a problem with this series on a Firefly-RK3288, I bisected down to
>> 1b932ceddd19 ("iommu: Remove detach_dev callbacks"). The first splat is:
>>
>> [ 7.227853] ------------[ cut here ]------------
>> [ 7.227900] WARNING: CPU: 0 PID: 9 at drivers/iommu/iommu.c:2198
>> __iommu_group_set_domain+0xb4/0xc8
>> [ 7.227920] Modules linked in:
>> [ 7.227935] CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc1 #1
>> [ 7.227942] Hardware name: Rockchip (Device Tree)
>> [ 7.227948] Workqueue: events_unbound deferred_probe_work_func
>> [ 7.227964] unwind_backtrace from show_stack+0x10/0x14
>> [ 7.227978] show_stack from dump_stack_lvl+0x58/0x70
>> [ 7.227992] dump_stack_lvl from __warn+0x7c/0x1dc
>> [ 7.228008] __warn from warn_slowpath_fmt+0xbc/0x1a0
>> [ 7.228022] warn_slowpath_fmt from __iommu_group_set_domain+0xb4/0xc8
>> [ 7.228035] __iommu_group_set_domain from
>> iommu_detach_device+0x84/0xf8
>> [ 7.228046] iommu_detach_device from
>> arm_iommu_detach_device+0x24/0xc4
>> [ 7.228057] arm_iommu_detach_device from
>> rockchip_drm_dma_attach_device+0x30/0x74
>> [ 7.228073] rockchip_drm_dma_attach_device from
>> vop_crtc_atomic_enable+0xf8/0xab0
>> [ 7.228085] vop_crtc_atomic_enable from
>> drm_atomic_helper_commit_modeset_enables+0xb0/0x2a0
>> [ 7.228097] drm_atomic_helper_commit_modeset_enables from
>> drm_atomic_helper_commit_tail_rpm+0x44/0x8c
>> [ 7.228111] drm_atomic_helper_commit_tail_rpm from
>> commit_tail+0x9c/0x184
>> [ 7.228124] commit_tail from drm_atomic_helper_commit+0x164/0x18c
>> [ 7.228137] drm_atomic_helper_commit from drm_atomic_commit+0xb0/0xe8
>> [ 7.228153] drm_atomic_commit from
>> drm_client_modeset_commit_atomic+0x240/0x288
>> [ 7.228166] drm_client_modeset_commit_atomic from
>> drm_client_modeset_commit_locked+0x60/0x1cc
>> [ 7.228174] drm_client_modeset_commit_locked from
>> drm_client_modeset_commit+0x24/0x40
>> [ 7.228183] drm_client_modeset_commit from
>> drm_fb_helper_set_par+0xb8/0xf8
>> [ 7.228197] drm_fb_helper_set_par from fbcon_init+0x2a0/0x534
>> [ 7.228211] fbcon_init from visual_init+0xc0/0x108
>> [ 7.228224] visual_init from do_bind_con_driver+0x1bc/0x3a8
>> [ 7.228237] do_bind_con_driver from do_take_over_console+0x134/0x1d4
>> [ 7.228251] do_take_over_console from do_fbcon_takeover+0x6c/0xcc
>> [ 7.228264] do_fbcon_takeover from fbcon_fb_registered+0x198/0x1a8
>> [ 7.228277] fbcon_fb_registered from register_framebuffer+0x1d0/0x268
>> [ 7.228292] register_framebuffer from
>> __drm_fb_helper_initial_config_and_unlock+0x358/0x578
>> [ 7.228308] __drm_fb_helper_initial_config_and_unlock from
>> drm_fbdev_client_hotplug+0x6c/0xa8
>> [ 7.228322] drm_fbdev_client_hotplug from
>> drm_fbdev_generic_setup+0x84/0x174
>> [ 7.228335] drm_fbdev_generic_setup from
>> rockchip_drm_bind+0x1dc/0x200
>> [ 7.228349] rockchip_drm_bind from
>> try_to_bring_up_aggregate_device+0x200/0x2d8
>> [ 7.228368] try_to_bring_up_aggregate_device from
>> component_master_add_with_match+0xc4/0xf8
>> [ 7.228380] component_master_add_with_match from
>> rockchip_drm_platform_probe+0x150/0x254
>> [ 7.228392] rockchip_drm_platform_probe from platform_probe+0x5c/0xb8
>> [ 7.228405] platform_probe from really_probe+0xe0/0x400
>> [ 7.228417] really_probe from __driver_probe_device+0x9c/0x1fc
>> [ 7.228431] __driver_probe_device from driver_probe_device+0x30/0xc0
>> [ 7.228445] driver_probe_device from
>> __device_attach_driver+0xa8/0x120
>> [ 7.228460] __device_attach_driver from bus_for_each_drv+0x84/0xdc
>> [ 7.228474] bus_for_each_drv from __device_attach+0xb0/0x20c
>> [ 7.228488] __device_attach from bus_probe_device+0x8c/0x90
>> [ 7.228502] bus_probe_device from deferred_probe_work_func+0x98/0xe0
>> [ 7.228515] deferred_probe_work_func from
>> process_one_work+0x290/0x740
>> [ 7.228528] process_one_work from worker_thread+0x54/0x518
>> [ 7.228536] worker_thread from kthread+0xf0/0x110
>> [ 7.228548] kthread from ret_from_fork+0x14/0x2c
>> [ 7.228561] Exception stack(0xf084dfb0 to 0xf084dff8)
>> [ 7.228567] dfa0: 00000000
>> 00000000 00000000 00000000
>> [ 7.228573] dfc0: 00000000 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000
>> [ 7.228579] dfe0: 00000000 00000000 00000000 00000000 00000013
>> 00000000
>> [ 7.228585] irq event stamp: 138379
>> [ 7.228592] hardirqs last enabled at (138385): [<c03c42ac>]
>> vprintk_emit+0x330/0x354
>> [ 7.228606] hardirqs last disabled at (138390): [<c03c4268>]
>> vprintk_emit+0x2ec/0x354
>> [ 7.228617] softirqs last enabled at (137258): [<c03016ac>]
>> __do_softirq+0x2f8/0x548
>> [ 7.228628] softirqs last disabled at (137253): [<c0350218>]
>> __irq_exit_rcu+0x14c/0x170
>> [ 7.228639] ---[ end trace 0000000000000000 ]---
>>
>> (complete log attached)
>>
>> I'm not sure how to debug this. From the callstack I can see that
>> __iommu_group_set_domain is getting a NULL new_domain which triggers
>> a call to __iommu_group_do_set_platform_dma which is WARNing because
>> there is no set_platform_dma_ops callback. The NULL new_domain is
>> because group->default_domain is NULL in __iommu_group_set_core_domain.
>>
>> So is the logic in the first patch that there is a default domain
>> incorrect for this rockchip driver? Or is this callpath just hitting
>> the function before the default domain can be created?
>>
>> Help debugging this would be appreciated - I'm not very familiar
>> with this area of code.
>
> It's the same thing Marek hit on Exynos: drivers which run on 32-bit Arm
> need a .set_platform_dma op, regardless of whether they support default
> domains on other platforms which use those.
Ah, that seems to be it. I'll post a patch.
Thanks,
Steve