Re: [PATCH v2] iommu/mediatek: Fix NULL pointer deference in mtk_iommu_device_group

From: Robin Murphy
Date: Mon Apr 07 2025 - 08:41:47 EST


On 2025-04-07 6:17 am, Chen-Yu Tsai wrote:
Hi,

On Thu, Apr 3, 2025 at 6:24 PM Louis-Alexis Eyraud
<louisalexis.eyraud@xxxxxxxxxxxxx> wrote:

Currently, mtk_iommu calls during probe iommu_device_register before
the hw_list from driver data is initialized. Since iommu probing issue
fix, it leads to NULL pointer dereference in mtk_iommu_device_group when
hw_list is accessed with list_first_entry (not null safe).

So, change the call order to ensure iommu_device_register is called
after the driver data are initialized.

Fixes: 9e3a2a643653 ("iommu/mediatek: Adapt sharing and non-sharing pgtable case")
Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
Reviewed-by: Yong Wu <yong.wu@xxxxxxxxxxxx>
Tested-by: Chen-Yu Tsai <wenst@xxxxxxxxxxxx> # MT8183 Juniper, MT8186 Tentacruel
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@xxxxxxxxxxxxx>
---
This patch fixes a NULL pointer dereference that occurs during the
mtk_iommu driver probe and observed at least on several Mediatek Genio boards:
```
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000

This is a reminder to please land this and send to Linus ASAP.

This fixes the v6.15-rc1 kernel on all the MediaTek Chromebook platforms,
except for MT8188, which seems to have another issue in iommu_get_dma_domain()
used from the DRM driver:

Disabling lock debugging due to kernel taint
Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000158

From the offset and the stacktrace code dump, this would appear to be the dereference of dev->iommu_group->default_domain, when dev->iommu_group is NULL (and CONFIG_DEBUG_LOCK_ALLOC makes the mutex really big). Which is a bit weird, as to get into iommu-dma at all in that state would suggest that whatever device this is has been removed and had its group torn down again after iommu_setup_dma_ops() has run... but either way that implies the DRM driver is passing an arbitrary device to the DMA API without making sure it's actualy valid.

Trying to trace the provenance of dma_dev from mtk_gem_create() back through the rest of the driver is quite the rabbit-hole, but it seems like in at least one case it can lead back to an of_find_device_by_node() in ovl_adaptor_comp_init(), which definitely looks sufficiently sketchy.

Thanks,
Robin.

Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000001185ab000
[0000000000000158] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] SMP
Modules linked in: mtk_vcodec_dec_hw mtk_vcodec_dec mtk_vcodec_enc
v4l2_vp9 v4l2_h264 mtk_vcodec_dbgfs mtk_jpeg mtk_vcodec_common
cros_ec_sensors mtk_jpeg_enc_hw cros_ec_sensors_core mtk_jpeg_dec_hw
mtk_vpu v4l2_mem2mem videobuf2_v4l2 snd_sof_mt8186
videobuf2_dma_contig snd_sof_xtensa_dsp sha1_ce videobuf2_memops
mtk_adsp_common mtk_scp videobuf2_common snd_sof_of mtk_rpmsg snd_sof
rpmsg_core cros_ec_sensorhub hid_google_hammer hid_vivaldi_common
snd_sof_utils mtk_scp_ipi fuse
CPU: 6 UID: 0 PID: 12 Comm: kworker/u32:0 Tainted: G B
6.15.0-rc1-00001-gfed05d98b726 #628 PREEMPT
06e695da6360d22824958738f9ba1c9b2416be19
Tainted: [B]=BAD_PAGE
Hardware name: Google Ciri sku2 board (DT)
Workqueue: events_unbound deferred_probe_work_func
pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : iommu_get_dma_domain+0x30/0x58
lr : iommu_get_dma_domain+0x30/0x58
sp : ffffffc080186ee0
x29: ffffffc080186ee0 x28: ffffff80c4c59028 x27: 0000000000000000
x26: 0000000000000014 x25: 0068000000000f0b x24: ffffff80c4c5ca28
x23: 00000000008ca000 x22: 1ffffff810030df8 x21: ffffff80c20ea010
x20: ffffff80c20ea010 x19: 0000000000000000 x18: 0000000000000000
x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 0720072007200720
x14: 0720072007200720 x13: ffffff80c0a1bb80 x12: ffffffbae2ca45e9
x11: 1ffffffae2ca45e8 x10: ffffffbae2ca45e8 x9 : dfffffc000000000
x8 : 000000451d35ba18 x7 : ffffffd716522f47 x6 : 0000000000000001
x5 : ffffffd716522f40 x4 : ffffffbae2ca45e9 x3 : ffffffd713717e6c
x2 : 0000000000000001 x1 : ffffff80c0941dc0 x0 : 0000000000000001
Call trace:
iommu_get_dma_domain+0x30/0x58 (P)
__iommu_dma_alloc_noncontiguous+0x34/0x498
iommu_dma_alloc+0x2f0/0x3e0
dma_alloc_attrs+0x1b4/0x3b8
mtk_gem_create+0x124/0x170
mtk_gem_dumb_create+0x84/0x180
drm_mode_create_dumb+0xf8/0x128
drm_client_framebuffer_create+0x11c/0x240
drm_fbdev_dma_driver_fbdev_probe+0x154/0x5a0
__drm_fb_helper_initial_config_and_unlock+0x4f0/0x928
drm_fb_helper_initial_config+0x50/0x68
drm_fbdev_client_hotplug+0xc0/0x120
drm_client_register+0xa0/0x100
drm_fbdev_client_setup+0xc8/0x260
drm_client_setup+0x60/0xd0
mtk_drm_bind+0x4f0/0xaf0
try_to_bring_up_aggregate_device+0x258/0x2f0
__component_add+0x104/0x240
component_add+0x1c/0x38
mtk_disp_rdma_probe+0x180/0x260
platform_probe+0x98/0x128
really_probe+0x118/0x3c0
__driver_probe_device+0xc0/0x198
driver_probe_device+0x64/0x1f8
__device_attach_driver+0xf0/0x1b0
bus_for_each_drv+0xf4/0x178
__device_attach+0x120/0x240
device_initial_probe+0x1c/0x30
bus_probe_device+0xdc/0xe8
deferred_probe_work_func+0xec/0x140
process_one_work+0x428/0xa80
worker_thread+0x2c0/0x538
kthread+0x258/0x380
ret_from_fork+0x10/0x20
Code: 97e8f12c f9422673 91056260 97e8f129 (f940ae60)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: 0x1693600000 from 0xffffffc080000000
PHYS_OFFSET: 0x40000000
CPU features: 0x0e00,000002e0,01202650,8200720b
Memory Limit: none
Rebooting in 30 seconds..



Thanks
ChenYu

Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101380000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: btusb btrtl mt6315_regulator btintel mtk_vcodec_dec
btmtk v4l2_vp9 btbcm mtk_vcodec_enc v4l2_h264 mtk_vcodec_dbgfs
bluetooth mtk_jpeg ecdh_generic mtk_vcodec_common mtk_jpeg_enc_hw
mtk_jpeg_dec_hw ecc v4l2_mem2mem rfkill videobuf2_dma_contig
videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc
mt6360_charger mcp251xfd it5205 goodix_ts can_dev panfrost
drm_shmem_helper gpu_sched lvts_thermal mtk_svs mtk_adsp_mailbox
snd_soc_dmic mtk_scp mtk_rpmsg mtk_scp_ipi mtk_cmdq_mailbox
mediatek_cpufreq_hw spmi_mtk_pmif fuse dm_mod ip_tables x_tables
ipv6 mediatek_drm tcpci_mt6360 drm_display_helper drm_client_lib
tcpci drm_dma_helper drm_kms_helper tcpm drm mtk_mutex mtk_mmsys
typec rtc_mt6397 mtk_cmdq_helper phy_mtk_pcie pcie_mediatek_gen3
dwmac_mediatek stmmac_platform stmmac pwm_mtk_disp pcs_xpcs pwm_bl
backlight
CPU: 5 UID: 0 PID: 12 Comm: kworker/u32:0 Not tainted 6.14.0-next-20250326 #4 PREEMPT
Hardware name: MediaTek Genio 1200 EVK-P1V2-EMMC (DT)
Workqueue: events_unbound deferred_probe_work_func
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mtk_iommu_device_group+0x2c/0xe0
lr : __iommu_probe_device+0x130/0x490
sp : ffff8000827b3970
x29: ffff8000827b3970 x28: ffff0000c0028000 x27: ffff80008164f2b8
x26: ffff0000cbc47130 x25: ffff80008096a9f0 x24: ffff80008164f5a8
x23: ffff800082684b60 x22: ffff8000827b3a80 x21: ffff0000cbc47130
x20: ffffffffffffffed x19: ffff0000c117c010 x18: 0000000000000000
x17: 6f702d616d642d64 x16: 6574636972747365 x15: 0000000000000002
x14: 0000000000000000 x13: 0000000000128d55 x12: 6f632d616d642e30
x11: 0000000000000100 x10: 0000000000000001 x9 : 0000000000000220
x8 : 0101010101010101 x7 : ffff0000c117c010 x6 : 306c766f2c727461
x5 : 0000000000000000 x4 : ffff0000c10a2de8 x3 : ffff0000c10a2e70
x2 : ffff0000c01a4600 x1 : 0000000000000000 x0 : ffff0000c65470c0
Call trace:
mtk_iommu_device_group+0x2c/0xe0 (P)
__iommu_probe_device+0x130/0x490
probe_iommu_group+0x3c/0x70
bus_for_each_dev+0x7c/0xe0
iommu_device_register+0xd8/0x22c
mtk_iommu_probe+0x270/0x53c
platform_probe+0x68/0xd8
really_probe+0xbc/0x2c0
__driver_probe_device+0x78/0x120
driver_probe_device+0x3c/0x154
__device_attach_driver+0xb8/0x140
bus_for_each_drv+0x88/0xe8
__device_attach+0xa0/0x190
device_initial_probe+0x14/0x20
bus_probe_device+0xb4/0xc0
deferred_probe_work_func+0x90/0xc8
process_one_work+0x148/0x284
worker_thread+0x2cc/0x3cc
kthread+0x12c/0x204
ret_from_fork+0x10/0x20
Code: b4000500 f9401c01 92800254 f9409821 (f9400035)
---[ end trace 0000000000000000 ]---
```

I've tested this patch on Mediatek Genio 510-EVK and 1200-EVK boards
with a kernel based on linux-next (tag: next-20250327).
---
Changes in v2:
- Fix goto label usage in device registration error case
- Add review and test trailers
- Link to v1: https://lore.kernel.org/r/20250327-fix-mtk-iommu-error-v1-1-df969158e752@xxxxxxxxxxxxx
---
drivers/iommu/mtk_iommu.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 034b0e670384a24df10130cbbff95ce8e0bc092d..df98d0c65f5469c6803cd9d151c85ad855558cf5 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1372,15 +1372,6 @@ static int mtk_iommu_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, data);
mutex_init(&data->mutex);

- ret = iommu_device_sysfs_add(&data->iommu, dev, NULL,
- "mtk-iommu.%pa", &ioaddr);
- if (ret)
- goto out_link_remove;
-
- ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
- if (ret)
- goto out_sysfs_remove;
-
if (MTK_IOMMU_HAS_FLAG(data->plat_data, SHARE_PGTABLE)) {
list_add_tail(&data->list, data->plat_data->hw_list);
data->hw_list = data->plat_data->hw_list;
@@ -1390,19 +1381,28 @@ static int mtk_iommu_probe(struct platform_device *pdev)
data->hw_list = &data->hw_list_head;
}

+ ret = iommu_device_sysfs_add(&data->iommu, dev, NULL,
+ "mtk-iommu.%pa", &ioaddr);
+ if (ret)
+ goto out_list_del;
+
+ ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
+ if (ret)
+ goto out_sysfs_remove;
+
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
ret = component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
if (ret)
- goto out_list_del;
+ goto out_device_unregister;
}
return ret;

-out_list_del:
- list_del(&data->list);
+out_device_unregister:
iommu_device_unregister(&data->iommu);
out_sysfs_remove:
iommu_device_sysfs_remove(&data->iommu);
-out_link_remove:
+out_list_del:
+ list_del(&data->list);
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM))
device_link_remove(data->smicomm_dev, dev);
out_runtime_disable:

---
base-commit: bc960e3e32c8b940c10b14557271355f66ae4db1
change-id: 20250327-fix-mtk-iommu-error-af6ec347d057

Best regards,
--
Louis-Alexis Eyraud <louisalexis.eyraud@xxxxxxxxxxxxx>