Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX

From: Robin Murphy

Date: Wed Apr 08 2026 - 12:46:43 EST

On 2026-04-08 5:16 pm, Russell King (Oracle) wrote:

On Wed, Apr 08, 2026 at 05:08:34PM +0100, Russell King (Oracle) wrote:

The rebase is still progressing, but it's landed on:

c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction

FWIW, I wouldn't expect a Tegra to have the Xilinx IP in it anyway; judging by the DT, it has its own tegra-gpcdma engine.

There's a fair chance this is 90c5def10bea ("iommu: Do not call drivers for empty gathers"), which JonH has also reported as causing boot issues on Tegras. In short, SMMU TLB maintenance may not be completed properly, which could let recycled DMA addresses cause exactly this kind of random memory corruption. I CC'd you on a patch:

https://lore.kernel.org/linux-iommu/20260408162846.GE3357077@xxxxxxxxxx/T/#t

Thanks,
Robin.

and while this boots to a login prompt, it spat out a BUG():

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 56, name: kworker/u24:3
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by kworker/u24:3/56:
#0: ffff000080042148 ((wq_completion)events_unbound#2){+.+.}-{0:0}, at: process_one_work+0x184/0x780
#1: ffff80008299bdf8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1ac/0x780
#2: ffff0000808b48f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x2c/0x188
irq event stamp: 10872
hardirqs last enabled at (10871): [<ffff80008013a410>] ktime_get+0x130/0x180
hardirqs last disabled at (10872): [<ffff800080d61ac8>] _raw_spin_lock_irqsave+0x84/0x88
softirqs last enabled at (9216): [<ffff80008002807c>] fpsimd_save_and_flush_current_state+0x3c/0x80
softirqs last disabled at (9214): [<ffff800080028098>] fpsimd_save_and_flush_current_state+0x58/0x80
CPU: 5 UID: 0 PID: 56 Comm: kworker/u24:3 Not tainted 7.0.0-rc1-bisect+ #654 PREEMPT
Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
Workqueue: events_unbound deferred_probe_work_func
Call trace:
show_stack+0x18/0x30 (C)
dump_stack_lvl+0x6c/0x94
dump_stack+0x18/0x24
__might_resched+0x154/0x220
__might_sleep+0x48/0x80
__mutex_lock+0x48/0x800
mutex_lock_nested+0x24/0x30
pinmux_disable_setting+0x9c/0x180
pinctrl_commit_state+0x5c/0x260
pinctrl_pm_select_idle_state+0x4c/0xa0
tegra_i2c_runtime_suspend+0x2c/0x3c
pm_generic_runtime_suspend+0x2c/0x44
__rpm_callback+0x48/0x1ec
rpm_callback+0x74/0x80
rpm_suspend+0xec/0x630
rpm_idle+0x2c0/0x420
__pm_runtime_idle+0x44/0x160
tegra_i2c_probe+0x2e4/0x640
platform_probe+0x5c/0xa4
really_probe+0xbc/0x2c0
__driver_probe_device+0x78/0x120
driver_probe_device+0x3c/0x160
__device_attach_driver+0xbc/0x160
bus_for_each_drv+0x70/0xb8
__device_attach+0xa4/0x188
device_initial_probe+0x50/0x54
bus_probe_device+0x38/0xa4
deferred_probe_work_func+0x90/0xcc
process_one_work+0x204/0x780
worker_thread+0x1c8/0x36c
kthread+0x138/0x144
ret_from_fork+0x10/0x20

This is reproducible.

I've just realised that this is the already-known Tegra I2C bug, which
took ages to be fixed in mainline. It's unrelated to the memory
corruption, so it can be ignored. Sorry for the noise.