BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT
From: Luis Claudio R. Goncalves
Date: Fri Feb 07 2025 - 09:23:16 EST
Hello!
While running ssdd[1] from rt-tests on an aarch64 kernel with PREEMPT_RT and
debug features enabled, this bug was triggered on every single run:
[1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/tree/src/ssdd/ssdd.c
# ssdd
[ 273.115597] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 273.115607] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6077, name: ssdd
[ 273.115611] preempt_count: 1, expected: 0
[ 273.115614] RCU nest depth: 0, expected: 0
[ 273.115617] 1 lock held by ssdd/6077:
[ 273.115620] #0: ffff07ffd77893e0 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x58/0x200
[ 273.115642] Preemption disabled at:
[ 273.115644] [<ffffad24518bd1f4>] debug_exception_enter+0x1c/0x80
[ 273.115653] CPU: 47 UID: 0 PID: 6077 Comm: ssdd Not tainted 6.13.0-rt3 #1 PREEMPT_RT
[ 273.115659] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
[ 273.115662] Call trace:
[ 273.115664] show_stack+0x34/0x98 (C)
[ 273.115670] dump_stack_lvl+0xa8/0xe8
[ 273.115675] dump_stack+0x1c/0x38
[ 273.115680] __might_resched+0x254/0x330
[ 273.115686] rt_spin_lock+0xcc/0x220
[ 273.115692] force_sig_info_to_task+0x58/0x200
[ 273.115697] force_sig_fault+0xd0/0x120
[ 273.115702] arm64_force_sig_fault+0x48/0x80
[ 273.115707] send_user_sigtrap+0x88/0xe8
[ 273.115712] single_step_handler+0x100/0x160
[ 273.115717] do_debug_exception+0x94/0x160
[ 273.115722] el0_dbg+0x54/0x150
[ 273.115727] el0t_64_sync_handler+0x134/0x138
[ 273.115732] el0t_64_sync+0x1ac/0x1b0
The ptrace usage in ssdd eventually exercises the code path that starts at
el0t_64_sync_handler() and may end up calling do_debug_exception(), which
calls debug_exception_enter(), which in turn disables preemption.
Looking at the backtrace, later in the call chain force_sig_info_to_task()
tries to take a spinlock, which on PREEMPT_RT becomes an rtmutex and can
sleep under contention. Because preemption is still disabled at that point,
this triggers the "BUG: sleeping function called from invalid context"
warning.
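To make the sequence concrete, here is a minimal user-space sketch of the
accounting involved. It is a toy model, not kernel code: every model_* name
is hypothetical, the counter only stands in for the real preempt count, and
the lock acquisition itself is left out. It only illustrates why the check
fires when a lock that may sleep is taken between debug_exception_enter()
and debug_exception_exit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the per-task preempt counter (assumption, not the kernel's). */
static int model_preempt_count;
/* Number of times the modelled might-sleep check has fired. */
static int model_warnings;

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models the might-sleep check: sleeping is only legal with a zero count. */
static void model_might_sleep(const char *who)
{
	if (model_preempt_count != 0) {
		model_warnings++;
		printf("model: sleeping function called from invalid context (%s), "
		       "preempt_count: %d, expected: 0\n", who, model_preempt_count);
	}
}

/*
 * With rt == true the lock is modelled as a sleeping, rtmutex-based lock,
 * so it runs the might-sleep check. With rt == false it is modelled as a
 * spinning lock that never sleeps. Acquiring the lock is omitted.
 */
static void model_spin_lock(bool rt)
{
	if (rt)
		model_might_sleep("model_spin_lock");
}

/*
 * Models the path from do_debug_exception() down to the siglock taken in
 * force_sig_info_to_task(). Returns the number of warnings it raised.
 */
static int model_debug_exception(bool rt)
{
	int before = model_warnings;

	model_preempt_disable();	/* debug_exception_enter() */
	model_spin_lock(rt);		/* siglock in force_sig_info_to_task() */
	model_preempt_enable();		/* debug_exception_exit() */

	return model_warnings - before;
}
```

In this model, model_debug_exception(false) raises no warning, while
model_debug_exception(true) raises exactly one, which mirrors the difference
between a non-RT and a PREEMPT_RT kernel on this path.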
It is also possible to reproduce the problem on an aarch64 kernel with
PREEMPT_RT enabled and no extra debug features, by running ssdd in a loop.
That shows not only the backtrace reported above but also other
instances where the process is scheduled out while preemption is disabled:
# while :; do ssdd; done
[ 754.673678] BUG: scheduling while atomic: ssdd/7340/0x00000002
[ 754.673682] Modules linked in: qrtr rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_dmc620_pmu arm_cmn cppc_cpufreq arm_dsu_pmu loop dm_multipath nfnetlink xfs nvme ghash_ce sha2_ce sha256_arm64 nvme_core sha1_ce nvme_auth sbsa_gwdt ast i2c_algo_bit i2c_designware_platform xgene_hwmon i2c_designware_core dm_mirror dm_region_hash dm_log dm_mod fuse
[ 754.673703] Preemption disabled at:
[ 754.673703] [<ffffa87a17ca470c>] do_debug_exception+0x54/0x100
[ 754.673710] CPU: 102 UID: 0 PID: 7340 Comm: ssdd Kdump: loaded Not tainted 6.14.0-rc1 #1
[ 754.673712] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
[ 754.673713] Call trace:
[ 754.673714] show_stack+0x34/0x98 (C)
[ 754.673718] dump_stack_lvl+0x80/0xa8
[ 754.673721] dump_stack+0x18/0x2c
[ 754.673722] __schedule_bug+0x90/0xc0
[ 754.673726] schedule_debug.isra.0+0x128/0x158
[ 754.673728] __schedule+0x68/0x690
[ 754.673731] schedule_rtlock+0x24/0x50
[ 754.673733] rtlock_slowlock_locked+0x1c0/0x350
[ 754.673735] rt_spin_lock+0xcc/0x130
[ 754.673737] obj_cgroup_charge+0x54/0x138
[ 754.673740] __memcg_slab_post_alloc_hook+0xcc/0x300
[ 754.673743] kmem_cache_alloc_noprof+0x304/0x338
[ 754.673745] __send_signal_locked+0x90/0x428
[ 754.673748] send_signal_locked+0xe4/0x140
[ 754.673750] force_sig_info_to_task+0xd0/0x160
[ 754.673753] force_sig_fault+0x6c/0xa8
[ 754.673755] arm64_force_sig_fault+0x48/0x80
[ 754.673757] send_user_sigtrap+0x54/0xd0
[ 754.673759] single_step_handler+0xc4/0xe0
[ 754.673761] do_debug_exception+0x7c/0x100
[ 754.673762] el0_dbg+0x40/0x158
[ 754.673766] el0t_64_sync_handler+0x134/0x138
[ 754.673768] el0t_64_sync+0x1ac/0x1b0
In this case one of the local_lock_* calls in obj_cgroup_charge() (or in the
functions it calls) seems to hit contention and, because these locks are
rtmutex-based on PREEMPT_RT, the task is actually scheduled out to sleep.
The scary comment on top of debug_exception_enter() provides a reason for
preemption being disabled at that point, but it seems to open a can of worms
for PREEMPT_RT usage:
/*
 * In debug exception context, we explicitly disable preemption despite
 * having interrupts disabled.
 * This serves two purposes: it makes it much less likely that we would
 * accidentally schedule in exception context and it will force a warning
 * if we somehow manage to schedule by accident.
 */
This is the data I have gathered so far, using both 6.13.0-rt3 and 6.14.0-rc1
for testing. Given my unfamiliarity with debug exception handling on
aarch64, I have not been able to devise a solution for the observed behavior.
Any suggestions or comments?
Best regards,
Luis