Re: [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during
From: Mark Rutland
Date: Mon Mar 02 2026 - 06:28:08 EST
On Mon, Mar 02, 2026 at 04:23:46PM +0530, Khaja Hussain Shaik Khaji wrote:
> This patch fixes a kprobes failure observed due to lost current_kprobe
> on arm64 during kretprobe entry handling under interrupt load.
>
> v1 attempted to address this by simulating BTI instructions as NOPs and
> v2 attempted to address this by disabling preemption across the
> out-of-line (XOL) execution window. Further analysis showed that this
> hypothesis was incorrect: the failure is not caused by scheduling or
> preemption during XOL.
>
> The actual root cause is re-entrant invocation of kprobe_busy_begin()
> from an active kprobe context. On arm64, IRQs are re-enabled before
> invoking kprobe handlers, allowing an interrupt during kretprobe
> entry_handler to trigger kprobe_flush_task(), which calls
> kprobe_busy_begin/end and corrupts current_kprobe and kprobe_status.
>
> [ 2280.630526] Call trace:
> [ 2280.633044] dump_backtrace+0x104/0x14c
> [ 2280.636985] show_stack+0x20/0x30
> [ 2280.640390] dump_stack_lvl+0x58/0x74
> [ 2280.644154] dump_stack+0x20/0x30
> [ 2280.647562] kprobe_busy_begin+0xec/0xf0
> [ 2280.651593] kprobe_flush_task+0x2c/0x60
> [ 2280.655624] delayed_put_task_struct+0x2c/0x124
> [ 2280.660282] rcu_core+0x56c/0x984
> [ 2280.663695] rcu_core_si+0x18/0x28
> [ 2280.667189] handle_softirqs+0x160/0x30c
> [ 2280.671220] __do_softirq+0x1c/0x2c
> [ 2280.674807] ____do_softirq+0x18/0x28
> [ 2280.678569] call_on_irq_stack+0x48/0x88
> [ 2280.682599] do_softirq_own_stack+0x24/0x34
> [ 2280.686900] irq_exit_rcu+0x5c/0xbc
> [ 2280.690489] el1_interrupt+0x40/0x60
> [ 2280.694167] el1h_64_irq_handler+0x20/0x30
> [ 2280.698372] el1h_64_irq+0x64/0x68
> [ 2280.701872] _raw_spin_unlock_irq+0x14/0x54
> [ 2280.706173] dwc3_msm_notify_event+0x6e8/0xbe8
> [ 2280.710743] entry_dwc3_gadget_pullup+0x3c/0x6c
> [ 2280.715393] pre_handler_kretprobe+0x1cc/0x304
> [ 2280.719956] kprobe_breakpoint_handler+0x1b0/0x388
> [ 2280.724878] brk_handler+0x8c/0x128
> [ 2280.728464] do_debug_exception+0x94/0x120
> [ 2280.732670] el1_dbg+0x60/0x7c

The el1_dbg() function was removed in commit:

  31575e11ecf7 ("arm64: debug: split brk64 exception entry")

... which was merged in v6.17.

Are you able to reproduce the issue with v6.17 or later?
Which specific kernel version did you see this with?

The arm64 entry code has changed substantially in recent months (fixing
a bunch of latent issues), and we need to know which specific version
you're looking at. It's possible that your issue has already been fixed.

Mark.

> [ 2280.735815] el1h_64_sync_handler+0x48/0xb8
> [ 2280.740114] el1h_64_sync+0x64/0x68
> [ 2280.743701] dwc3_gadget_pullup+0x0/0x124
> [ 2280.747827] soft_connect_store+0xb4/0x15c
> [ 2280.752031] dev_attr_store+0x20/0x38
> [ 2280.755798] sysfs_kf_write+0x44/0x5c
> [ 2280.759564] kernfs_fop_write_iter+0xf4/0x198
> [ 2280.764033] vfs_write+0x1d0/0x2b0
> [ 2280.767529] ksys_write+0x80/0xf0
> [ 2280.770940] __arm64_sys_write+0x24/0x34
> [ 2280.774974] invoke_syscall+0x54/0x118
> [ 2280.778822] el0_svc_common+0xb4/0xe8
> [ 2280.782587] do_el0_svc+0x24/0x34
> [ 2280.785999] el0_svc+0x40/0xa4
> [ 2280.789140] el0t_64_sync_handler+0x8c/0x108
> [ 2280.793526] el0t_64_sync+0x198/0x19c