Re: [PATCH] sched: set TIF_NEED_RESCHED before calling __trace_set_need_resched()

From: Peter Zijlstra

Date: Thu Jun 25 2026 - 05:20:12 EST


On Thu, Jun 25, 2026 at 06:54:19AM +0000, Sechang Lim wrote:
> set_tsk_need_resched() tests TIF_NEED_RESCHED, calls
> __trace_set_need_resched() if the flag is clear, then sets it via
> set_tsk_thread_flag(). A BPF raw_tp program attached to
> sched_set_need_resched executes synchronously inside __bpf_trace_run().
> On return, __bpf_trace_run() drops the RCU lock with
> rcu_read_unlock_migrate(), which on the preempt-or-BH-disabled path
> calls set_need_resched_current() -> set_tsk_need_resched() again.
>
> set_tsk_thread_flag() follows the tracepoint call, so every re-entrant
> frame sees TIF_NEED_RESCHED clear and calls __trace_set_need_resched()
> again:
>
> BUG: TASK stack guard page was hit at ffffc9001224ff98
> Oops: stack guard page: 0000 [#1] SMP KASAN PTI
> RIP: 0010:__bpf_trace_sched_set_need_resched_tp+0x1c/0x190
> Call Trace:
> trace_sched_set_need_resched_tp+0x110/0x130
> set_tsk_need_resched include/linux/sched.h:2076
> set_need_resched_current include/linux/sched.h:2094
> rcu_read_unlock_special+0x43a/0x440
> __rcu_read_unlock+0x9e/0x120
> rcu_read_unlock_migrate+0xa9/0x240
> __bpf_trace_run+0x131/0x180
> bpf_trace_run3+0x333/0x430
> __bpf_trace_sched_set_need_resched_tp+0x13a/0x190
> trace_sched_set_need_resched_tp+0x110/0x130
> set_tsk_need_resched include/linux/sched.h:2076
> ...
>
> Replace the separate test_tsk_thread_flag() + set_tsk_thread_flag() pair
> with test_and_set_tsk_thread_flag().
>
> Fixes: adcc3bfa8806 ("sched: Adapt sched tracepoints for RV task model")
> Signed-off-by: Sechang Lim <rhkrqnwk98@xxxxxxxxx>
> ---
> include/linux/sched.h | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ee06cba5c6f5..c9efd08dae92 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2071,10 +2071,9 @@ static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag)
>
>  static inline void set_tsk_need_resched(struct task_struct *tsk)
>  {
> -	if (tracepoint_enabled(sched_set_need_resched_tp) &&
> -	    !test_tsk_thread_flag(tsk, TIF_NEED_RESCHED))
> +	if (!test_and_set_tsk_thread_flag(tsk, TIF_NEED_RESCHED) &&
> +	    tracepoint_enabled(sched_set_need_resched_tp))
>  		__trace_set_need_resched(tsk, TIF_NEED_RESCHED);
> -	set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
>  }

__resched_curr() does the same, no?

Also, did you check if the RV model perhaps relies on the TIF bit not
being set when the tracepoint is tripped?