Re: [PATCH] sched: set TIF_NEED_RESCHED before calling __trace_set_need_resched()

From: Gabriele Monaco

Date: Thu Jun 25 2026 - 05:43:58 EST


On Thu, 2026-06-25 at 11:16 +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2026 at 06:54:19AM +0000, Sechang Lim wrote:
> > set_tsk_need_resched() tests TIF_NEED_RESCHED, calls
> > __trace_set_need_resched() if the flag is clear, then sets it via
> > set_tsk_thread_flag().  A BPF raw_tp program attached to
> > sched_set_need_resched executes synchronously inside
> > __bpf_trace_run().
> > On return, __bpf_trace_run() drops the RCU lock with
> > rcu_read_unlock_migrate(), which on the preempt-or-BH-disabled path
> > calls set_need_resched_current() -> set_tsk_need_resched() again.
> >
> > set_tsk_thread_flag() follows the tracepoint call, so every re-entrant
> > frame sees TIF_NEED_RESCHED clear and calls __trace_set_need_resched()
> > again:
> >
> >   BUG: TASK stack guard page was hit at ffffc9001224ff98
> >   Oops: stack guard page: 0000 [#1] SMP KASAN PTI
> >   RIP: 0010:__bpf_trace_sched_set_need_resched_tp+0x1c/0x190
> >   Call Trace:
> >    trace_sched_set_need_resched_tp+0x110/0x130
> >    set_tsk_need_resched include/linux/sched.h:2076
> >    set_need_resched_current include/linux/sched.h:2094
> >    rcu_read_unlock_special+0x43a/0x440
> >    __rcu_read_unlock+0x9e/0x120
> >    rcu_read_unlock_migrate+0xa9/0x240
> >    __bpf_trace_run+0x131/0x180
> >    bpf_trace_run3+0x333/0x430
> >    __bpf_trace_sched_set_need_resched_tp+0x13a/0x190
> >    trace_sched_set_need_resched_tp+0x110/0x130
> >    set_tsk_need_resched include/linux/sched.h:2076
> >    ...
> >
> > Replace the separate test_tsk_thread_flag() + set_tsk_thread_flag() pair
> > with test_and_set_tsk_thread_flag().
> >
> > Fixes: adcc3bfa8806 ("sched: Adapt sched tracepoints for RV task model")
> > Signed-off-by: Sechang Lim <rhkrqnwk98@xxxxxxxxx>
> > ---
> >  include/linux/sched.h | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index ee06cba5c6f5..c9efd08dae92 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2071,10 +2071,9 @@ static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag)
> >  
> >  static inline void set_tsk_need_resched(struct task_struct *tsk)
> >  {
> > - if (tracepoint_enabled(sched_set_need_resched_tp) &&
> > -     !test_tsk_thread_flag(tsk, TIF_NEED_RESCHED))
> > + if (!test_and_set_tsk_thread_flag(tsk, TIF_NEED_RESCHED) &&
> > +     tracepoint_enabled(sched_set_need_resched_tp))
> >   __trace_set_need_resched(tsk, TIF_NEED_RESCHED);
> > - set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
> >  }
>
> __resched_curr() does the same, no?

I don't think it would make the same re-entrant call, but it's probably
a good idea to keep the two consistent and have the flag set before the
tracepoint every time.
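
To make the ordering argument concrete, here is a rough userspace model
of the two orderings. It is only a sketch, not kernel code: every name in
it (need_resched, tracepoint_model(), run_model(), MAX_DEPTH) is made up
for illustration, the "handler" simply re-enters the setter once on its
way out, and the depth cap exists only so the unpatched ordering
terminates instead of overflowing the stack.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_DEPTH 8 /* arbitrary cap so the unpatched ordering terminates */

static atomic_bool need_resched; /* stands in for TIF_NEED_RESCHED */
static int trace_hits;           /* how many times the "tracepoint" fired */
static int depth;                /* current nesting of the handler */
static bool flag_first;          /* true: patched order, false: original order */

static void set_need_resched_model(void);

/* Models a handler whose exit path ends up setting need_resched again. */
static void tracepoint_model(void)
{
    trace_hits++;
    if (++depth < MAX_DEPTH)
        set_need_resched_model();
    depth--;
}

static void set_need_resched_model(void)
{
    if (flag_first) {
        /* Patched order: test-and-set first, trace only on the 0 -> 1 edge. */
        if (!atomic_exchange(&need_resched, true))
            tracepoint_model();
    } else {
        /* Original order: the flag is still clear while the handler runs. */
        if (!atomic_load(&need_resched))
            tracepoint_model();
        atomic_store(&need_resched, true);
    }
}

/* Resets the model, runs one top-level call, returns the tracepoint count. */
static int run_model(bool patched)
{
    atomic_store(&need_resched, false);
    trace_hits = 0;
    depth = 0;
    flag_first = patched;
    set_need_resched_model();
    assert(depth == 0); /* every nested frame has unwound */
    return trace_hits;
}
```

In this model run_model(true) fires the tracepoint once, because the
nested call already sees the flag set, while run_model(false) recurses
until it hits the cap.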

> Also, did you check if the RV model perhaps relies on the TIF bit not
> being set when the tracepoint is tripped?

It does not. We only check the TIF value passed to the tracepoint, not
the one in the task. The assumption is that the flag is set during that
exact tracepoint (before or after doesn't matter, but it would matter
if it had been set on a previous event).
Depending on what handlers do, this may become a problem for RV as
well.

Thanks,
Gabriele