Re: [PATCH] sched: Add _TIF_NEED_RESCHED_LAZY to __resched_curr check

From: K Prateek Nayak

Date: Mon Sep 29 2025 - 00:37:43 EST


Hello Jemmy,

On 9/28/2025 8:44 PM, Jemmy Wong wrote:
> The TIF_NEED_RESCHED_LAZY flag can be set multiple times in a single
> call path. For example:
>
> entity_tick()
> update_curr(cfs_rq);
> resched_curr_lazy(rq);
> resched_curr_lazy(rq_of(cfs_rq));
>
> Add a check in resched_curr_lazy() to return early if the flag is
> already set, avoiding redundant operations.

That would have been a decent idea, but you put the check in
__resched_curr() instead, which makes it plain wrong.

[..snip..]

> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -67,6 +67,8 @@ enum syscall_work_bit {
> #define _TIF_NEED_RESCHED_LAZY _TIF_NEED_RESCHED
> #endif
>
> +#define _TIF_NEED_RESCHED_MUSK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)

s/MUSK/MASK/g

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1108,7 +1108,7 @@ static void __resched_curr(struct rq *rq, int tif)
> if (is_idle_task(curr) && tif == TIF_NEED_RESCHED_LAZY)
> tif = TIF_NEED_RESCHED;
>
> - if (cti->flags & ((1 << tif) | _TIF_NEED_RESCHED))
> + if (cti->flags & ((1 << tif) | _TIF_NEED_RESCHED_MUSK))
> return;

__resched_curr() is used to set both TIF_NEED_RESCHED_LAZY and
TIF_NEED_RESCHED.

By putting this check here, any attempt to set NEED_RESCHED and force an
early preemption will bail out if NEED_RESCHED_LAZY is already set,
which delays the preemption.

An example:

    /* New fair task wakes up. */
    check_preempt_wakeup_fair()
      resched_curr_lazy()
        __resched_curr(TIF_NEED_RESCHED_LAZY)

    /* New RT task wakes up. */
    wakeup_preempt()
      resched_curr()
        __resched_curr(TIF_NEED_RESCHED)
          /* Sees NEED_RESCHED_LAZY is already set. */
          /* Does not do a set_preempt_need_resched() */

    ... /* Added latency */

    sched_tick()
      if (tif_test_bit(TIF_NEED_RESCHED_LAZY))
        resched_curr()
          __resched_curr(TIF_NEED_RESCHED)
            /* Again bails out early! */

    ... /* More latency! */


So, the tick doesn't even upgrade the LAZY flag to a full NEED_RESCHED
and the only time you actually schedule is either at exit to user mode
or if a kthread decides to yield.

Going back to your commit message, something like:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f1e5cb94c53..3275abce9ca2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1164,6 +1164,9 @@ static __always_inline int get_lazy_tif_bit(void)

void resched_curr_lazy(struct rq *rq)
{
+ if (task_thread_info(rq->curr)->flags & _TIF_NEED_RESCHED_MASK)
+ return;
+
__resched_curr(rq, get_lazy_tif_bit());
}

probably fits the bill better.

>
> cpu = cpu_of(rq);
> --
> 2.50.1 (Apple Git-155)
>

--
Thanks and Regards,
Prateek