Re: [PATCH] sched/fair: Clear rel_deadline when initializing forked entities

From: K Prateek Nayak

Date: Mon Apr 27 2026 - 02:46:59 EST


Hello Zicheng,

On 4/24/2026 12:41 PM, Zicheng Qu wrote:
> A yield-triggered crash can happen when a newly forked sched_entity
> enters the fair class with se->rel_deadline unexpectedly set.
>
> The failing sequence is:
>
> 1. A task is forked while se->rel_deadline is still set.

I think a bit more information on how this happens would be nice:

"rel_deadline" is meant to be an internal indicator used during
reweight and migration, but a reweight of the parent from a remote
CPU can race with a fork(), where the child inherits
"rel_deadline" during copy_process() since fork() does not grab
the pi_lock of the parent.

On a side note, should we grab the pi_lock when inheriting the
sched attributes of the parent?

I don't think it is strictly necessary since we reconstruct the state,
but it does seem racy to my eyes against a setscheduler on the parent
unless I'm missing something.

> 2. __sched_fork() initializes vruntime, vlag and other sched_entity
> state, but does not clear rel_deadline.
> 3. On the first enqueue, enqueue_entity() calls place_entity().
> 4. Because se->rel_deadline is set, place_entity() treats se->deadline
> as a relative deadline and converts it to an absolute deadline by
> adding the current vruntime.
> 5. However, the forked entity's deadline is not a valid inherited
> relative deadline for this new scheduling instance, so the conversion
> produces an abnormally large deadline.
> 6. If the task later calls sched_yield(), yield_task_fair() advances
> se->vruntime to se->deadline.
> 7. The inflated vruntime is then used by the following enqueue path,
> where the vruntime-derived key can overflow when multiplied by the
> entity weight.
> 8. This corrupts cfs_rq->sum_w_vruntime, breaks EEVDF eligibility
> calculation, and can eventually make all entities appear ineligible.
> pick_next_entity() may then return NULL unexpectedly, leading to a
> later NULL dereference.
>
> A captured trace shows the effect clearly. Before yield, the entity's
> vruntime was around:
>
> 9834017729983308
>
> After yield_task_fair() executed:
>
> se->vruntime = se->deadline
>
> the vruntime jumped to:
>
> 19668035460670230
>
> and the deadline was later advanced further to:
>
> 19668035463470230
>
> This shows that the deadline had already become abnormally large before
> yield_task_fair() copied it into vruntime.

Although I can hit this very easily, I haven't yet been able to crash a
system from this, or see the vruntime drift apart when stressing. In my
case, the deadline seems to be pretty tame for the most part, but that
is probably because I don't have the weights right and it is a
probability game.

Either way, the fix does make sense to me.

>
> rel_deadline is only meaningful when se->deadline really carries a
> relative deadline that still needs to be placed against vruntime. A
> freshly forked sched_entity should not inherit or retain this state.
> Clear se->rel_deadline in __sched_fork(), together with the other
> sched_entity runtime state, so that the first enqueue does not interpret
> the new entity's deadline as a stale relative deadline.
>
> Fixes: 82e9d0456e06 ("sched/fair: Avoid re-setting virtual deadline on 'migrations'")
> Analyzed-by: Hui Tang <tanghui20@xxxxxxxxxx>
> Analyzed-by: Zhang Qiao <zhangqiao22@xxxxxxxxxx>
> Signed-off-by: Zicheng Qu <quzicheng@xxxxxxxxxx>

Feel free to include:

Reviewed-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

> ---
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da20fb6ea25a..b8871449d3c6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4458,6 +4458,7 @@ static void __sched_fork(u64 clone_flags, struct task_struct *p)
>  	p->se.nr_migrations = 0;
>  	p->se.vruntime = 0;
>  	p->se.vlag = 0;
> +	p->se.rel_deadline = 0;
>  	INIT_LIST_HEAD(&p->se.group_node);
>
>  	/* A delayed task cannot be in clone(). */

--
Thanks and Regards,
Prateek