Re: [PATCH v3 6/6] freezer,sched: Rewrite core freezer logic

From: Peter Zijlstra
Date: Thu Oct 27 2022 - 03:40:28 EST


On Thu, Oct 27, 2022 at 01:58:09PM +0800, Chen Yu wrote:

> > It's a very narrow race between schedule() and task_call_func().
> >
> > CPU0 CPU1
> >
> > __schedule()
> > rq_lock();
> > prev_state = READ_ONCE(prev->__state);
> > if (... && prev_state) {
> > deactivate_tasl(rq, prev, ...)
> > prev->on_rq = 0;
> >
> > task_call_func()
> > raw_spin_lock_irqsave(p->pi_lock);
> > state = READ_ONCE(p->__state);
> > smp_rmb();
> > if (... || p->on_rq) // false!!!
> > rq = __task_rq_lock()
> >
> > ret = func();
> >
> > next = pick_next_task();
> > rq = context_switch(prev, next)
> > prepare_lock_switch()
> > spin_release(&__rq_lockp(rq)->dep_map...)
> >
> >
> >
> > So while the task is on it's way out, it still holds rq->lock for a
> > little while, and right then task_call_func() comes in and figures it
> > doesn't need rq->lock anymore (because the task is already dequeued --
> > but still running there) and then the __set_task_frozen() thing observes
> > it's holding rq->lock and yells murder.
> >
> > Could you please give the below a spin?
> >
> > ---
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index cb2aa2b54c7a..f519f44cd4c7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4200,6 +4200,37 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > return success;
> > }
> >
> > +static bool __task_needs_rq_lock(struct task_struct *p)
> > +{
> > + unsigned int state = READ_ONCE(p->__state);
> > +
> > + /*
> > + * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> > + * the task is blocked. Make sure to check @state since ttwu() can drop
> > + * locks at the end, see ttwu_queue_wakelist().
> > + */
> > + if (state == TASK_RUNNING || state == TASK_WAKING)
> > + return true;
> > +
> > + /*
> > + * Ensure we load p->on_rq after p->__state, otherwise it would be
> > + * possible to, falsely, observe p->on_rq == 0.
> > + *
> > + * See try_to_wake_up() for a longer comment.
> > + */
> > + smp_rmb();
> > + if (p->on_rq)
> > + return true;
> > +
> > +#ifdef CONFIG_SMP
> > + smp_rmb();
> > + if (p->on_cpu)
> > + return true;
> > +#endif
> Should we also add p->on_cpu check to return 0 in __set_task_frozen()?
> Otherwise it might still warn that p is holding the lock?

With this, I don't think __set_task_frozen() should ever see
'p->on_cpu && !p->on_rq'. By forcing task_call_func() to acquire
rq->lock that window is closed. That is, this window only exits in
__schedule() while it holds rq->lock, since we're now serializing
against that, we should no longer observe it.