Re: [PATCH v2 4/4] freezer,sched: Rewrite core freezer logic

From: Peter Zijlstra
Date: Thu Aug 05 2021 - 07:50:57 EST


On Wed, Jul 07, 2021 at 04:14:12PM +0200, Oleg Nesterov wrote:
> sorry for delay...

And me.. :/

> I am still trying to understand this series, just one note for now.

The main motivation is to ensure tasks don't wake up early on resume.
The current code has a problem between clearing pm_freezing and calling
__thaw_task(): a task can get spuriously woken there.

(Will is doing unspeakable things that suffer there.)

I'm trying to fix that by making frozen a special wait state, but that
then gets me complications vs the existing special states.

I also don't want to change the wakeup path, as you suggested earlier,
because that's adding code (albeit fairly trivial) to every single wakeup
for the benefit of these exceptional cases, which I feel is just wrong
(tempting as it might be).

> On 06/24, Peter Zijlstra wrote:
> >
> > +static bool __freeze_task(struct task_struct *p)
> > +{
> > +	unsigned long flags;
> > +	unsigned int state;
> > +	bool frozen = false;
> > +
> > +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > +	state = READ_ONCE(p->__state);
> > +	if (state & (TASK_FREEZABLE|__TASK_STOPPED|__TASK_TRACED)) {
> > +		/*
> > +		 * Only TASK_NORMAL can be augmented with TASK_FREEZABLE,
> > +		 * since they can suffer spurious wakeups.
> > +		 */
> > +		if (state & TASK_FREEZABLE)
> > +			WARN_ON_ONCE(!(state & TASK_NORMAL));
> > +
> > +#ifdef CONFIG_LOCKDEP
> > +		/*
> > +		 * It's dangerous to freeze with locks held; there be dragons there.
> > +		 */
> > +		if (!(state & __TASK_FREEZABLE_UNSAFE))
> > +			WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> > +#endif
> > +
> > +		if (state & (__TASK_STOPPED|__TASK_TRACED))
> > +			WRITE_ONCE(p->__state, TASK_FROZEN|__TASK_FROZEN_SPECIAL);
>
> Well, this doesn't look right.

> But the main problem is that you can't simply remove __TASK_TRACED,
> this can confuse the debugger, any ptrace() request will fail as if
> the tracee was killed.

Urgh.. indeed. I missed the obvious *again* :/ Other, not-yet-frozen,
tasks will observe this 'intermediate' state and misbehave. And similar
on wakeup I suppose, if we wake the ptracer before the tracee it again
can observe this state.

I suppose we could cure that, have stopped/traced users use a special
accessor for task::__state... not pretty. Let me see if I can come up
with anything else.
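
To make the "special accessor" idea concrete, here is a rough userspace
sketch, not kernel code and not a proposed patch. Everything in it is
an assumption made up for illustration: the sk_ prefix, the flag values,
the saved_state field and the helper names. It only shows the shape: the
freezer overwrites the state, and anything that cares about
stopped/traced goes through an accessor that looks through the frozen
state instead of reading the raw word.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, chosen for illustration only. */
#define SK_TASK_STOPPED	0x0004u
#define SK_TASK_TRACED	0x0008u
#define SK_TASK_FROZEN	0x8000u

/* Toy stand-in for the relevant part of a task. */
struct sk_task {
	unsigned int state;		/* models task::__state */
	unsigned int saved_state;	/* hypothetical: state before freezing */
};

/* Freeze: remember the old state, then overwrite it with FROZEN. */
static void sk_freeze(struct sk_task *p)
{
	/* Freezing twice would clobber the saved state. */
	assert(!(p->state & SK_TASK_FROZEN));
	p->saved_state = p->state;
	p->state = SK_TASK_FROZEN;
}

/* Thaw: restore whatever state the task had before it was frozen. */
static void sk_thaw(struct sk_task *p)
{
	if (p->state & SK_TASK_FROZEN) {
		p->state = p->saved_state;
		p->saved_state = 0;
	}
}

/*
 * The "special accessor": report the pre-freeze state while frozen,
 * so observers never see the intermediate state.
 */
static unsigned int sk_task_state_special(const struct sk_task *p)
{
	if (p->state & SK_TASK_FROZEN)
		return p->saved_state;
	return p->state;
}

/* What a ptrace-like check would use instead of reading state directly. */
static bool sk_task_is_traced(const struct sk_task *p)
{
	return (sk_task_state_special(p) & SK_TASK_TRACED) != 0;
}
```

With this, a traced task still answers "traced" through
sk_task_is_traced() while frozen, even though the raw state word no
longer has the traced bit set. The sketch ignores locking and memory
ordering entirely, which is where the real ugliness would be.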