Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed

From: Oleg Nesterov
Date: Thu Mar 25 2010 - 11:48:31 EST


On 03/25, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> > On 03/24, Peter Zijlstra wrote:
> > >
> > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > > >
> > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > > which finally does s/WAKING/RUNNING.
> > > >
> > > > But. Nobody can take rq->lock in between. This means a signal
> > > > from irq (quite possible with CLONE_THREAD) or another rt
> > > > thread which preempts us can lockup.
> > >
> > > Hmm, the signal case might indeed be a problem, however I cannot see how
> > > the RT thread can be a problem because until we do wake_up_new_task()
> > > the child will not be runnable and can thus not be preempted.
> >
> > Indeed, but I meant the _parent_ can be preempted ;)
>
> I still can't see how that would be a problem..


The parent P does do_fork(), copy_process returns the new child C with
TASK_WAKING at PF_STARTING set.

do_fork() clears PF_STARTING, but TASK_WAKING is still set, and C is
already visible to the user-space

An rt-thread X preempts P and calls ttwu() (say, it sends a signal to C)

ttwu() loops in task_rq_lock() "forever", because TASK_WAKING is still
set.

> > > The reason we have that TASK_WAKING stuff for fork is because
> > > wake_up_new_task() needs p->cpus_allowed to be stable
> >
> > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> > a moment, but afaics that is all?
>
> My patch does that.

OK, that is what I meant. Now, why sched_fork() can't just set TASK_RUNNING ?
This way the "spurious" wakeup can do nothing with the new child, and we
do not have problems with cpuset which needs rq->lock for set_cpus_allowed().

> > > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > > and then puts all ->select_task_rq() calls under rq->lock.

Yes, I thought about this too. I tried to preserve the current
"->select_task_rq() is called without rq->lock" logic.

> This should
> > > allow us to remove the TASK_WAKING thing from fork

Confused. Why can't we simply remove TASK_WAKING from fork without any
changes except in wake_up_new_task() ?

> which in turn allows
> > > us to remove the PF_STARTING check in task_is_waking.

Even if I do not think I understand sched.c above, I'd say we must do
this in any case ;)

> I was still looking at removing the TASK_WAKING check from
> task_rq_lock()

Peter, I can't apply your patch due to rejects (will try again later)
but at first glance, it makes TASK_WAKING unneeded? Since we do not
drop the lock after we set TASK_WAKING, why do we need this state at
all ?

Aha... select_task_rq_fair() can drop the lock, yes? Well, in this
case probably select_task_rq_fair() can set TASK_WAKING before unlock?


I like the current idea to call select_task_rq() without rq->lock, but
of course this is up to you.

However, once again, can't we make a simpler patch?

- remove PF_STARTING from task_waking()

- change sched_fork() to set RUNNING instead of WAKING

- change wake_up_new_task() to set WAKING under rq->lock

This looks simpler to me, and allows to drop rq->lock in ttwu() right
after it sets WAKING.

Another change which seems reasonable is to allow ttwu() to take rq->lock
even if WAKING is set, it can do nothing but check task->state in this case.

What do you think?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/