Re: [tip:sched/urgent] sched: Fix rq->nr_uninterruptible update race

From: Peter Zijlstra
Date: Fri Jan 27 2012 - 03:20:01 EST


On Fri, 2012-01-27 at 11:20 +0600, Rakib Mullick wrote:
> On Fri, Jan 27, 2012 at 2:25 AM, tip-bot for Peter Zijlstra
> <a.p.zijlstra@xxxxxxxxx> wrote:
> > Commit-ID: 4ca9b72b71f10147bd21969c1805f5b2c4ca7b7b
> > Gitweb: http://git.kernel.org/tip/4ca9b72b71f10147bd21969c1805f5b2c4ca7b7b
> > Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > AuthorDate: Wed, 25 Jan 2012 11:50:51 +0100
> > Committer: Ingo Molnar <mingo@xxxxxxx>
> > CommitDate: Thu, 26 Jan 2012 19:38:09 +0100
> >
> > sched: Fix rq->nr_uninterruptible update race
> >
> > KOSAKI Motohiro noticed the following race:
> >
> > >      CPU0                    CPU1
> > >    --------------------------------------------------------
> > >     deactivate_task()
> > >                                task->state = TASK_UNINTERRUPTIBLE;
> > >     activate_task()
> > >       rq->nr_uninterruptible--;
> > >
> > >                                schedule()
> > >                                  deactivate_task()
> > >                                    rq->nr_uninterruptible++;
> > >
> >
> > Kosaki-San's scenario is possible when CPU0 runs
> > __sched_setscheduler() against CPU1's current @task.
> >
> > __sched_setscheduler() does a dequeue/enqueue in order to move
> > the task to its new queue (position) to reflect the newly provided
> > scheduling parameters. However it should be completely invariant to
> > nr_uninterruptible accounting, sched_setscheduler() doesn't affect
> > readiness to run, merely policy on when to run.
> >
> > So convert the inappropriate activate/deactivate_task usage to
> > enqueue/dequeue_task, which avoids the nr_uninterruptible accounting.
> >
> Why would we want to avoid nr_uninterruptible accounting?
> nr_uninterruptible has impact on load calculation, we might not get
> the proper load weight if we don't account it. isn't it?

Read again ;-)

sched_setscheduler() did:

    deactivate_task();    // remove it from the queue

    // change the task's scheduler parameters

    activate_task();      // queue it in the new place

That sequence is meant to be invariant wrt nr_uninterruptible, but
activate_task()/deactivate_task() do include the nr_uninterruptible
accounting logic.

Now Kosaki-San noticed that if the task manages to change its ->state at
an inopportune moment (right between the dequeue and enqueue) we'll get
screwy nr_uninterruptible accounting.
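
To make the interleaving concrete, here is a rough user-space model of
it. This is only a sketch: the function names mirror the kernel's, but
the structs, the nr_queued counter and replay_race() are made up for
illustration, there is no locking, and the two CPUs are simply replayed
in the order Kosaki-San described.

```c
#include <stdbool.h>

/* Simplified user-space model; NOT kernel code. */
enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

struct rq {
	long nr_uninterruptible;	/* sleeping tasks that count as load */
	int nr_queued;			/* hypothetical stand-in for the queue */
};

struct task {
	enum task_state state;
};

static bool task_contributes_to_load(const struct task *p)
{
	return p->state == TASK_UNINTERRUPTIBLE;
}

/* Pure queue manipulation, no load accounting. */
static void enqueue_task(struct rq *rq, struct task *p)
{
	(void)p;
	rq->nr_queued++;
}

static void dequeue_task(struct rq *rq, struct task *p)
{
	(void)p;
	rq->nr_queued--;
}

/* Queue manipulation plus nr_uninterruptible accounting. */
static void activate_task(struct rq *rq, struct task *p)
{
	if (task_contributes_to_load(p))
		rq->nr_uninterruptible--;
	enqueue_task(rq, p);
}

static void deactivate_task(struct rq *rq, struct task *p)
{
	if (task_contributes_to_load(p))
		rq->nr_uninterruptible++;
	dequeue_task(rq, p);
}

/*
 * Replay the race serially and return rq->nr_uninterruptible once the
 * task is asleep. One task sleeps uninterruptibly, so 1 is correct.
 */
static long replay_race(bool use_activate)
{
	struct rq rq = { 0, 1 };
	struct task p = { TASK_RUNNING };

	/* CPU0: sched_setscheduler() takes the task off the queue */
	if (use_activate)
		deactivate_task(&rq, &p);
	else
		dequeue_task(&rq, &p);

	/* CPU1: the task changes its own state in the window */
	p.state = TASK_UNINTERRUPTIBLE;

	/* CPU0: sched_setscheduler() puts the task back */
	if (use_activate)
		activate_task(&rq, &p);	/* sees the new state, decrements */
	else
		enqueue_task(&rq, &p);

	/* CPU1: schedule() really puts the task to sleep */
	deactivate_task(&rq, &p);

	return rq.nr_uninterruptible;
}
```

In this model replay_race(true) (the old activate/deactivate pair)
leaves the counter one short, because the decrement in activate_task()
was never matched by an increment in the earlier deactivate_task().
replay_race(false) (the enqueue/dequeue pair from the patch) ends with
the counter at 1, as it should.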