Re: [PATCH] sched: correct testing need_resched inmutex_spin_on_owner()

From: Steven Rostedt
Date: Tue Jun 07 2011 - 11:08:25 EST


On Tue, Jun 07, 2011 at 10:47:21PM +0800, Hillf Danton wrote:
> On Tue, Jun 7, 2011 at 10:40 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Tue, 2011-06-07 at 22:36 +0800, Hillf Danton wrote:
> >
> >> If you are right, the following comment also in __mutex_lock_common()
> >>
> >>       for (;;) {
> >>               struct task_struct *owner;
> >>
> >>               /*
> >>                * If there's an owner, wait for it to either
> >>                * release the lock or go to sleep.
> >>                */
> >>               owner = ACCESS_ONCE(lock->owner);
> >>               if (owner && !mutex_spin_on_owner(lock, owner))
> >>                       break;
> >>
> >> looks misleading too, but if owner is on this CPU, for what does we wait?
> >
> > huh, wtf!? it cannot be on this cpu, if it was we wouldn't be running
> > the above code but whatever owner was doing.
> >
> > So my argument was, it cannot be on this cpu, therefore, by checking it
> > is on a cpu, we check its on a different cpu.
> >
> > And I really don't see how any of that is related to the above.
> >
> Oh, it looks you are willing to rethink about testing need_resched?

NO!

Hillf, reading this thread I understand that you have no idea of what is
happening here. You are totally clueless. Let me give you a clue.

The idea of an adaptive mutex is quite simple. If a mutex is held for a
short period of time, and on another CPU a task tries to grab the same
mutex, it would be slower to have that task schedule out than to just
spin waiting for that mutex to be released.

Ideally, this mutex would then act like a spin lock, and things would
run very efficiently. But, as it is not a spin lock, there's things we
must also be aware of.

1) If the owner we are waiting for to release the mutex schedules out.

2) If another task wants us to run on the CPU of the waiter that is
spinning.

3) If the waiter that is spinning runs out of its time quantum and
should yield to another task.

#1 is checked by the owner_running(). If that fails, then we should not
spin anymore and its good to schedule. BTW, the answer why
owner_running() is safe to check owner, is because it grabs rcu_locks()
to make sure it is safe.

#2 and #3 are checked by testing NEED_RESCHED on the waiter (NOT THE
OWNER!). If the waiter that is spinning should be preempted, the waiter
(not the owner) will have its NEED_RESCHED bit set. That is why the
waiter's NEED_RESCHED flag is checked and not owners.

And also we don't need to check need_resched() in the above code because
the loop checks it anyway, or we break out of the loop.

Understand?

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/