Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning

From: Darren Hart
Date: Wed Apr 07 2010 - 23:25:25 EST

Thomas Gleixner wrote:
On Wed, 7 Apr 2010, Darren Hart wrote:
Thomas Gleixner wrote:
On Mon, 5 Apr 2010, Darren Hart wrote:
Hmm. The order is weird. Why don't you do that simpler ?

Get the uval, the tid and the thread_info pointer outside of the
loop. Also task_pid_vnr(current) just needs a one time lookup.
Eeek. Having the owner in the loop is a good way to negate the benefits
of adaptive spinning by spinning forever (unlikely, but it could
certainly spin across multiple owners). Nice catch.

As for the uval.... I'm not sure what you mean. You get curval below
inside the loop, and there is no "uval" in my version of the code.

Well, you need a first time lookup of owner and ownertid for which you
need the user space value (uval),
but thinking more about it it's not
even necessary. Just initialize ownertid to 0 so it will drop into the
lookup code when we did not acquire the futex in the cmpxchg.

No need for ownertid at all really. The cmpxchg always tries to go from 0 to curtid. I've pushed the futex_owner() call outside the loop for a one time lookup.
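
For illustration only, here is a userspace analogue of that 0 -> curtid transition. It uses C11 atomics rather than the kernel's cmpxchg_futex_value_locked(), and try_acquire() and futex_word_t are made-up names for this sketch, not part of the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the futex word; 0 means unlocked. */
typedef _Atomic uint32_t futex_word_t;

/*
 * Try to move the word from 0 to curtid in one atomic step.
 * Returns the value observed before the attempt: 0 means the caller
 * now owns the lock, anything else is the word left by the current owner.
 */
static uint32_t try_acquire(futex_word_t *word, uint32_t curtid)
{
	uint32_t expected = 0;

	/* A TID of 0 would be indistinguishable from "unlocked". */
	assert(curtid != 0);

	if (atomic_compare_exchange_strong(word, &expected, curtid))
		return 0;

	/* On failure the exchange stores the observed value in expected. */
	return expected;
}
```

Since the exchange only ever starts from 0, the caller never has to remember a previous owner TID to attempt the acquire, which is why ownertid can go away.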

As for the order, I had put the initial spin prior to the cmpxchg to
avoid doing too many cmpxchg's in a row as they are rather expensive.
However, since this is (now) the first opportunity to try and acquire
the lock atomically after entering the futex syscall, I think you're
right, it should be the first thing in the loop.

change the loop to do:

for (;;) {
	curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
	if (!curval)
		return 1;
Single return point makes instrumentation so much easier. Unless folks
_really_ object, I'll leave it as is until we're closer to merging.

I don't care either way. That was just example code.

	if ((curval & FUTEX_TID_MASK) != ownertid) {
		ownertid = curval & FUTEX_TID_MASK;
		owner = update_owner(ownertid);

Hrm... at this point the owner has changed... so we should break and go
to sleep, not update the owner and start spinning again. The
futex_spin_on_owner() will detect this and abort, so I'm not seeing the
purpose of the above if() block.

Why ? If the owner has changed and the new owner is running on another
cpu then why not spin further ?

That's an interesting question, and I'm not sure what the right answer is. The current approach of the adaptive spinning in the kernel is to spin until the owner changes or deschedules, then stop and block. The idea is that if you didn't get the lock before the owner changed, you aren't going to get it in a very short period of time (you have at least an entire critical section to wait through plus whatever time you've already spent spinning). However, blocking just so another task can spin doesn't really make sense either, and makes the lock less fair than it could otherwise be.

My goal in starting this is to provide a more intelligent mechanism than sched_yield() for userspace to use to determine when to spin and when to sleep. The current implementation allows for spinning up until the owner changes, deschedules, or the timeslice expires. I believe these criteria are much better than spinning for some fixed number of cycles and then yielding for some unpredictable amount of time until CFS decides to schedule you back in.
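
To make the three stop conditions concrete, here is a rough, self-contained sketch of the decision only. It is not the patch code: struct demo_owner, spin_verdict() and the need_resched flag are hypothetical stand-ins for the real owner lookup and scheduler state, and DEMO_TID_MASK just mirrors the value of FUTEX_TID_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors FUTEX_TID_MASK; redefined here so the sketch stands alone. */
#define DEMO_TID_MASK 0x3fffffffu

/* Hypothetical snapshot of the owner taken once, before spinning. */
struct demo_owner {
	uint32_t tid;	/* TID recorded when the spin started */
	bool on_cpu;	/* is the owner currently running? */
};

enum spin_verdict {
	SPIN_AGAIN,	/* same owner, still running: keep spinning */
	SPIN_TRY_LOCK,	/* word is 0: attempt the cmpxchg */
	SPIN_BLOCK,	/* stop spinning and go to sleep */
};

/* Decide what a spinner should do given the current futex word. */
static enum spin_verdict spin_verdict(uint32_t curval,
				      const struct demo_owner *owner,
				      bool need_resched)
{
	assert(owner);

	if (curval == 0)
		return SPIN_TRY_LOCK;
	if ((curval & DEMO_TID_MASK) != owner->tid)
		return SPIN_BLOCK;	/* owner changed */
	if (!owner->on_cpu)
		return SPIN_BLOCK;	/* owner descheduled */
	if (need_resched)
		return SPIN_BLOCK;	/* our timeslice expired */
	return SPIN_AGAIN;
}
```

Thomas's alternative (keep spinning when the new owner is running on another CPU) would amount to refreshing the owner snapshot in the "owner changed" branch instead of returning SPIN_BLOCK.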

Still, the criteria for breaking the spin are something that needs more eyes, and more numbers before I can be confident in any approach.

+ hrtimer_init_sleeper(to, current);
+ hrtimer_set_expires(&to->timer, *time);
+ }
Why setup all this _before_ trying the adaptive spin ?

I placed the retry: label above the adaptive spin loop. This way if we wake a
task and the lock is "stolen" it doesn't just go right back to sleep. This
should aid in fairness and also performance in less contended cases. I didn't
think it was worth a "if (first_time_through && time)" sort of block to be
able to setup the timer after the spin loop.

Hmm, ok.

Do we really need all this code ? A simple owner->on_cpu (owner needs
to be the task_struct then) would be sufficient to figure that out,
wouldn't it?
As Peter pointed out in IRC, p->oncpu isn't generic. I'll go trolling through
the mutex_spin_on_owner() discussions to see if I can determine why that's the

AFAICT p->oncpu is the correct thing to use when CONFIG_SMP=y. All it
needs is a simple accessor function and you can keep all the futex
cruft in futex.c where it belongs.


Darren Hart
IBM Linux Technology Center
Real-Time Linux Team