Re: [PATCH] futex: add FUTEX_SET_WAIT operation

From: Darren Hart
Date: Mon Nov 30 2009 - 17:10:02 EST


Peter Zijlstra wrote:

Hey Peter,

Some thoughts on adaptive futexes ...

Subject: futex: implement adaptive spin
From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Date: Tue Jan 20 14:40:36 CET 2009

Implement kernel side adaptive spining for futexes.

This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
requires the futex lock field to contain a TID and regular futexes do
not have that restriction.

FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a different cpu) or until the task gets preempted. After that it behaves
like FUTEX_WAIT.

The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
on the lower 30 bits (TID_MASK) of the lock field -- leaving the
WAITERS and OWNER_DIED flags in tact.

NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
adding a lock_owner function argument.

void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
void *lock, struct thread_info *owner);

Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
...

Index: linux-2.6/kernel/futex.c
===================================================================
--- linux-2.6.orig/kernel/futex.c
+++ linux-2.6/kernel/futex.c
@@ -1158,10 +1158,40 @@ handle_fault:
*/
#define FLAGS_SHARED 0x01
#define FLAGS_CLOCKRT 0x02
+#define FLAGS_ADAPTIVE 0x03

static long futex_wait_restart(struct restart_block *restart);

-static int futex_wait(u32 __user *uaddr, int fshared,
+#ifdef CONFIG_SMP
+struct thread_info *futex_owner(u32 __user *uaddr)
+{
+ struct thread_info *ti = NULL;
+ struct task_struct *p;
+ pid_t pid;
+ u32 uval;
+
+ if (get_futex_value_locked(&uval, uaddr))
+ return NULL;

Just give up if it would cause a fault?

+
+ pid = uval & FUTEX_TID_MASK;
+ rcu_read_lock();
+ p = find_taks_by_vpid(pid);

I'm impressed that you can create such a solid patch without compiling it!

+ if (p) {
+ const struct cred *cred, *pcred;
+
+ cread = current_cred();
+ pcred = __task_cred(p);
+ if (cred->euid == pcred->euid ||
+ cred->euid == pcred->uid)
+ ti = task_thread_info(p);
+ }
+ rcu_read_unlock();
+
+ return ti;
+}
+#endif
+
+static int futex_wait(u32 __user *uaddr, int flags,
u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
{
struct task_struct *curr = current;
@@ -1176,11 +1206,43 @@ static int futex_wait(u32 __user *uaddr,
if (!bitset)
return -EINVAL;

+#ifdef CONFIG_SMP
+ if (!futex_cmpxchg_enabled || !(flags & FLAGS_ADAPTIVE))
+ goto skip_adaptive;
+
+ preempt_disable();
+ for (;;) {
+ struct thread_info *owner;
+ u32 curval = 0, newval = task_pid_vnr(curr);

Do we need to lookup newval every iteration?

+
+ owner = futex_owner(uaddr);
+ if (owner && futex_spin_on_owner(uaddr, owner))
+ break;
+
+ if (get_futex_value_locked(&uval, uaddr))
+ break;
+
+ curval |= uval & ~FUTEX_TID_MASK;
+ newval |= uval & ~FUTEX_TID_MASK;
+
+ if (cmpxchg_futex_value_locked(uaddr, curval, newval)
+ == newval)

"== curval" isn't it? futex_atomic_cmpxchg_inatomic() returns the oldval, not the newval.

+ return 0;
+
+ if (!owner && (need_resched() || rt_task(curr)))
+ break;

Hrm... why go through the loop at all for an rt_task if we bail on the first iteration?

+
+ cpu_relax();
+ }
+ preempt_enable();
+skip_adaptive:

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/