Re: [PATCH v2 1/1] futex: Add FUTEX_SPIN operation

From: Mathieu Desnoyers
Date: Thu May 23 2024 - 16:20:04 EST


On 2024-05-23 16:07, André Almeida wrote:
Add a new mode for futex wait, the futex spin.

Given the FUTEX2_SPIN flag, parse the futex value as the TID of the lock
owner. Then, before going to the normal wait path, spins while the lock
owner is running in a different CPU, to avoid the whole context switch
operation and to quickly return to userspace. If the lock owner is not
running, just sleep as the normal futex wait path.

The user value is masked with FUTEX_TID_MASK, to allow some bits for
future use.

The check for the owner to be running or not is important to avoid
spinning for something that won't be released quickly. Userspace is
responsible on providing the proper TID, the kernel does a basic check.

Signed-off-by: André Almeida <andrealmeid@xxxxxxxxxx>
---

[...]

+
+static int futex_spin(struct futex_hash_bucket *hb, struct futex_q *q,
+ struct hrtimer_sleeper *timeout, u32 uval)
+{
+ struct task_struct *p;
+ pid_t pid = uval & FUTEX_TID_MASK;
+
+ p = find_get_task_by_vpid(pid);
+
+ /* no task found, maybe it already exited */
+ if (!p) {
+ futex_q_unlock(hb);
+ return -EAGAIN;
+ }
+
+ /* can't spin in a kernel task */
+ if (unlikely(p->flags & PF_KTHREAD)) {
+ put_task_struct(p);
+ futex_q_unlock(hb);
+ return -EPERM;
+ }
+
+ futex_queue(q, hb);
+
+ if (timeout)
+ hrtimer_sleeper_start_expires(timeout, HRTIMER_MODE_ABS);
+
+ while (1) {

Infinite loops in other kernel/futex/ files appear to use "for (;;) {"
instead.

+ if (likely(!plist_node_empty(&q->list))) {
+ if (timeout && !timeout->task)
+ goto exit;
+
+ if (task_on_cpu(p)) {
+ /* spin */

You may want to add a "cpu_relax();" here to lessen the
power consumption of this busy-loop.

+ continue;
+ } else {
+ /* task is not running, sleep */
+ break;
+ }
+ } else {
+ goto exit;
+ }
+ }
+
+ /* spinning didn't work, go to the normal path */
+ set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);

I wonder if flipping the order between:

set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
futex_queue(q, hb);

as done in futex_wait_queue() and what is done here matters ?
Does it introduce a race where a wakeup could be missed ?

If it's an issue, then setting the current state could be done
before invoking futex_queue(), and whenever the spin exits,
set it back to TASK_RUNNING.


+
+ if (likely(!plist_node_empty(&q->list))) {
+ if (!timeout || timeout->task)
+ schedule();
+ }
+
+ __set_current_state(TASK_RUNNING);
+
+exit:
+ put_task_struct(p);
+ return 0;
+}
+
/**
* futex_unqueue_multiple - Remove various futexes from their hash bucket
* @v: The list of futexes to unqueue
@@ -665,8 +732,15 @@ int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
if (ret)
return ret;
- /* futex_queue and wait for wakeup, timeout, or a signal. */
- futex_wait_queue(hb, &q, to);
+ if (flags & FLAGS_SPIN) {
+ ret = futex_spin(hb, &q, to, val);
The empty line below could be removed.

Thanks,

Mathieu

+
+ if (ret)
+ return ret;
+ } else {
+ /* futex_queue and wait for wakeup, timeout, or a signal. */
+ futex_wait_queue(hb, &q, to);
+ }
/* If we were woken (and unqueued), we succeeded, whatever. */
if (!futex_unqueue(&q))

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com