On Tue, 6 Sep 2016, Waiman Long wrote:
+enum futex_type {Please introduce the futex_type magic and the related changes to the pi
+ TYPE_PI = 0,
+ TYPE_TO,
+};
code in a seperate patch so it can be verified independently.
It's sad that one has to explain that to you over and over ....
@@ -836,10 +859,10 @@ static void put_futex_state(struct futex_state *state)The comment above this list_del() is really pointless. I can see that from
return;
/*
- * If state->owner is NULL, the owner is most probably dying
- * and has cleaned up the futex state already
+ * If state->owner is NULL and the type is TYPE_PI, the owner
+ * is most probably dying and has cleaned up the state already
*/
- if (state->owner) {
+ if (state->owner&& (state->type == TYPE_PI)) {
raw_spin_lock_irq(&state->owner->pi_lock);
list_del_init(&state->list);
raw_spin_unlock_irq(&state->owner->pi_lock);
@@ -847,6 +870,11 @@ static void put_futex_state(struct futex_state *state)
rt_mutex_proxy_unlock(&state->pi_mutex, state->owner);
}
+ /*
+ * Dequeue it from the HB futex state list.
+ */
+ list_del_init(&state->hb_list);
the code itself.
Aside of that: Why do you need seperate list heads? You explain the
seperate list somewhere in that big comment below, but it should be
explained at the point where you add it to the state and the hash bucket.
if (current->pi_state_cache)lacks curly braces
kfree(state);
else {
@@ -919,13 +947,24 @@ void exit_pi_state_list(struct task_struct *curr)
continue;
}
- WARN_ON(pi_state->owner != curr);
WARN_ON(list_empty(&pi_state->list));
+ if (pi_state->type == TYPE_PI) {
+ WARN_ON(pi_state->owner != curr);
+ pi_state->owner = NULL;
+ }
list_del_init(&pi_state->list);
- pi_state->owner = NULL;
raw_spin_unlock_irq(&curr->pi_lock);
- rt_mutex_unlock(&pi_state->pi_mutex);
+ if (pi_state->type == TYPE_PI)
+ rt_mutex_unlock(&pi_state->pi_mutex);Another completely useless comment. Because you tell what you do, but not
+ else if (pi_state->type == TYPE_TO) {
+ /*
+ * Need to wakeup the mutex owner.
+ */
WHY.
+ WARN_ON(!pi_state->owner);And what handles or sanity checks the state->hb_list ???
+ if (pi_state->owner)
+ wake_up_process(pi_state->owner);
+/*Please do not use tail comments. They are hard to parse.
+ * Try to lock the userspace futex word (0 => vpid).
+ *
+ * Return: 1 if lock acquired or an error happens, 0 if not.
+ * The status code will be 0 if no error, or< 0 if an error happens.
+ * *puval will contain the latest futex value when trylock fails.
+ *
+ * The waiter flag, if set, will make it ignore the FUTEX_WAITERS bit.
+ * The HB spinlock should NOT be held while calling this function. A
+ * successful lock acquisition will clear the waiter and died bits.
+ */
+static inline int futex_trylock_to(u32 __user *uaddr, u32 vpid, u32 *puval,
+ const bool waiter, int *status)
+{
+ u32 uval;
+
+ *status = 0;
+
+ if (unlikely(get_user(uval, uaddr)))
+ goto efault;
+
+ *puval = uval;
+
+ if (waiter ? (uval& FUTEX_TID_MASK) : uval)
+ return 0; /* Trylock fails */
+Do we really need another variant of cmpxchg and why do you need that extra
+ if (unlikely(futex_atomic_cmpxchg_inatomic(puval, uaddr, uval, vpid)))
+ goto efault;
+
+ return *puval == uval;
+
+efault:
+ *status = -EFAULT;
+ return 1;
+}
status? What's wrong in doing the magic in the return value?
+That number is pulled out of thin air or what is the rationale for chosing
+/*
+ * Spinning threshold for futex word without setting FUTEX_WAITERS.
+ */
+#define FUTEX_SPIN_THRESHOLD (1<< 10)
1024?
+/*If you document functions then please follow the style of this file which
+ * Spin on the futex word while the futex owner is active. Otherwise, set
+ * the FUTEX_WAITERS bit and go to sleep. As we take a reference to the futex
+ * owner's task structure, we don't need to use RCU to ensure that the task
+ * structure is valid. The function will directly grab the lock if the
+ * owner is dying or the pid is invalid. That should take care of the problem
+ * of dead lock owners unless the pid wraps around and the preceived owner is
+ * not the real owner.
+ *
+ * Return: 0 if futex acquired,< 0 if an error happens.
uses kerneldoc comments.
+ */Please use understandable variable names instead of this horrible tail
+static int futex_spin_on_owner(u32 __user *uaddr, u32 vpid,
+ struct futex_state *state)
+{
+ int ret, loop = FUTEX_SPIN_THRESHOLD;
+ u32 uval, curval;
+ u32 opid = 0; /* Futex owner task ID */
+ struct task_struct *otask = NULL; /* Futex owner task struct */
comments. What's wrong with using owner and owner_pid?
+ bool on_owner_list = false;And here you are. What the hell is wrong with:
+
+ preempt_disable();
+ WRITE_ONCE(state->owner, current);
+ for (;; loop--) {
+ if (futex_trylock_to(uaddr, vpid,&uval, true,&ret))
+ break;
ret = futex_trylock(uaddr, vpid,&uval, true);
if (ret< 0)
break;
Nothing is wrong. It's just way simpler to read and understand....
+ WARN_ON_ONCE(!(uval& FUTEX_TID_MASK));Can you pretty please use split out functions for this otask handling? This
+
+ if ((uval& FUTEX_TID_MASK) != opid) {
+ /*
+ * Get the new task structure
+ */
+ if (otask)
+ put_task_struct(otask);
+
+ WARN_ON_ONCE(on_owner_list);
+ opid = uval& FUTEX_TID_MASK;
+ otask = futex_find_get_task(opid);
+ }
for loop does not fit on a single screen and can't be understood in one go.
+ if (unlikely(!otask || (otask->flags& PF_EXITING) ||This code flow is completely non parseable. Stop this and put proper
+ (uval& FUTEX_OWNER_DIED))) {
+ /*
+ * PID invalid or exiting/dead task, try to grab
+ * the lock now.
+ */
+ ret = futex_atomic_cmpxchg_inatomic(&curval,
+ uaddr, uval, vpid);
+ if (unlikely(ret))
+ goto efault;
+ if (curval != uval)
+ continue; /* Futex value changed */
comments above decisions which explain why and not what.
+ pr_info("futex_spin_on_owner: pid %d grabs futex from pid %d (%s)!\n",Really useful information to confuse sysadmins. A proper comment explaining
+ vpid, opid, otask ? "dying" : "invalid");
the issue and the implications and how we deal with it would have been the
right thing to do...
+ break;Groan. This is more than horrible to read.
+ }
+ while ((loop<= 0)&& !(uval& FUTEX_WAITERS)) {
if (loop<= 0) {
if (futex_set_waiters_bit(....))
goto fault;
}
If at all. This loop<= 0 thing is utterly confusing.
+ /*I had to look five times to figure out to which loop this break belongs. I
+ * Need to set the FUTEX_WAITERS bit.
+ */
+ if (futex_atomic_cmpxchg_inatomic(&curval, uaddr, uval,
+ uval | FUTEX_WAITERS))
+ goto efault;
+ if (curval == uval) {
+ uval |= FUTEX_WAITERS;
+ break;
really wonder how you manage to keep track of this mess.
+ }Gah. I can see that you go over to start and do trylock, but WHY ? I know
+ uval = curval;
+ }
+
+ if (!(uval& ~FUTEX_TID_MASK))
+ continue; /* Do trylock again */
it, but for the casual reader it's fcking non obvious.
+This is completely fucked, really.
+ if (need_resched()) {
+ __set_current_state(TASK_RUNNING);
+ schedule_preempt_disabled();
+ continue;
+ }
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ /*
+ * If the owner isn't active, we need to go to sleep after
+ * making sure that the FUTEX_WAITERS bit is set. We also
+ * need to put the futex state into the futex owner's
+ * pi_state_list to prevent deadlock when the owner dies.
+ */
+ if (!otask->on_cpu) {
+ if (!(uval& FUTEX_WAITERS)) {
+ loop = 0;
+ continue;
if (owner->on_cpu) {
cpu_relax();
continue;
}
ret = futex_sleep();
if (ret< 0)
goto fault;
if (ret == FUTEX_AQUIRED)
break;
Your attempt to resue code which is in the loop above is just making this
completely unreadable. Hell no! Futexes are complex enough already. We
really can do without obfuscation. This not the obfuscated C-code contest.
+ if (futex_trylock_to(uaddr, vpid,&uval, false,&ret))This lacks curly braces. See: https://marc.info/?l=linux-kernel&m=147351236615103
+ /* Lock acquired or an error happens */
+ return ret;
+Why are you using the global hash for this? As we have shown the global
+ /*
+ * Detect deadlocks.
+ */
+ if (unlikely(((uval& FUTEX_TID_MASK) == vpid) ||
+ should_fail_futex(true)))
+ return -EDEADLK;
+
+ if (refill_futex_state_cache())
+ return -ENOMEM;
+
+ futex_set_timer(time,&to,&timeout, flags, current->timer_slack_ns);
+
+ ret = get_futex_key(uaddr, flags& FLAGS_SHARED,&key, VERIFY_WRITE);
+ if (unlikely(ret))
+ goto out;
+
+ hb = hash_futex(&key);
+ spin_lock(&hb->lock);
hash has horrible performance due to cross node access and potential hash
bucket lock contention of unrelated processes. If we add something new
which is performance optimized then we pretty please make it use a seperate
process private storage. I don't see any point in making this optimized for
process shared futexes.
+So if futex_spin_on_owner() faults, then you return -EFAULT to user space
+ /*
+ * Locate the futex state by looking up the futex state list in the
+ * hash bucket. If it isn't found, create a new one and put it into
+ * the list.
+ */
+ state = lookup_futex_state(hb,&key, true);
+
+ /*
+ * We don't need to hold the HB lock after looking up the futex state
+ * as we have incremented the reference count.
+ */
+ spin_unlock(&hb->lock);
+ BUG_ON(!state);
+
+ /*
+ * Acquiring the serialization mutex.
+ */
+ if (state->type != TYPE_TO) {
+ ret = -EINVAL;
+ } else {
+ if (to)
+ hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
+ ret = mutex_lock_interruptible(&state->mutex);
+ }
+
+ if (unlikely(ret))
+ /*
+ * We got a signal or has some other error, need to abort
+ * the lock operation and return.
+ */
+ goto out_put_state_key;
+
+ /*
+ * As the mutex owner, we can now spin on the futex word as well as
+ * the active-ness of the futex owner.
+ */
+ ret = futex_spin_on_owner(uaddr, vpid, state);
w/o trying to page in the stuff? That's just wrong.
+static int futex_unlock_to(u32 __user *uaddr, unsigned int flags)If there is a new lock owner or the lock has been released? Voodoo magic or
+{
+ u32 uval, pid, vpid = task_pid_vnr(current);
+ union futex_key key = FUTEX_KEY_INIT;
+ struct futex_hash_bucket *hb;
+ struct futex_state *state;
+ struct task_struct *owner;
+ int ret;
+
+ if (get_user(uval, uaddr))
+ return -EFAULT;
+ /*
+ * If there is a new lock owner, we can exit now.
+ * If uval is 0, another task may have acquired and release the
+ * lock in the mean time, so we should also exit.
what? How can user space take over the lock when the current task owns it?
That's just broken. The only reason to get here is that user space called
in because it was not able to release the lock atomically, i.e. because the
waiters bit was set. If anything fiddled with the uval then returning 0 is
beyond stupid.
+ */Crap. It's legit to call here even if the waiters bit is not set.
+ pid = uval& FUTEX_TID_MASK;
+ if ((pid&& (pid != vpid)) || !uval)
+ return 0;
+ WARN_ON_ONCE(!(uval& FUTEX_WAITERS));
+ /*What do you encourage here? How is the mutex transfer accelerated?
+ * If the TID isn't cleared in the userspace, clear it here to
+ * encourage faster lock transfer to the mutex owner.
+ */What guarantees that owner cannot exit between reading owner state and
+ if (pid == vpid) {
+ futex_atomic_cmpxchg_inatomic(&uval, uaddr, uval,
+ uval& ~FUTEX_TID_MASK);
+ WARN_ON((uval& FUTEX_TID_MASK) != vpid);
+ }
+ ret = get_futex_key(uaddr, flags& FLAGS_SHARED,&key, VERIFY_WRITE);
+ if (ret)
+ return ret;
+
+ hb = hash_futex(&key);
+ spin_lock(&hb->lock);
+
+ /*
+ * Check the hash bucket only for matching futex state.
+ */
+ state = lookup_futex_state(hb,&key, false);
+
+ if (!state)
+ goto out_unlock;
+
+ if (state->type != TYPE_TO) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /*
+ * We don't need to do the wakeup if the futex isn't equal to
+ * FUTEX_WAITERS as it implies that someone else has taken over
+ * the futex.
+ */
+ if (get_user(uval, uaddr)) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
+ owner = READ_ONCE(state->owner);
+ if ((uval == FUTEX_WAITERS)&& owner)
+ ret = wake_up_process(owner);
calling wake_up_process?
Dammit. If you read the source in this file carefully, then you notice that
it contains a lot of documentation about life time rules, protection and
serialization.
If you think, that it's the reviewers problem to figure that out from your
w/o proper comments in place, then you are completely on the wrong track.
I'm not even trying to think about the concept itself as long as this is
presented as a pile of unpenetrable clusterfuck.