[PATCH] mutex: Speed up mutex_spin_on_owner() by not taking the RCU lock

From: Ingo Molnar
Date: Fri Apr 10 2015 - 04:49:11 EST


Linus suggested getting rid of the RCU read-lock held across
mutex_spin_on_owner(). The only real complication is that the 'owner'
task might be freed from under us, in which case we would dereference
possibly freed kernel memory.

As long as the kernel pointer itself is valid, this approach is fine
(see the analysis below) - with the exception of
CONFIG_DEBUG_PAGEALLOC=y and similarly instrumented kernels, which
might fault on referencing freed kernel memory.

Use the non-faulting copy primitive (the same one we use in NMI
handlers) to fetch the owner->on_cpu value; it works even on
CONFIG_DEBUG_PAGEALLOC=y instrumented kernels. It compiles to a
single instruction on most platforms.

This approach might briefly copy in junk data from an already freed
(previous) owner task, which might trigger two scenarios:

1) The junk data causes us to loop once more. This is not
a problem as we'll re-check the owner on the next iteration
and break out of the loop.

2) If the junk value causes us to break out of the loop
that's fine too: it's what we'd have done anyway on
the next iteration, as the lock owner changed.

The inatomic context copy primitives are compiler barriers
too - this matters to make sure the above owner check is
emitted before the copy attempt.

We also ignore failed copies, as the next iteration will clean
up after us. This saves an extra branch in the common case.

Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Not-Yet-Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/locking/mutex.c | 40 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4cccea6b8934..fcc7db45d62e 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -224,28 +224,42 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
 static noinline
 bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 {
-	bool ret = true;
+	int on_cpu;
+	int ret;
 
-	rcu_read_lock();
 	while (lock->owner == owner) {
 		/*
-		 * Ensure we emit the owner->on_cpu, dereference _after_
-		 * checking lock->owner still matches owner. If that fails,
-		 * owner might point to freed memory. If it still matches,
-		 * the rcu_read_lock() ensures the memory stays valid.
+		 * Use the non-faulting copy-user primitive to get the owner->on_cpu
+		 * value that works even on CONFIG_DEBUG_PAGEALLOC=y instrumented
+		 * kernels. This compiles to a single instruction on most platforms.
+		 *
+		 * This might briefly copy in junk data from an already freed
+		 * (previous) owner task, which might trigger two scenarios:
+		 *
+		 * 1) The junk data causes us to loop once more. This is not
+		 *    a problem as we'll check the owner on the next loop and
+		 *    break out of the loop.
+		 *
+		 * 2) If the junk value causes us to break out of the loop
+		 *    that's fine too: it's what we'd have done anyway on
+		 *    the next iteration, as the lock owner changed.
+		 *
+		 * NOTE: the inatomic context copy primitives are compiler barriers
+		 * too - this matters to make sure the above owner check is
+		 * emitted before the copy attempt.
+		 *
+		 * NOTE2: We ignore failed copies, as the next iteration will clean
+		 * up after us. This saves an extra branch in the common case.
 		 */
-		barrier();
+		ret = __copy_from_user_inatomic(&on_cpu, &owner->on_cpu, sizeof(on_cpu));
 
-		if (!owner->on_cpu || need_resched()) {
-			ret = false;
-			break;
-		}
+		if (!on_cpu || need_resched())
+			return false;
 
 		cpu_relax_lowlatency();
 	}
-	rcu_read_unlock();
 
-	return ret;
+	return true;
 }
 
 /*
--