[PATCH] sched: Do not release current rq lock on non-contended double_lock_balance()

From: Steven Rostedt
Date: Mon Jun 13 2016 - 12:37:40 EST


Back in 2008, Gregory Haskins noticed that there was unfair locking
when dealing with double_lock_balance(). The old code (which is still
used when CONFIG_PREEMPT is disabled) checks, on contention, the
ordering of the run-queue locks (by address) to determine whether it
should release the current lock before taking the second lock.

	if (!spin_trylock(&busiest->lock)) {
		if (busiest < this_rq) {
			/*
			 * busiest sorts first in the lock order:
			 * back off and take both locks in order.
			 */
			spin_unlock(&this_rq->lock);
			spin_lock(&busiest->lock);
			spin_lock(&this_rq->lock);
		} else
			/* we hold the lower-addressed lock: keep it */
			spin_lock(&busiest->lock);
	}

This gives unfair access to higher-numbered CPUs trying to take a
contended lock. A lower-numbered CPU (one whose rq structure sits at a
lower address) won't release its own lock while trying to take the
next lock. For example, if CPU 0's rq sits at a lower address than
CPU 7's, then when each wants the other's lock, CPU 7 always backs off
while CPU 0 never does. This can cause an escalation of latency in
locking if there's a waiter on the current 'this_rq->lock', which now
must also wait for 'busiest->lock' to be taken.

The solution was to simply release the current (this_rq) lock and then
take both locks.

	spin_unlock(&this_rq->lock);
	double_rq_lock(this_rq, busiest);

Here double_rq_lock() takes the locks in order of the rqs' addresses.
This way, if there is a waiter on the rq lock then, thanks to the
ticket spinlocks, it grabs the lock immediately, and the previous
owner has to wait in line now that it has released the lock.
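
For reference, double_rq_lock() of that era looks roughly like this
(paraphrased from the scheduler code; raw_spin_lock_nested() and the
__acquires() annotations are there for lockdep/sparse):

	static void double_rq_lock(struct rq *rq1, struct rq *rq2)
		__acquires(rq1->lock)
		__acquires(rq2->lock)
	{
		BUG_ON(!irqs_disabled());
		if (rq1 == rq2) {
			raw_spin_lock(&rq1->lock);
			__acquire(rq2->lock);	/* Fake it out ;) */
		} else {
			/* always take the lower-addressed lock first */
			if (rq1 < rq2) {
				raw_spin_lock(&rq1->lock);
				raw_spin_lock_nested(&rq2->lock,
						     SINGLE_DEPTH_NESTING);
			} else {
				raw_spin_lock(&rq2->lock);
				raw_spin_lock_nested(&rq1->lock,
						     SINGLE_DEPTH_NESTING);
			}
		}
	}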

What I found while testing on a machine with 80 CPU cores was a large
amount of "ping-ponging" of the rq locks.

Let's say that an RT task is woken up on one CPU from another CPU, and
the CPU that the task was woken on is also running an RT task and
tries to push:


    CPU 0                            CPU 1
    -----                            -----
 [ wake up ]
 spin_lock(cpu1_rq->lock);
                                  spin_lock(cpu1_rq->lock)
 double_lock_balance()
 [ release cpu1_rq->lock ]
 spin_lock(cpu1_rq->lock)
                                  [ due to ticket, now acquires
                                    cpu1_rq->lock ]

                                  [ goes to push task ]
                                  double_lock_balance()
                                  [ release cpu1_rq->lock ]
 [ acquires lock ]
                                  spin_lock(cpu2_rq->lock)
                                  [ blocks as cpu2 is using it ]

And then CPU 2 would call double_lock_balance() and the ping-pong
continues.
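
The hand-offs above are deterministic because ticket spinlocks are
strictly FIFO: unlocking a contended lock always passes it to the
longest waiter, and a CPU that unlocks and immediately re-locks goes
to the back of the line. A toy sketch of that property (illustration
only, not the kernel's arch-specific implementation):

	#include <stdatomic.h>

	struct ticket_lock {
		atomic_uint next;	/* next ticket to hand out */
		atomic_uint owner;	/* ticket being served */
	};

	static void ticket_lock(struct ticket_lock *lk)
	{
		/* take a ticket; fetch_add returns our place in line */
		unsigned int me = atomic_fetch_add(&lk->next, 1);

		/* strictly FIFO: spin until every earlier ticket is done */
		while (atomic_load(&lk->owner) != me)
			;
	}

	static void ticket_unlock(struct ticket_lock *lk)
	{
		/* pass the lock to the next ticket; an ex-owner that
		 * re-locks takes a new ticket at the back of the line */
		atomic_fetch_add(&lk->owner, 1);
	}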

What I could not understand about Gregory's patch is that, regardless
of contention, the currently held lock is always released, opening up
a window for this ping-ponging to occur. When I changed the code to
release the lock only on contention of the second lock, things
improved tremendously.

The problem that Gregory was facing was unfair access when there was
contention. But his solution always releases the lock, even when there
is no contention on the second lock, which happened to cause more
contention later on.

I talked with Peter Zijlstra about this, and he pointed me to the
thread where this patch was discussed back in 2008. Reading the
thread, I found that Nick Piggin had brought up this very issue. After
a little discussion, it was decided to let the waiting task have the
lock even when there was no contention, to be even more "fair". The
problem, though, is that Nick's suggestion was never tested. If it had
been back then, I'm sure Gregory would have decided to release the
lock only if there was contention on the second lock.

Link: http://lkml.kernel.org/r/20080825201534.23217.14936.stgit@xxxxxxxxxxxxxxx

Suggested-by: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
---
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d23527e..3ed9ef770085 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1548,10 +1548,15 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
 	__acquires(busiest->lock)
 	__acquires(this_rq->lock)
 {
-	raw_spin_unlock(&this_rq->lock);
-	double_rq_lock(this_rq, busiest);
+	int ret = 0;
+
+	if (unlikely(!raw_spin_trylock(&busiest->lock))) {
+		raw_spin_unlock(&this_rq->lock);
+		double_rq_lock(this_rq, busiest);
+		ret = 1;
+	}
 
-	return 1;
+	return ret;
 }
 
 #else