Re: softlockups in multi_cpu_stop

From: Jason Low
Date: Fri Mar 06 2015 - 23:31:47 EST


On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote:

Just in case, here's the updated patch which addresses Linus's comments
and with a changelog.

Note: The changelog says that it fixes (locking/rwsem: Avoid deceiving
lock spinners), though I still haven't seen full confirmation that it
addresses all of the lockup reports.

------
Subject: [PATCH] rwsem: Avoid spinning when owner is not running

Fixes tip commmit b3fd4f03ca0b (locking/rwsem: Avoid deceiving lock spinners).

When doing optimistic spinning in rwsem, threads should stop spinning when
the lock owner is not running. While a thread is spinning on owner, if
the owner reschedules, owner->on_cpu returns false and we stop spinning.

However, commit b3fd4f03ca0b essentially caused the check to get ignored
because when we break out of the spin loop due to !on_cpu, we continue
spinning if sem->owner != NULL.

This patch fixes this by making sure we stop spinning if the owner is not
running. Furthermore, just like with mutexes, refactor the code such that
we don't have separate checks for owner_running(). This makes it more
straightforward in terms of why we exit the spin on owner loop and we
would also avoid needing to "guess" why we broke out of the loop to make
this more readable.

Cc: Ming Lei <ming.lei@xxxxxxxxxxxxx>
Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
Signed-off-by: Jason Low <jason.low2@xxxxxx>
---
kernel/locking/rwsem-xadd.c | 31 +++++++++++--------------------
1 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 06e2214..3417d01 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -324,32 +324,23 @@ done:
return ret;
}

-static inline bool owner_running(struct rw_semaphore *sem,
- struct task_struct *owner)
-{
- if (sem->owner != owner)
- return false;
-
- /*
- * Ensure we emit the owner->on_cpu, dereference _after_ checking
- * sem->owner still matches owner, if that fails, owner might
- * point to free()d memory, if it still matches, the rcu_read_lock()
- * ensures the memory stays valid.
- */
- barrier();
-
- return owner->on_cpu;
-}
-
static noinline
bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
{
long count;

rcu_read_lock();
- while (owner_running(sem, owner)) {
- /* abort spinning when need_resched */
- if (need_resched()) {
+ while (sem->owner == owner) {
+ /*
+ * Ensure we emit the owner->on_cpu, dereference _after_
+ * checking sem->owner still matches owner, if that fails,
+ * owner might point to free()d memory, if it still matches,
+ * the rcu_read_lock() ensures the memory stays valid.
+ */
+ barrier();
+
+ /* abort spinning when need_resched or owner is not running */
+ if (!owner->on_cpu || need_resched()) {
rcu_read_unlock();
return false;
}
--
1.7.2.5



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/