Re: softlockups in multi_cpu_stop

From: Jason Low
Date: Fri Mar 06 2015 - 16:12:36 EST


On Fri, 2015-03-06 at 11:29 -0800, Jason Low wrote:
> Hi Linus,
>
> Agreed, this is an issue we need to address, though we're just trying to
> figure out if the change to rwsem_can_spin_on_owner() in "commit:
> 37e9562453b" is really the one that's causing the issue.
>
> For example, it looks like Ming recently found another change in the
> same patchset: commit b3fd4f03ca0b995(locking/rwsem: Avoid deceiving
> lock spinners) to be causing lockups.
>
> https://lkml.org/lkml/2015/3/6/521

So I think I may have spotted a problem in the tip commit:

Commit b3fd4f03ca0b995 (locking/rwsem: Avoid deceiving lock spinners).

owner_running() returns false in two cases: when the owner has changed,
or when the owner is no longer running. However, that patch keeps
spinning whenever there is a "new owner", and it does not take into
account that we may want to stop spinning because the owner is not
running (due to getting rescheduled).
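
To make the two cases concrete, here is a rough user-space model of the
decision. It is not kernel code: struct task_model, struct rwsem_model,
classify() and the two keep_spinning_*() helpers are hypothetical names
made up for illustration, and the post-loop check is reduced to "spin
again if sem->owner is non-NULL", which is an assumption based on the
description above rather than a copy of the tip commit. need_resched()
and the count check are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields relevant to spinning are modelled. */
struct task_model { bool on_cpu; };
struct rwsem_model { struct task_model *owner; };

enum spin_exit { KEEP_SPINNING, OWNER_CHANGED, OWNER_NOT_RUNNING };

/* Why would one pass of the spin loop stop waiting on 'owner'? */
static enum spin_exit classify(const struct rwsem_model *sem,
			       const struct task_model *owner)
{
	if (sem->owner != owner)
		return OWNER_CHANGED;
	if (!owner->on_cpu)
		return OWNER_NOT_RUNNING;
	return KEEP_SPINNING;
}

/*
 * Assumed model of the behaviour described above: once the loop exits,
 * any non-NULL sem->owner is treated as a "new owner" worth spinning on,
 * even if the loop exited because the same owner stopped running.
 */
static bool keep_spinning_tip(const struct rwsem_model *sem)
{
	return sem->owner != NULL;
}

/*
 * Assumed model of the intended behaviour: an owner that is not running
 * ends the spin, whether or not sem->owner is still set.
 */
static bool keep_spinning_proposed(const struct rwsem_model *sem,
				   const struct task_model *owner)
{
	if (classify(sem, owner) == OWNER_NOT_RUNNING)
		return false;
	return sem->owner != NULL;
}
```

In this model the two helpers only disagree when the owner is unchanged
but off the CPU, which is the case that can keep a spinner going.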

So I think we really want this (not yet tested):

---
Subject: [PATCH] locking/rwsem: Avoid spinning when owner is not running

not-yet-Signed-off-by: Jason Low <jason.low2@xxxxxx>
---
kernel/locking/rwsem-xadd.c | 28 ++++++++--------------------
1 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 06e2214..e9379ee 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -324,32 +324,20 @@ done:
 	return ret;
 }

-static inline bool owner_running(struct rw_semaphore *sem,
-				 struct task_struct *owner)
-{
-	if (sem->owner != owner)
-		return false;
-
-	/*
-	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
-	 * sem->owner still matches owner, if that fails, owner might
-	 * point to free()d memory, if it still matches, the rcu_read_lock()
-	 * ensures the memory stays valid.
-	 */
-	barrier();
-
-	return owner->on_cpu;
-}
-
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
 	long count;

 	rcu_read_lock();
-	while (owner_running(sem, owner)) {
-		/* abort spinning when need_resched */
-		if (need_resched()) {
+	while (true) {
+		if (sem->owner != owner)
+			break;
+
+		barrier();
+
+		/* abort spinning when need_resched or owner is not running*/
+		if (!owner->on_cpu || need_resched()) {
 			rcu_read_unlock();
 			return false;
 		}
--
1.7.2.5


