Re: [bisected] system hang after boot

From: Will Deacon
Date: Wed Nov 22 2017 - 13:27:04 EST


Hi Sebastian,

Thanks for the report.

On Wed, Nov 22, 2017 at 06:46:01PM +0100, Sebastian Ott wrote:
> One of my test systems (s390) hangs after boot. One of the cpus is idle
> the other is in arch_spin_relax. Bisect pointed to this one:
>
> a8a217c22116eff6c120d753c9934089fb229af0 is the first bad commit
> commit a8a217c22116eff6c120d753c9934089fb229af0
> Author: Will Deacon <will.deacon@xxxxxxx>
> Date: Tue Oct 3 19:25:27 2017 +0100
>
> locking/core: Remove {read,spin,write}_can_lock()
>
> Outside of the locking code itself, {read,spin,write}_can_lock() have no
> users in tree. Apparmor (the last remaining user of write_can_lock()) got
> moved over to lockdep by the previous patch.
>
> This patch removes the use of {read,spin,write}_can_lock() from the
> BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
> lock status, and subsequently removes the unused macros altogether. They
> aren't guaranteed to work in a concurrent environment and can give
> incorrect results in the case of qrwlock.
>
> Signed-off-by: Will Deacon <will.deacon@xxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: paulmck@xxxxxxxxxxxxxxxxxx
> Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@xxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>
>
> Bisect log and config attached. The same config but with lockdep enabled
> works ok.

I suspect this is related to CONFIG_GENERIC_LOCKBREAK and the use of the
lock->break_lock field when that is enabled. I've always been suspicious
about the lack of ordering guarantees there, and I think my patch makes
that worse.

The old code was:

#define BUILD_LOCK_OPS(op, locktype) \
void __lockfunc __raw_##op##_lock(locktype##_t *lock) \
{ \
for (;;) { \
preempt_disable(); \
if (likely(do_raw_##op##_trylock(lock))) \
break; \
preempt_enable(); \
\
if (!(lock)->break_lock) \
(lock)->break_lock = 1; \
while (!raw_##op##_can_lock(lock) && (lock)->break_lock)\
arch_##op##_relax(&lock->raw_lock); \
} \
(lock)->break_lock = 0; \
} \

which is dodgy for a few reasons:

1. There is a write-write race on break_lock, so if >1 CPU tries to take
lock concurrently then break_lock can end up as either 0 or 1.

2. break_lock is just a plain variable, so it can happily sit in a
local store buffer (for a finite time), meaning that waiters will
almost certainly see it as 1 in the while loop and the lock holder
will almost certainly see it as 0

Now, the raw_##op##_can_lock macro is also problematic, which is why I
removed it: it suffers from (2) above, but not from (1) and will eventually
cause the waiters to break redo the trylock() once the unlockers store
buffer has drained.

The new code removes the raw_##op##_can_lock, so the waiters get stuck.

Now, I can't see what the break_lock is doing here other than causing
problems. Is there a good reason for it, or can you just try removing it
altogether? Patch below.

Will

--->8

diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index cc0072e93e36..857a72ceb794 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -10,9 +10,6 @@
*/
typedef struct {
arch_rwlock_t raw_lock;
-#ifdef CONFIG_GENERIC_LOCKBREAK
- unsigned int break_lock;
-#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index a39186194cd6..3bf273538840 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -107,16 +107,11 @@ do { \

#define raw_spin_is_locked(lock) arch_spin_is_locked(&(lock)->raw_lock)

-#ifdef CONFIG_GENERIC_LOCKBREAK
-#define raw_spin_is_contended(lock) ((lock)->break_lock)
-#else
-
#ifdef arch_spin_is_contended
#define raw_spin_is_contended(lock) arch_spin_is_contended(&(lock)->raw_lock)
#else
#define raw_spin_is_contended(lock) (((void)(lock), 0))
#endif /*arch_spin_is_contended*/
-#endif

/*
* This barrier must provide two things:
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 73548eb13a5d..24b4e6f2c1a2 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -19,9 +19,6 @@

typedef struct raw_spinlock {
arch_spinlock_t raw_lock;
-#ifdef CONFIG_GENERIC_LOCKBREAK
- unsigned int break_lock;
-#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 1fd1a7543cdd..7d0a58399c1d 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -65,13 +65,8 @@ void __lockfunc __raw_##op##_lock(locktype##_t *lock) \
if (likely(do_raw_##op##_trylock(lock))) \
break; \
preempt_enable(); \
- \
- if (!(lock)->break_lock) \
- (lock)->break_lock = 1; \
- while ((lock)->break_lock) \
- arch_##op##_relax(&lock->raw_lock); \
+ arch_##op##_relax(&lock->raw_lock); \
} \
- (lock)->break_lock = 0; \
} \
\
unsigned long __lockfunc __raw_##op##_lock_irqsave(locktype##_t *lock) \
@@ -85,13 +80,8 @@ unsigned long __lockfunc __raw_##op##_lock_irqsave(locktype##_t *lock) \
break; \
local_irq_restore(flags); \
preempt_enable(); \
- \
- if (!(lock)->break_lock) \
- (lock)->break_lock = 1; \
- while ((lock)->break_lock) \
- arch_##op##_relax(&lock->raw_lock); \
+ arch_##op##_relax(&lock->raw_lock); \
} \
- (lock)->break_lock = 0; \
return flags; \
} \
\