Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

From: Peter Zijlstra
Date: Thu Jul 31 2014 - 07:58:17 EST


On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
>
> This commit can lead to deadlocks by way of what, at a high level,
> looks like a missing wakeup on mutex_unlock() when
> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
> their kernels. In particular, it causes reproducible deadlocks in
> libceph/rbd code under higher-than-moderate loads, with the evidence
> pointing into the bowels of mutex_lock().
>
> kernel/locking/mutex.c, __mutex_lock_common(), lines 476-486:
>
> 	osq_unlock(&lock->osq);
> slowpath:
> 	/*
> 	 * If we fell out of the spin path because of need_resched(),
> 	 * reschedule now, before we try-lock the mutex. This avoids getting
> 	 * scheduled out right after we obtained the mutex.
> 	 */
> 	if (need_resched())
> 		schedule_preempt_disabled();	<-- never returns
> #endif
> 	spin_lock_mutex(&lock->wait_lock, flags);
>
> We started bumping into deadlocks in QA the day our branch was rebased
> onto 3.15 (the release this commit went into), but as part of the
> debugging effort I enabled all locking debug options, which also
> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made everything disappear.
> That is why it hasn't been looked into until now. The revert makes the
> problem go away, as confirmed by our users.

This doesn't make sense and you fail to explain how this can possibly
deadlock.
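For readers unfamiliar with the pattern in the quoted snippet, here is a minimal userspace sketch of "spin optimistically, then reschedule before taking the slow path", built on pthreads. It is an analogy only, assuming POSIX threads: it is not the kernel's mutex code and does not reproduce the deadlock under discussion. The names `SPIN_LIMIT`, `spin_then_lock()`, `worker()` and `run_demo()` are hypothetical, and `sched_yield()` stands in for `schedule_preempt_disabled()`.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical spin budget. The kernel does not use a fixed count; with
 * CONFIG_MUTEX_SPIN_ON_OWNER it spins while the current owner is running. */
#define SPIN_LIMIT 100

/* Acquire m, trying an optimistic spin phase before blocking.
 * Returns true if the lock was taken while spinning. */
static bool spin_then_lock(pthread_mutex_t *m)
{
	for (int i = 0; i < SPIN_LIMIT; i++) {
		if (pthread_mutex_trylock(m) == 0)
			return true;		/* acquired on the spin path */
	}
	/* Userspace stand-in for the extra reschedule point. Unlike the kernel,
	 * which only reschedules when need_resched() is set, this yields
	 * unconditionally, since userspace cannot observe that flag. */
	sched_yield();
	pthread_mutex_lock(m);			/* slow path: block until free */
	return false;
}

/* Demo state: several threads increment a shared counter under the lock. */
struct demo {
	pthread_mutex_t lock;
	long counter;
	int iterations;
};

static void *worker(void *arg)
{
	struct demo *d = arg;

	for (int i = 0; i < d->iterations; i++) {
		spin_then_lock(&d->lock);
		d->counter++;
		pthread_mutex_unlock(&d->lock);
	}
	return NULL;
}

/* Runs nthreads workers (capped at 16) and returns the final counter. */
static long run_demo(int nthreads, int iterations)
{
	struct demo d = { .counter = 0, .iterations = iterations };
	pthread_t tids[16];

	pthread_mutex_init(&d.lock, NULL);
	if (nthreads > 16)
		nthreads = 16;
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, worker, &d);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&d.lock);
	return d.counter;
}
```

With mutual exclusion intact, `run_demo(4, 10000)` should return 40000. An uncontended `spin_then_lock()` succeeds on its first `trylock` and returns true without ever reaching the yield.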
