Re: commit 894d1b3db41c leads to frequent hangs when booting

From: Bert Karwatzki
Date: Thu Dec 12 2024 - 07:14:55 EST


Am Donnerstag, dem 12.12.2024 um 12:56 +0100 schrieb Bert Karwatzki:
> Am Donnerstag, dem 12.12.2024 um 11:31 +0100 schrieb Bert Karwatzki:
> > Am Mittwoch, dem 11.12.2024 um 23:40 -0800 schrieb John Stultz:
> > > On Wed, Dec 11, 2024 at 6:41 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Dec 11, 2024 at 4:17 PM Bert Karwatzki <spasswolf@xxxxxx> wrote:
> > > > > Am Mittwoch, dem 11.12.2024 um 15:14 -0800 schrieb John Stultz:
> > > > > > On Wed, Dec 11, 2024 at 2:46 PM Bert Karwatzki <spasswolf@xxxxxx> wrote:
> > > > > > >
> > > > > > > Am Mittwoch, dem 11.12.2024 um 22:35 +0100 schrieb Bert Karwatzki:
> > > > > > > > I have confirmed that linux-next-20241210 is fixed by the same revert
> > > > > > > > as v6.13-rc2 (ten boots without incident is the criterion for a good commit).
> > > > > > > >
> > > > > > > >
> > > > > > > > Bert Karwatzki
> > > > > > >
> > > > > > > Also, this bug only occurs with CONFIG_PREEMPT_RT=y; I've just checked v6.13-rc2
> > > > > > > without the revert, with the following preempt settings, and got 10 clean boots:
> > > > > > >
> > > > > >
> > > > > > Hrm. That matches the case where the fix here helped:
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=82f9cc094975240885c93effbca7f4603f5de1bf
> > > > > >
> > > > > >
> > > > > > I'm still working on getting a minimal delta from your config booting
> > > > > > in my environment, but in the meantime, I'd be curious if the
> > > > > > following reduced revert helps?
> > > > > > https://github.com/johnstultz-work/linux-dev/commit/60c60f85670fb1f4708adbe55e15ab918d96f9f0
> > > > > >
> > > > > > Basically just trying to clarify if the problem is moving the wakeup
> > > > > > to the wake_q or if some other interaction (maybe with the
> > > > > > preempt_disables) could be causing the issue.
> > > > > > (I'm assuming you have 82f9cc094975 applied when you test with that change)
> > > > > >
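[ An aside for anyone following along: the two hunks quoted below only change
  *when* the wakeup happens. Roughly, as a sketch that ignores the reference
  handling wake_q_add() does internally:

	/* deferred: only queue the task on the caller's wake_q; it is
	 * actually woken later, when the caller runs wake_up_q(wake_q) */
	wake_q_add(wake_q, waiter->task);

	/* immediate: the waiter is made runnable right here */
	wake_up_process(waiter->task);
]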
> > > > >
> > > > > I tested linux-next-20241210 (which includes 82f9cc094975) with your patch
> > > > >
> > > > > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > > > > index 37f025a096c9..724d4c871cf6 100644
> > > > > --- a/kernel/locking/ww_mutex.h
> > > > > +++ b/kernel/locking/ww_mutex.h
> > > > > @@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
> > > > > #ifndef WW_RT
> > > > > debug_mutex_wake_waiter(lock, waiter);
> > > > > #endif
> > > > > - wake_q_add(wake_q, waiter->task);
> > > > > + wake_up_process(waiter->task);
> > > > > }
> > > > >
> > > > > return true;
> > > > > @@ -332,7 +332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
> > > > > * wakeup pending to re-read the wounded state.
> > > > > */
> > > > > if (owner != current)
> > > > > - wake_q_add(wake_q, owner);
> > > > > + wake_up_process(owner);
> > > > >
> > > > > return true;
> > > > > }
> > > > >
> > > > > and this fixes the issue for me.
> > > >
> > > > Ok, thanks for validating that! Hrm. Ok, I suspect I'm missing a path
> > > > in rtmutex.c that calls into __schedule() before we pop up to the
> > > > point where we actually wake the wake_q.
> > > >
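[ If I understand the theory correctly, the bad pattern would look roughly
  like this (a simplified sketch, not actual rtmutex.c code):

	DEFINE_WAKE_Q(wake_q);

	raw_spin_lock_irq(&lock->wait_lock);
	wake_q_add(&wake_q, waiter->task);	/* wakeup is only queued here */
	raw_spin_unlock_irq(&lock->wait_lock);

	schedule();		/* blocks; if progress depends on the task
				 * queued above, the wake_up_q() below is
				 * never reached and we hang */

	wake_up_q(&wake_q);

  PREEMPT_RT adds many more such blocking points (spinlocks become
  rtmutex-based sleeping locks), which would presumably explain why only
  CONFIG_PREEMPT_RT=y is affected. ]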
> > > > I may have found one or two spots that look likely, and I believe I've
> > > > also managed to reproduce your problem using the test-ww_mutex driver.
> > > >
> > > > So I'm working up a proper patch and will share for testing here soon.
> > >
> > > Ok, could you try the patch here:
> > > https://github.com/johnstultz-work/linux-dev/commit/3c902b92c88122cd034937e8d40930bac254a7c5
> > >
> > > I've repushed to the test branch I shared earlier, so it's against
> > > 6.13-rc2 + fix that has landed upstream + test fix above ( + another
> > > patch that makes stressing the ww_mutexes easier):
> > > https://github.com/johnstultz-work/linux-dev/commits/debug/894d1b3db41c-hang-bert/
> > >
> > > Let me know if you still have issues with this.
> > > thanks
> > > -john
> >
> > I cherry-picked your two patches on top of linux-next-20241210:
> >
> > 2029cb31eae9 (HEAD -> master) MAYBEFIX: Make sure we wake anything on the wake_q when we release the lock->wait_lock
> > b525e4779098 test-ww_mutex: Allow test to be run (and re-run) from userland
> > 1b2ab8149928 (tag: next-20241210) Add linux-next specific files for 20241210
> >
> > This fixes the issue for me.
> >
> > Bert Karwatzki
> >
>
> Just tested half of the fix (only the task_blocks_on_rt_mutex() part) on top
> of linux-next-20241212, where the bug is also present, and found it sufficient
> to fix the issue for me:
>
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index e858de203eb6..ddbb7423fb69 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1292,7 +1292,13 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
> */
> get_task_struct(owner);
>
> + preempt_disable();
> raw_spin_unlock_irq(&lock->wait_lock);
> + /* wake up any tasks on the wake_q before calling rt_mutex_adjust_prio_chain */
> + wake_up_q(wake_q);
> + wake_q_init(wake_q);
> + preempt_enable();
> +
>
> res = rt_mutex_adjust_prio_chain(owner, chwalk, lock,
> next_lock, waiter, task);
>
>
>
> Bert Karwatzki

I also tried the other half of your fix (again on linux-next-20241212):

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index e858de203eb6..3b362daa4677 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1596,6 +1596,7 @@ static void __sched remove_waiter(struct rt_mutex_base *lock,
* or TASK_UNINTERRUPTIBLE)
 * @timeout: the pre-initialized and started timer, or NULL for none
 * @waiter: the pre-initialized rt_mutex_waiter
+ * @wake_q: wake_q of tasks to wake when we drop the lock->wait_lock
*
* Must be called with lock->wait_lock held and interrupts disabled
*/
@@ -1603,7 +1604,8 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
struct ww_acquire_ctx *ww_ctx,
unsigned int state,
struct hrtimer_sleeper *timeout,
- struct rt_mutex_waiter *waiter)
+ struct rt_mutex_waiter *waiter,
+ struct wake_q_head *wake_q)
__releases(&lock->wait_lock) __acquires(&lock->wait_lock)
{
struct rt_mutex *rtm = container_of(lock, struct rt_mutex, rtmutex);
@@ -1634,7 +1636,13 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
owner = rt_mutex_owner(lock);
else
owner = NULL;
+ preempt_disable();
raw_spin_unlock_irq(&lock->wait_lock);
+ if (wake_q) {
+ wake_up_q(wake_q);
+ wake_q_init(wake_q);
+ }
+ preempt_enable();

if (!owner || !rtmutex_spin_on_owner(lock, waiter, owner))
rt_mutex_schedule();
@@ -1708,7 +1716,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,

ret = task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk,
wake_q);
if (likely(!ret))
- ret = rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter);
+ ret = rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter, wake_q);

if (likely(!ret)) {
/* acquired the lock */
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index 33ea31d6a7b3..191e4720e546 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -383,7 +383,7 @@ int __sched rt_mutex_wait_proxy_lock(struct rt_mutex_base *lock,
raw_spin_lock_irq(&lock->wait_lock);
/* sleep on the mutex */
set_current_state(TASK_INTERRUPTIBLE);
- ret = rt_mutex_slowlock_block(lock, NULL, TASK_INTERRUPTIBLE, to, waiter);
+ ret = rt_mutex_slowlock_block(lock, NULL, TASK_INTERRUPTIBLE, to, waiter, NULL);
/*
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.

This alone also fixes the issue for me.
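
In both halves the idea, as I read it, is the same: wakeups that were queued
on the wake_q while lock->wait_lock was held get issued as soon as the lock
is dropped, before calling into anything that may block. As a rough sketch of
the shared pattern (not a literal copy of the hunks above):

	preempt_disable();
	raw_spin_unlock_irq(&lock->wait_lock);
	wake_up_q(wake_q);	/* issue the deferred wakeups now */
	wake_q_init(wake_q);	/* reset the queue so it can be reused */
	preempt_enable();

	/* only from here on may we call into code that schedules */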

Bert Karwatzki