Re: [PATCH] sched: Fix race in rt_mutex_pre_schedule by removing non-atomic fetch_and_set

From: Peter Zijlstra

Date: Mon Oct 06 2025 - 15:07:56 EST


On Wed, Aug 27, 2025 at 04:17:50PM +0800, cuiguoqi wrote:
> The issue arises during EDEADLK testing in `lib/locking-selftest.c` when `is_wait_die=1`.
>
> In this mode, the current thread's `debug_locks` flag is disabled via `__debug_locks_off` (which calls `xchg(&debug_locks, 0)`) during the blocking path of `rt_mutex_slowlock`, specifically in `rt_mutex_slowlock_block()`:
>
> rt_mutex_slowlock()
>   rt_mutex_pre_schedule()
>   rt_mutex_slowlock_block()
>     DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock)
>       __debug_locks_off();  /* xchg(&debug_locks, 0) */
>
> However, `rt_mutex_post_schedule()` still performs:
>
> lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));
>
> Which expands to:
>
> do {
>         WARN_ON(debug_locks &&
>                 !({ int _x = current->sched_rt_mutex; current->sched_rt_mutex = 0; _x; }));
> } while (0)
>
> The generated assembly shows that the entire assertion is conditional on `debug_locks`:
>
> adrp x0, debug_locks
> ldr w0, [x0]
> cbz w0, .label_skip_warn // Skip WARN if debug_locks == 0
>
> This means: if `debug_locks` was cleared earlier, the check on `current->sched_rt_mutex` is effectively skipped, and the flag may remain set.
>
> Later, when the same task re-enters `rt_mutex_slowlock`, it calls `lockdep_reset()` to re-enable `debug_locks`, but the stale `current->sched_rt_mutex` state (left over from the previous lock attempt) causes a false-positive warning in `rt_mutex_pre_schedule()`:
>
> WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:7085 rt_mutex_pre_schedule+0xa8/0x108
>
> Because:
> - `rt_mutex_pre_schedule()` asserts `!current->sched_rt_mutex`
> - But the flag was never properly cleared due to the skipped post-schedule check.
>
> This is not a data race on the flag itself, but a state inconsistency caused by conditional debugging logic: the `fetch_and_set` macro is not atomic, and, more importantly, its evaluation is skipped entirely when `debug_locks` is off, breaking the expected state transition.

Yeah, I can't really make myself care too much. This means you've
already had errors before -- resulting in debug_locks getting cleared.
Fix those and this problem goes away.

debug_locks is inherently racy; I don't see value in trying to fix all
that.