Re: [PATCH v2 1/4] sched/core: Provide sched_rtmutex() and expose sched work helpers

From: Sebastian Andrzej Siewior
Date: Thu May 25 2023 - 11:25:15 EST

On 2023-05-11 15:43:08 [+0200], Peter Zijlstra wrote:
> > If a sched_submit_work() would use a mutex_t lock then we would
> > recursively call blk_flush_plug() before setting tsk->blocked_on and
> I'm not following, mutex code sets tsk->blocked_on before it calls
> schedule(), getting into the very same problem you have with rt_mutex.
> > perform the same callback and block on the very same lock (again).
> > This isn't different compared to !RT therefore you must not use a
> > sleeping lock (mutex_t) in the callback.
> See the enforcement thing; today nothing stops the code from using a
> mutex or other blocking primitives here.

I tried to explain that if blk_flush_plug() blocks on a mutex_t then it
will invoke schedule() -> blk_flush_plug() -> schedule() ->
blk_flush_plug() -> … until it runs out of stack.
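
A rough userspace model of that recursion, for illustration only. This is
not kernel code: every model_* name and the MODEL_STACK_LIMIT value are made
up for this sketch, and the depth limit merely stands in for the finite
kernel stack.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "how deep can we nest before the stack is gone". */
#define MODEL_STACK_LIMIT 64

struct model_task {
	bool lock_contended;	/* would the sleeping lock in the callback block? */
	int depth;		/* current nesting of model_schedule() */
	int max_depth;		/* deepest nesting observed */
	bool overflowed;	/* set once the modelled stack limit is reached */
};

static void model_schedule(struct model_task *t);

/* Models a submit-work callback that takes a sleeping lock. */
static void model_flush_plug(struct model_task *t)
{
	/* Blocking on a contended lock re-enters the scheduler. */
	if (t->lock_contended)
		model_schedule(t);
}

/* Models schedule(): it runs the submit-work callback before switching. */
static void model_schedule(struct model_task *t)
{
	if (t->depth >= MODEL_STACK_LIMIT) {
		t->overflowed = true;	/* a real stack would be exhausted here */
		return;
	}

	t->depth++;
	if (t->depth > t->max_depth)
		t->max_depth = t->depth;

	model_flush_plug(t);	/* stands in for sched_submit_work() */
	t->depth--;
}
```

With lock_contended = false a single model_schedule() call nests once and
returns; with lock_contended = true it keeps nesting until the limit is hit.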

So it is broken regardless of RT, but yes, we don't enforce it, and yes,
people might use it, and it would work as long as the lock is not
contended.

> > Do I rebase my stuff on top of his then and we good?
> I just suggested he try something else:
> if that works out this worry goes away.
> If we get PROVE_RAW_LOCK_NESTING usable, something like the below might
> help out with the validation part...

Okay. So if I don't collide with workqueue, do you buy this, or do you
ask for something else? I'm not sure…

Regarding PROVE_RAW_LOCK_NESTING: If I boot -rc3 with `quiet' then I
don't see any complaints.
Otherwise it is printk during boot (the caller is holding a raw_spinlock_t
and then printk() calls into the serial driver, which takes a spinlock_t).
From time to time people send "fixes" for PROVE_RAW_LOCK_NESTING splats, so
I would guess they boot with `quiet' and there isn't much else. So we
are getting close here, I guess.
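
The printk case above can be pictured with a small userspace model of the
nesting rule. This is a sketch under a stated assumption, not lockdep's
implementation: it assumes the rule is "a lock may only be taken if no
currently held lock has a stricter wait type". The model_* names and the
numeric ordering are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wait types, ordered from strictest to most permissive. */
enum model_wait {
	MODEL_WAIT_SPIN = 1,	/* models raw_spinlock_t: never sleeps */
	MODEL_WAIT_CONFIG = 2,	/* models spinlock_t: sleeps on PREEMPT_RT */
	MODEL_WAIT_SLEEP = 3,	/* models a mutex: always may sleep */
};

/*
 * Returns true if taking a lock of type 'next' is allowed while holding
 * the 'n' locks listed in 'held'. Taking a more permissive (sleepier)
 * lock under a stricter one is rejected.
 */
static bool model_nesting_ok(const enum model_wait *held, size_t n,
			     enum model_wait next)
{
	for (size_t i = 0; i < n; i++) {
		if (next > held[i])
			return false;
	}
	return true;
}
```

In this model the boot-time printk case is { MODEL_WAIT_SPIN } held with
MODEL_WAIT_CONFIG as the next lock, which is rejected; the reverse order is
accepted.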

Do you want me to test the suggested validation map somewhere? Because
if it works, it could be queued.