On Mon, Oct 07, 2024 at 11:23:32AM -0400, Waiman Long wrote:
> > Is the problem that:
> >
> >   sched_tick()
> >     raw_spin_lock(&rq->__lock);
> >     task_tick_mm_cid()
> >       task_work_add()
> >         kasan_save_stack()
> >           idiotic crap while holding rq->__lock ?
> >
> > Because afaict that is completely insane. And has nothing to do with
> > rtmutex.
> >
> > We are not going to change rtmutex because instrumentation shit is shit.
>
> Yes, it is because of KASAN that causes page allocation while holding the
> rq->__lock. Maybe we can blame KASAN for this. It is actually not a problem
> for non-PREEMPT_RT kernel because only trylock is being used. However, we
> don't use trylock all the way when rt_spin_trylock() is being used with
> PREEMPT_RT Kernel.

It has nothing to do with trylock, and everything to do with scheduler
locks being special.

But even so, trying to squirrel a spinlock inside a raw_spinlock is
dodgy at the best of times, yes it mostly works, but should be avoided
whenever possible.

And instrumentation just doesn't count.

> This is certainly a problem that we need to fix as there
> may be other similar case not involving rq->__lock lurking somewhere.

There cannot be, lock order is:

  rtmutex->wait_lock
    task->pi_lock
      rq->__lock

Trying to subvert that order gets you a splat, any other:

  raw_spin_lock(&foo);
  spin_trylock(&bar);

will 'work', despite probably not being a very good idea.

Any case involving the scheduler locks needs to be eradicated, not
worked around.
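
To make the ordering rule above concrete, here is a minimal userspace sketch of a rank check. It is not kernel code and not how lockdep is implemented. The `lock_rank` enum, `held_locks`, `rank_acquire()`, `rank_release()` and the two demo functions are all hypothetical names invented for illustration. The only thing taken from the mail is the order wait_lock, then pi_lock, then rq->__lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks mirroring the order in the mail: lower rank is taken first. */
enum lock_rank {
    RANK_WAIT_LOCK = 1, /* rtmutex->wait_lock */
    RANK_PI_LOCK   = 2, /* task->pi_lock */
    RANK_RQ_LOCK   = 3, /* rq->__lock */
};

#define MAX_HELD 8

/* Per-context record of the ranks currently held, innermost last. */
struct held_locks {
    enum lock_rank stack[MAX_HELD];
    int depth;
};

/*
 * Record an acquisition if it respects the order. Returns false when the
 * innermost held lock already has an equal or higher rank, which is the
 * inversion that would earn a splat on a real kernel.
 */
bool rank_acquire(struct held_locks *h, enum lock_rank rank)
{
    if (h->depth >= MAX_HELD)
        return false;
    if (h->depth > 0 && h->stack[h->depth - 1] >= rank)
        return false;
    h->stack[h->depth++] = rank;
    return true;
}

/* Drop the innermost held lock, if any. */
void rank_release(struct held_locks *h)
{
    if (h->depth > 0)
        h->depth--;
}

/* Take all three locks in the documented order; returns how many succeeded. */
int take_in_documented_order(void)
{
    struct held_locks h = { .depth = 0 };
    int ok = 0;

    ok += rank_acquire(&h, RANK_WAIT_LOCK);
    ok += rank_acquire(&h, RANK_PI_LOCK);
    ok += rank_acquire(&h, RANK_RQ_LOCK);

    while (h.depth > 0)
        rank_release(&h);
    return ok;
}

/* Try to take pi_lock while already holding rq->__lock: the inverted order. */
bool take_pi_lock_under_rq_lock(void)
{
    struct held_locks h = { .depth = 0 };
    bool ok;

    rank_acquire(&h, RANK_RQ_LOCK);
    ok = rank_acquire(&h, RANK_PI_LOCK);

    while (h.depth > 0)
        rank_release(&h);
    return ok;
}
```

In this sketch take_in_documented_order() accepts all three acquisitions, while take_pi_lock_under_rq_lock() is rejected. The sketch models blocking acquisitions only. It says nothing about trylock, and it does not capture why the scheduler locks are special.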