Re: Question on task_blocks_on_rt_mutex()

From: Davidlohr Bueso
Date: Tue Sep 01 2020 - 22:07:20 EST


On Tue, 01 Sep 2020, Paul E. McKenney wrote:

> And it appears that a default-niced CPU-bound SCHED_OTHER process is
> not preempted by a newly awakened MAX_NICE SCHED_OTHER process. OK,
> OK, I never waited for more than 10 minutes, but on my 2.2GHz system
> that is close enough to a hang for most people.
>
> Which means that the patch below prevents the hangs. And maybe does
> other things as well, firing rcutorture up on it to check.
>
> But is this indefinite delay expected behavior?
>
> This reproduces for me on current mainline as follows:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05
>
> This hangs within a minute of boot on my setup. Here "hangs" is defined
> as stopping the per-15-second console output of:
> Writes: Total: 569906696 Max/Min: 81495031/63736508 Fail: 0

OK, this doesn't seem to be related to lockless wake_qs, then. FYI, there
have been missed wakeups in the past where wake_q_add() fails the cmpxchg
because the task already has a wakeup pending, leading to the actual
wakeup occurring before its corresponding wake_up_q(). This is why we have
wake_q_add_safe(). But for rtmutexes there is no lock stealing, only the
top waiter is awoken, and try_to_take_rt_mutex() is done under
lock->wait_lock, so I was not seeing an actual race here.

Thanks,
Davidlohr