Re: Question on task_blocks_on_rt_mutex()

From: Paul E. McKenney
Date: Tue Sep 01 2020 - 19:58:24 EST


On Tue, Sep 01, 2020 at 10:49:38AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2020 at 04:21:30PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 31, 2020 at 03:49:11PM -0700, Paul E. McKenney wrote:
> > > Hello!
> > >
> > > The task_blocks_on_rt_mutex() function uses rt_mutex_owner() to
> > > take a snapshot of the lock owner right up front. At this point,
> > > the ->wait_lock is held, which at first glance prevents the owner
> > > from leaving. Except that if there are not yet any waiters (that is,
> > > the low-order bit of ->owner is zero), rt_mutex_fastunlock() might
> > > locklessly clear the ->owner field. And in that case, it looks like
> > > task_blocks_on_rt_mutex() will blithely continue using the ex-owner's
> > > task_struct structure, without anything that I can see that prevents
> > > the ex-owner from exiting.
> > >
> > > What am I missing here?
> >
> > One thing I missed is that the low-order bit of ->owner would already
> > be set by this point.
> >
> > > The reason that I am looking into this is that locktorture scenario LOCK05
> > > hangs, and does so leaving the torture_rtmutex.waiters field equal to 0x1.
> > > This is of course a legal transitional state, but I would not expect it
> > > to persist for more than three minutes. Yet it often does.
> > >
> > > This leads me to believe that there is a way for an unlock to fail to wake
> > > up a task concurrently acquiring the lock. This seems to be repaired
> > > by later lock acquisitions, and in fact setting the locktorture.stutter
> > > module parameter to zero avoids the hang. Except that I first found the
> > > above apparently unprotected access to what was recently the owner task.
> > >
> > > Thoughts?
> >
> > Some breakage elsewhere, presumably...
>
> And it might be a lost wakeup, given that ->locktorture_wake_up is equal
> to 1 for one of the locktorture writer tasks given the patch below.
> Yes, this is a virtual environment. Except that none of the other
> locktorture scenarios make this happen, nor the rcutorture scenario,
> nor the scftorture scenarios. Maybe just the wrong timing? Who knows,
> looking further.

And it appears that a default-niced CPU-bound SCHED_OTHER process is
not preempted by a newly awakened MAX_NICE SCHED_OTHER process. OK,
OK, I never waited for more than 10 minutes, but on my 2.2GHz that is
close enough to a hang for most people.

Which means that the patch below prevents the hangs. And maybe does
other things as well, firing rcutorture up on it to check.

But is this indefinite delay expected behavior?

This reproduces for me on current mainline as follows:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05

This hangs within a minute of boot on my setup. Here "hangs" is defined
as stopping the per-15-second console output of:
Writes: Total: 569906696 Max/Min: 81495031/63736508 Fail: 0

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492..9310c1d 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -602,6 +602,7 @@ static int stutter_gap;
*/
bool stutter_wait(const char *title)
{
+ unsigned i = 0;
int spt;
bool ret = false;

@@ -612,8 +613,13 @@ bool stutter_wait(const char *title)
if (spt == 1) {
schedule_timeout_interruptible(1);
} else if (spt == 2) {
- while (READ_ONCE(stutter_pause_test))
+ while (READ_ONCE(stutter_pause_test)) {
+ if (!(++i & 0xffff))
+ schedule_timeout_interruptible(1);
+ else if (!(i & 0xfff))
+ udelay(1);
cond_resched();
+ }
} else {
schedule_timeout_interruptible(round_jiffies_relative(HZ));
}