Re: Question on task_blocks_on_rt_mutex()

From: Paul E. McKenney
Date: Tue Sep 01 2020 - 13:49:42 EST


On Mon, Aug 31, 2020 at 04:21:30PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2020 at 03:49:11PM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > The task_blocks_on_rt_mutex() function uses rt_mutex_owner() to
> > take a snapshot of the lock owner right up front. At this point,
> > the ->wait_lock is held, which at first glance prevents the owner
> > from leaving. Except that if there are not yet any waiters (that is,
> > the low-order bit of ->owner is zero), rt_mutex_fastunlock() might
> > locklessly clear the ->owner field. And in that case, it looks like
> > task_blocks_on_rt_mutex() will blithely continue using the ex-owner's
> > task_struct structure, without anything that I can see that prevents
> > the ex-owner from exiting.
> >
> > What am I missing here?
>
> One thing I missed is that the low-order bit of ->owner would already
> be set by this point.
>
> > The reason that I am looking into this is that locktorture scenario LOCK05
> > hangs, and does so leaving the torture_rtmutex.waiters field equal to 0x1.
> > This is of course a legal transitional state, but I would not expect it
> > to persist for more than three minutes. Yet it often does.
> >
> > This leads me to believe that there is a way for an unlock to fail to wake
> > up a task concurrently acquiring the lock. This seems to be repaired
> > by later lock acquisitions, and in fact setting the locktorture.stutter
> > module parameter to zero avoids the hang. Except that I first found the
> > above apparently unprotected access to what was recently the owner task.
> >
> > Thoughts?
>
> Some breakage elsewhere, presumably...

And it might be a lost wakeup, given that ->locktorture_wake_up is equal
to 1 for one of the locktorture writer tasks given the patch below.
Yes, this is a virtual environment. Except that none of the other
locktorture scenarios make this happen, nor the rcutorture scenario,
nor the scftorture scenarios. Maybe just the wrong timing? Who knows,
looking further.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd93..054b6f8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -720,6 +720,7 @@ struct task_struct {
struct list_head rcu_node_entry;
struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_PREEMPT_RCU */
+ u8 locktorture_wake_up;

#ifdef CONFIG_TASKS_RCU
unsigned long rcu_tasks_nvcsw;
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 316531d..7f183be 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -617,6 +617,7 @@ static int lock_torture_writer(void *arg)

VERBOSE_TOROUT_STRING("lock_torture_writer task started");
set_user_nice(current, MAX_NICE);
+ current->locktorture_wake_up = 0;

do {
if ((torture_random(&rand) & 0xfffff) == 0)
@@ -635,6 +636,7 @@ static int lock_torture_writer(void *arg)
lock_is_write_held = false;
WRITE_ONCE(last_lock_release, jiffies);
cxt.cur_ops->writeunlock();
+ current->locktorture_wake_up = 0;

stutter_wait("lock_torture_writer");
} while (!torture_must_stop());
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index cfdd5b9..4a0b046 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1053,6 +1053,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
*/
preempt_disable();
wake_q_add(wake_q, waiter->task);
+ waiter->task->locktorture_wake_up = 1;
raw_spin_unlock(&current->pi_lock);
}