Re: [RFC][PATCH 8/8] rtmutex: Fix PI chain order integrity

From: Peter Zijlstra
Date: Mon Jun 27 2016 - 08:23:49 EST


On Wed, Jun 15, 2016 at 08:25:07AM +0100, Juri Lelli wrote:
> I guess it's not that likely, but yes it could potentially happen that a
> waiter is optimistically spinning, depletes its runtime, gets throttled
> and then replenished when still spinning. Maybe it doesn't really make
> sense continuing spinning in this situation, but I guess things get
> really complicated. :-/
>
> Anyway, as said, I think this patch is OK. Maybe we want to add a
> comment just to remember what situation can cause an issue if we don't
> do this? Patch changelog would be OK as well for such a comment IMHO.


OK, so I went to write a simple comment and ended up with the below :/

While writing the comment I noticed two issues:

- we update the waiter order fields while the entry is still enqueued
on the pi_waiters tree, which is also sorted by these exact fields.

- another one of these pure ->prio comparisons

Please double check, there be dragons here.

---

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -158,6 +158,9 @@ static inline bool unlock_rt_mutex_safe(
}
#endif

+#define cmp_task(p) \
+ &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+
static inline int
rt_mutex_waiter_less(struct rt_mutex_waiter *left,
struct rt_mutex_waiter *right)
@@ -583,8 +586,26 @@ static int rt_mutex_adjust_prio_chain(st

/* [7] Requeue the waiter in the lock waiter tree. */
rt_mutex_dequeue(lock, waiter);
+
+ /*
+ * Update the waiter prio fields now that we're dequeued.
+ *
+ * These values can have changed through either:
+ *
+ * sys_sched_set_scheduler() / sys_sched_setattr()
+ *
+ * or
+ *
+ * DL CBS enforcement advancing the effective deadline.
+ *
+ * Even though pi_waiters also uses these fields, and that tree is only
+ * updated in [11], we can do this here, since we hold [L], which
+ * serializes all pi_waiters access and rb_erase() does not care about
+ * the values of the node being removed.
+ */
waiter->prio = task->prio;
waiter->deadline = task->dl.deadline;
+
rt_mutex_enqueue(lock, waiter);

/* [8] Release the task */
@@ -711,6 +732,8 @@ static int rt_mutex_adjust_prio_chain(st
static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
struct rt_mutex_waiter *waiter)
{
+ lockdep_assert_held(&lock->wait_lock);
+
/*
* Before testing whether we can acquire @lock, we set the
* RT_MUTEX_HAS_WAITERS bit in @lock->owner. This forces all
@@ -770,7 +793,8 @@ static int try_to_take_rt_mutex(struct r
* the top waiter priority (kernel view),
* @task lost.
*/
- if (task->prio >= rt_mutex_top_waiter(lock)->prio)
+ if (!rt_mutex_waiter_less(cmp_task(task),
+ rt_mutex_top_waiter(lock)))
return 0;

/*
@@ -838,6 +862,8 @@ static int task_blocks_on_rt_mutex(struc
struct rt_mutex *next_lock;
int chain_walk = 0, res;

+ lockdep_assert_held(&lock->wait_lock);
+
/*
* Early deadlock detection. We really don't want the task to
* enqueue on itself just to untangle the mess later. It's not
@@ -973,6 +999,8 @@ static void remove_waiter(struct rt_mute
struct task_struct *owner = rt_mutex_owner(lock);
struct rt_mutex *next_lock;

+ lockdep_assert_held(&lock->wait_lock);
+
raw_spin_lock(&current->pi_lock);
rt_mutex_dequeue(lock, waiter);
current->pi_blocked_on = NULL;