Re: [RFC][PATCH 1/8] rtmutex: Deboost before waking up the top waiter

From: Juri Lelli
Date: Tue Jun 14 2016 - 05:09:42 EST


Hi,

I've got only nitpicks for the changelog. Otherwise the patch looks good
to me (and yes, without it bw inheritance would be a problem).

On 07/06/16 21:56, Peter Zijlstra wrote:
> From: Xunlei Pang <xlpang@xxxxxxxxxx>
>
> We should deboost before waking the high-prio task, such that
> we don't run two tasks with the same "state"(priority, deadline,
^
space

> sched_class, etc) during the period between the end of wake_up_q()
> and the end of rt_mutex_adjust_prio().
>
> As "Peter Zijlstra" said:
> Its semantically icky to have the two tasks running off the same

s/Its/It's/

> state and practically icky when you consider bandwidth inheritance --
> where the boosted task wants to explicitly modify the state of the
> booster. In that latter case you really want to unboost before you
> let the booster run again.
>
> But this however can lead to prio-inversion if current would get
> preempted after the deboost but before waking our high-prio task,
> hence we disable preemption before doing deboost, and enabling it

s/enabling/re-enable/

> after the wake up is over.
>
> The patch fixed the logic, and introduced rt_mutex_postunlock()

s/The/This/
s/fixed/fixes/
s/introduced/introduces/

> to do some code refactor.
>
> Most importantly however; this change ensures pointer stability for
> the next patch, where we have rt_mutex_setprio() cache a pointer to
> the top-most waiter task. If we, as before this change, do the wakeup
> first and then deboost, this pointer might point into thin air.
>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Juri Lelli <juri.lelli@xxxxxxx>
> Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> [peterz: Changelog]
> Signed-off-by: Xunlei Pang <xlpang@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Link: http://lkml.kernel.org/r/1461659449-19497-1-git-send-email-xlpang@xxxxxxxxxx

Do we have any specific tests for this set? I'm running mine.

Best,

- Juri

> ---
>
> kernel/futex.c | 5 ++---
> kernel/locking/rtmutex.c | 28 ++++++++++++++++++++++++----
> kernel/locking/rtmutex_common.h | 1 +
> 3 files changed, 27 insertions(+), 7 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1336,9 +1336,8 @@ static int wake_futex_pi(u32 __user *uad
> * scheduled away before the wake up can take place.
> */
> spin_unlock(&hb->lock);
> - wake_up_q(&wake_q);
> - if (deboost)
> - rt_mutex_adjust_prio(current);
> +
> + rt_mutex_postunlock(&wake_q, deboost);
>
> return 0;
> }
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1390,12 +1390,32 @@ rt_mutex_fastunlock(struct rt_mutex *loc
> } else {
> bool deboost = slowfn(lock, &wake_q);
>
> - wake_up_q(&wake_q);
> + rt_mutex_postunlock(&wake_q, deboost);
> + }
> +}
> +
>
> - /* Undo pi boosting if necessary: */
> - if (deboost)
> - rt_mutex_adjust_prio(current);
> +/*
> + * Undo pi boosting (if necessary) and wake top waiter.
> + */
> +void rt_mutex_postunlock(struct wake_q_head *wake_q, bool deboost)
> +{
> + /*
> + * We should deboost before waking the top waiter task such that
> + * we don't run two tasks with the 'same' priority. This however
> + * can lead to prio-inversion if we would get preempted after
> + * the deboost but before waking our high-prio task, hence the
> + * preempt_disable.
> + */
> + if (deboost) {
> + preempt_disable();
> + rt_mutex_adjust_prio(current);
> }
> +
> + wake_up_q(wake_q);
> +
> + if (deboost)
> + preempt_enable();
> }
>
> /**
> --- a/kernel/locking/rtmutex_common.h
> +++ b/kernel/locking/rtmutex_common.h
> @@ -111,6 +111,7 @@ extern int rt_mutex_finish_proxy_lock(st
> extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
> extern bool rt_mutex_futex_unlock(struct rt_mutex *lock,
> struct wake_q_head *wqh);
> +extern void rt_mutex_postunlock(struct wake_q_head *wake_q, bool deboost);
> extern void rt_mutex_adjust_prio(struct task_struct *task);
>
> #ifdef CONFIG_DEBUG_RT_MUTEXES
>
>