Re: [PATCH v2 1/3] sched: Fix UCLAMP_FLAG_IDLE setting

From: Quentin Perret
Date: Mon Jun 21 2021 - 06:57:26 EST


Hi Dietmar,

On Thursday 17 Jun 2021 at 17:27:56 (+0200), Dietmar Eggemann wrote:
> On 11/06/2021 09:25, Quentin Perret wrote:
> > On Thursday 10 Jun 2021 at 21:05:12 (+0200), Peter Zijlstra wrote:
> >> On Thu, Jun 10, 2021 at 03:13:04PM +0000, Quentin Perret wrote:
> >>> The UCLAMP_FLAG_IDLE flag is set on a runqueue when dequeueing the last
> >>> active task to maintain the last uclamp.max and prevent blocked util
> >>> from suddenly becoming visible.
> >>>
> >>> However, there is an asymmetry in how the flag is set and cleared which
> >>> can lead to having the flag set whilst there are active tasks on the rq.
> >>> Specifically, the flag is cleared in the uclamp_rq_inc() path, which is
> >>> called at enqueue time, but set in uclamp_rq_dec_id() which is called
> > >>> both when dequeueing a task _and_ in the uclamp_update_active() path. As
> > >>> a result, when both uclamp_rq_{dec,inc}_id() are called from
> > >>> uclamp_update_active(), the flag ends up being set but not cleared,
> >>> hence leaving the runqueue in a broken state.
> >>>
> >>> Fix this by setting the flag in the uclamp_rq_inc_id() path to ensure
> >>> things remain symmetrical.
> >>
> >> The code you moved is neither in uclamp_rq_inc_id(), although
> >> uclamp_idle_reset() is called from there
> >
> > Yep, that is what I was trying to say.
> >
> >> nor does it _set_ the flag.
> >
> > Ahem. That I don't have a good excuse for ...
>
> (A) dequeue -> set
>
>     (1) dequeue_task() -> uclamp_rq_dec() ->
>
>     (2) cpu_util_update_eff() -> ... -> uclamp_update_active() ->
>
>         uclamp_rq_dec_id()
>
>           uclamp_rq_max_value()
>
>             /* No tasks -- default clamp values */
>             uclamp_idle_value() {
>
>               if (clamp_id == UCLAMP_MAX)
>                 rq->uclamp_flags |= UCLAMP_FLAG_IDLE;  <-- set
>             }
>
> ---
>
> (B) enqueue -> clear
>
>     (1) enqueue_task() ->
>
>         uclamp_rq_inc() {
>
>     (2) cpu_util_update_eff() -> ... -> uclamp_update_active() ->
>
>           uclamp_rq_inc_id() {
>
>             uclamp_idle_reset() {
>                                                     <-- new clear
>             }                                        ^
>           }                                          |
>                                                      |
>           if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)   |
>             rq->uclamp_flags &= ~UCLAMP_FLAG_IDLE;  <-- old clear
>         }
>
> ---
>
> uclamp_update_active()
>
>   if (p->uclamp[clamp_id].active) {
>     uclamp_rq_dec_id()  <-- (A2)
>     uclamp_rq_inc_id()  <-- (B2)
>   }
>
> Is this existing asymmetry in setting the flag but not clearing it in
> uclamp_update_active() the only issue this patch fixes?

I think this is the root of the problem, but it can have odd symptoms.
In a bad case it can lead to hitting the WARN in uclamp_rq_dec_id()
(which is how we found the bug in the first place).
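
To make the (A2)/(B2) sequence quoted above concrete, here is a toy
userspace sketch of the flag handling. It is not the kernel code: the
toy_* names, the single task counter and the clear_in_inc_id switch are
all made up for illustration, and the real code tracks per-clamp
buckets rather than one counter.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FLAG_IDLE 0x1u	/* hypothetical stand-in for UCLAMP_FLAG_IDLE */

/* Hypothetical, heavily simplified runqueue: one counter, one flag word. */
struct toy_rq {
	unsigned int flags;
	unsigned int nr_active;
};

/* Models the dec_id side: dropping the last active task sets the flag. */
static void toy_rq_dec_id(struct toy_rq *rq)
{
	assert(rq->nr_active > 0);
	if (--rq->nr_active == 0)
		rq->flags |= TOY_FLAG_IDLE;
}

/* Models the inc_id side; clear_in_inc_id selects the "new clear" spot. */
static void toy_rq_inc_id(struct toy_rq *rq, bool clear_in_inc_id)
{
	rq->nr_active++;
	if (clear_in_inc_id)
		rq->flags &= ~TOY_FLAG_IDLE;
}

/* Models the enqueue path, where the "old clear" lives. */
static void toy_rq_inc(struct toy_rq *rq, bool clear_in_inc_id)
{
	toy_rq_inc_id(rq, clear_in_inc_id);
	if (!clear_in_inc_id)
		rq->flags &= ~TOY_FLAG_IDLE;
}

/* Models the update path: (A2) followed by (B2), bypassing toy_rq_inc(). */
static void toy_update_active(struct toy_rq *rq, bool clear_in_inc_id)
{
	toy_rq_dec_id(rq);
	toy_rq_inc_id(rq, clear_in_inc_id);
}

/* True if the flag is still set while a task is active after an update. */
static bool toy_flag_stale_after_update(bool clear_in_inc_id)
{
	struct toy_rq rq = { 0, 0 };

	toy_rq_inc(&rq, clear_in_inc_id);		/* enqueue one task */
	toy_update_active(&rq, clear_in_inc_id);	/* clamp value update */
	return rq.nr_active > 0 && (rq.flags & TOY_FLAG_IDLE) != 0;
}
```

With the clear left in the enqueue path only, the update path sets the
flag and nothing clears it again, so toy_flag_stale_after_update(false)
reports a stale flag. With the clear done on the inc_id side,
toy_flag_stale_after_update(true) does not.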

I'll try and repost this with a correct commit message soon -- still
fighting with my inbox right now.

Thanks,
Quentin