Re: [PATCH 10/24] sched/uclamp: Handle delayed dequeue
From: Vincent Guittot
Date: Thu Aug 22 2024 - 04:19:53 EST
Hi,
On Wed, 21 Aug 2024 at 15:34, Hongyan Xia <hongyan.xia2@xxxxxxx> wrote:
>
> Hi Peter,
>
> Sorry for bombarding this thread in the last couple of days. I'm seeing
> several issues in the latest tip/sched/core after these patches landed.
>
> What I'm now seeing seems to be an unbalanced call of util_est.

I also see a remaining util_est on an idle rq, caused by unbalanced
calls of util_est_enqueue()/util_est_dequeue().

> First, I applied the following diff to warn against util_est != 0
> when no tasks are on the queue:
>
> https://lore.kernel.org/all/752ae417c02b9277ca3ec18893747c54dd5f277f.1724245193.git.hongyan.xia2@xxxxxxx/
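(For reference, the check in the linked diff boils down to something
of this shape -- a sketch only, the helper name here is made up and
the exact form is in the link:)

	static inline void util_est_check_idle(struct rq *rq)
	{
		/* No runnable CFS tasks left, yet util_est is non-zero? */
		if (!rq->cfs.h_nr_running)
			SCHED_WARN_ON(READ_ONCE(rq->cfs.avg.util_est));
	}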
>
> Then, I'm reliably seeing warnings on my Juno board during boot in
> latest tip/sched/core.
>
> If I do the same thing to util_est that you did to uclamp in this
> patch, like this:
I think the solution is simpler than your proposal: we just need to
always call util_est_enqueue() before requeue_delayed_entity():
@@ -6970,11 +6970,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	int rq_h_nr_running = rq->cfs.h_nr_running;
 	u64 slice = 0;

-	if (flags & ENQUEUE_DELAYED) {
-		requeue_delayed_entity(se);
-		return;
-	}
-
 	/*
 	 * The code below (indirectly) updates schedutil which looks at
 	 * the cfs_rq utilization to select a frequency.
@@ -6983,6 +6978,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	util_est_enqueue(&rq->cfs, p);
+	if (flags & ENQUEUE_DELAYED) {
+		requeue_delayed_entity(se);
+		return;
+	}
+
 	/*
 	 * If in_iowait is set, the code below may not trigger any cpufreq
 	 * utilization updates, so do it here explicitly with the IOWAIT flag
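With util_est_enqueue() ordered before the early return, the
ENQUEUE_DELAYED requeue of a woken delayed task adds its util_est
contribution back, pairing up with the unconditional
util_est_dequeue() in dequeue_task_fair() and keeping the accounting
balanced.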
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 574ef19df64b..58aac42c99e5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6946,7 +6946,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>
>  	if (flags & ENQUEUE_DELAYED) {
>  		requeue_delayed_entity(se);
> -		return;
> +		goto util_est;
>  	}
>
>  	/*
> @@ -6955,7 +6955,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	 * Let's add the task's estimated utilization to the cfs_rq's
>  	 * estimated utilization, before we update schedutil.
>  	 */
> -	util_est_enqueue(&rq->cfs, p);
>
>  	/*
>  	 * If in_iowait is set, the code below may not trigger any cpufreq
> @@ -7050,6 +7049,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	assert_list_leaf_cfs_rq(rq);
>
>  	hrtick_update(rq);
> +util_est:
> +	if (!p->se.sched_delayed)
> +		util_est_enqueue(&rq->cfs, p);
>  }
>
>  static void set_next_buddy(struct sched_entity *se);
> @@ -7173,7 +7175,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>   */
>  static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  {
> -	util_est_dequeue(&rq->cfs, p);
> +	if (!p->se.sched_delayed)
> +		util_est_dequeue(&rq->cfs, p);
>
>  	if (dequeue_entities(rq, &p->se, flags) < 0) {
>  		if (!rq->cfs.h_nr_running)
> which basically enqueues util_est after enqueue_task_fair() has run,
> dequeues util_est before dequeue_task_fair() and double-checks
> p->se.sched_delayed; with that, the unbalanced issue seems to go away.
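Both diffs aim at the same invariant: exactly one util_est_dequeue()
when the task goes to sleep and exactly one util_est_enqueue() when it
becomes runnable again, with the intermediate dequeue/enqueue cycles
of a delayed task (migration, nice adjustment, ...) not touching
util_est at all.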
>
> Hopefully this helps you in finding where the problem could be.
>
> Hongyan
>
> On 27/07/2024 11:27, Peter Zijlstra wrote:
> > Delayed dequeue has tasks sit around on the runqueue that are not
> > actually runnable -- specifically, they will be dequeued the moment
> > they get picked.
> >
> > One side-effect is that such a task can get migrated, which leads to a
> > 'nested' dequeue_task() scenario that messes up uclamp if we don't
> > take care.
> >
> > Notably, dequeue_task(DEQUEUE_SLEEP) can 'fail' and keep the task on
> > the runqueue. This however will have removed the task from uclamp --
> > per uclamp_rq_dec() in dequeue_task(). So far so good.
> >
> > However, if at that point the task gets migrated -- or nice adjusted
> > or any of a myriad of operations that does a dequeue-enqueue cycle --
> > we'll pass through dequeue_task()/enqueue_task() again. Without
> > modification this will lead to a double decrement for uclamp, which is
> > wrong.
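(To illustrate the sequence described above -- an annotated trace of
the calls, not code from the patch:)

	dequeue_task(p, DEQUEUE_SLEEP)
		uclamp_rq_dec(rq, p)		// uclamp contribution removed
		->dequeue_task() 'fails'	// p stays queued, ->sched_delayed set

	// later, a migration or nice adjustment runs a full cycle:
	dequeue_task(p, flags)
		uclamp_rq_dec(rq, p)		// second decrement -- wrong without
						// the ->sched_delayed check below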
> >
> > Reported-by: Luis Machado <luis.machado@xxxxxxx>
> > Reported-by: Hongyan Xia <hongyan.xia2@xxxxxxx>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > kernel/sched/core.c | 16 +++++++++++++++-
> > 1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1676,6 +1676,9 @@ static inline void uclamp_rq_inc(struct
> > if (unlikely(!p->sched_class->uclamp_enabled))
> > return;
> >
> > + if (p->se.sched_delayed)
> > + return;
> > +
> > for_each_clamp_id(clamp_id)
> > uclamp_rq_inc_id(rq, p, clamp_id);
> >
> > @@ -1700,6 +1703,9 @@ static inline void uclamp_rq_dec(struct
> > if (unlikely(!p->sched_class->uclamp_enabled))
> > return;
> >
> > + if (p->se.sched_delayed)
> > + return;
> > +
> > for_each_clamp_id(clamp_id)
> > uclamp_rq_dec_id(rq, p, clamp_id);
> > }
> > @@ -1979,8 +1985,12 @@ void enqueue_task(struct rq *rq, struct
> > psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
> > }
> >
> > - uclamp_rq_inc(rq, p);
> > p->sched_class->enqueue_task(rq, p, flags);
> > + /*
> > + * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
> > + * ->sched_delayed.
> > + */
> > + uclamp_rq_inc(rq, p);
> >
> > if (sched_core_enabled(rq))
> > sched_core_enqueue(rq, p);
> > @@ -2002,6 +2012,10 @@ inline bool dequeue_task(struct rq *rq,
> > psi_dequeue(p, flags & DEQUEUE_SLEEP);
> > }
> >
> > + /*
> > + * Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
> > + * and mark the task ->sched_delayed.
> > + */
> > uclamp_rq_dec(rq, p);
> > return p->sched_class->dequeue_task(rq, p, flags);
> > }
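(Note the asymmetry the two comments in the patch spell out:
uclamp_rq_dec() must run *before* ->dequeue_task(), which may set
->sched_delayed, while uclamp_rq_inc() must run *after*
->enqueue_task(), since ENQUEUE_DELAYED can clear ->sched_delayed.
Together with the early returns on ->sched_delayed, this makes the
inc/dec calls balance for delayed tasks.)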