Re: [PATCH v6] sched: Consolidate cpufreq updates

From: Vincent Guittot
Date: Tue Jul 09 2024 - 03:49:55 EST


On Fri, 5 Jul 2024 at 13:50, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 05/07/2024 02:22, Qais Yousef wrote:
> > On 07/04/24 12:12, Dietmar Eggemann wrote:
> >> On 28/06/2024 03:52, Qais Yousef wrote:
> >>> On 06/25/24 14:58, Dietmar Eggemann wrote:
> >>>
> >>>>> @@ -4917,6 +4927,84 @@ static inline void __balance_callbacks(struct rq *rq)
> >>>>>
> >>>>> #endif
> >>>>>
> >>>>> +static __always_inline void
> >>>>> +__update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
> >>>>> +{
> >>>>> +#ifdef CONFIG_CPU_FREQ
> >>>>> +	if (prev && prev->dl.flags & SCHED_FLAG_SUGOV) {
> >>>>> +		/* Sugov just did an update, don't be too aggressive */
> >>>>> +		return;
> >>>>> +	}
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * RT and DL should always send a freq update. But we can do some
> >>>>> +	 * simple checks to avoid it when we know it's not necessary.
> >>>>> +	 *
> >>>>> +	 * iowait_boost will always trigger a freq update too.
> >>>>> +	 *
> >>>>> +	 * Fair tasks will only trigger an update if the root cfs_rq has
> >>>>> +	 * decayed.
> >>>>> +	 *
> >>>>> +	 * Everything else should do nothing.
> >>>>> +	 */
> >>>>> +	switch (current->policy) {
> >>>>> +	case SCHED_NORMAL:
> >>>>> +	case SCHED_BATCH:
> >>>>
> >>>> What about SCHED_IDLE tasks?
> >>>
> >>> I didn't think they matter from cpufreq perspective. These tasks will just run
> >>> at whatever frequency the idle system happens to be at and have no specific perf
> >>> requirement, since they should only run when the system is idle, which is a recipe
> >>> for starvation anyway?
> >>
> >> Not sure we talk about the same thing here? idle_sched_class vs.
> >> SCHED_IDLE policy (FAIR task with a tiny weight of WEIGHT_IDLEPRIO).
> >
> > Yes I am referring to SCHED_IDLE policy too. What is your expectation? AFAIK
> > the goal of this policy is to run when nothing else needs running.
>
> IMHO, SCHED_IDLE tasks fight with all the other FAIR tasks over the
> resource rq. I would include SCHED_IDLE into this switch statement next
> to SCHED_NORMAL and SCHED_BATCH.
> What do you do if only SCHED_IDLE FAIR tasks are runnable? They probably
> also want to have their CPU frequency needs adjusted.

I agree. SCHED_IDLE means do not preempt SCHED_NORMAL and SCHED_BATCH,
but it does not mean run at a random frequency.
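
To make the point concrete, here is a rough user-space sketch of the fair
branch of that switch with SCHED_IDLE added next to SCHED_NORMAL and
SCHED_BATCH. It is only a model, not the patch: the helper name and the
POLICY_* constants are made up for illustration (prefixed so they cannot
clash with libc headers), with values mirroring the SCHED_* policy numbers
in include/uapi/linux/sched.h. RT, DL and iowait_boost handling is left out
on purpose.

```c
#include <stdbool.h>

/* Illustrative copies of the SCHED_* policy numbers (uapi/linux/sched.h). */
enum {
	POLICY_NORMAL   = 0,	/* SCHED_NORMAL */
	POLICY_FIFO     = 1,	/* SCHED_FIFO */
	POLICY_RR       = 2,	/* SCHED_RR */
	POLICY_BATCH    = 3,	/* SCHED_BATCH */
	POLICY_IDLE     = 5,	/* SCHED_IDLE */
	POLICY_DEADLINE = 6,	/* SCHED_DEADLINE */
};

/*
 * Hypothetical helper modelling only the fair case of the switch:
 * a fair task asks for a freq update when the root cfs_rq has decayed.
 * SCHED_IDLE tasks are fair tasks too, so they take the same path.
 * Non-fair policies return false here because this model ignores them.
 */
static bool fair_policy_wants_freq_update(int policy, bool root_cfs_rq_decayed)
{
	switch (policy) {
	case POLICY_NORMAL:
	case POLICY_BATCH:
	case POLICY_IDLE:	/* the case being asked for in this thread */
		return root_cfs_rq_decayed;
	default:
		return false;
	}
}
```

With this, a CPU running only SCHED_IDLE tasks still gets its frequency
re-evaluated whenever the root cfs_rq utilization changes.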

>
> [...]
>
> >>>>> @@ -4766,11 +4738,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> >>>>>  		 */
> >>>>>  		detach_entity_load_avg(cfs_rq, se);
> >>>>>  		update_tg_load_avg(cfs_rq);
> >>>>> -	} else if (decayed) {
> >>>>> -		cfs_rq_util_change(cfs_rq, 0);
> >>>>> -
> >>>>> -		if (flags & UPDATE_TG)
> >>>>> -			update_tg_load_avg(cfs_rq);
> >>>>> +	} else if (cfs_rq->decayed && (flags & UPDATE_TG)) {
> >>>>> +		update_tg_load_avg(cfs_rq);
> >>>>>  	}
> >>>>>  }
> >>>>
> >>>> You set cfs_rq->decayed for each taskgroup level but you only reset it
> >>>> for the root cfs_rq in __update_cpufreq_ctx_switch() and task_tick_fair()?
> >>>
> >>> Yes. We only care about using it for root level. Tracking the information at
> >>> cfs_rq level is the most natural way to do it as this is what update_load_avg()
> >>> is acting on.
> >>
> >> But IMHO this creates an issue with those non-root cfs_rq's within
> >
> > I am not seeing the issue, could you expand on what it is?
>
> I tried to explain it in the 4 lines below. With a local 'decayed',
> update_cfs_rq_load_avg() and propagate_entity_load_avg() set it every
> time update_load_avg() gets called. And this then determines whether
> update_tg_load_avg() is called on this cfs_rq later in update_load_avg().
>
> The new code:
>
> cfs_rq->decayed |= update_cfs_rq_load_avg() (*)
> cfs_rq->decayed |= propagate_entity_load_avg()
>
> will not reset 'cfs_rq->decayed' for non-root cfs_rq's.
>
> (*) You changed this in v3 from:
>
> cfs_rq->decayed = update_cfs_rq_load_avg()
>
>
> >> update_load_avg() itself. They will stay decayed after cfs_rq->decayed
> >> has been set to 1 once and will never be reset to 0. So with UPDATE_TG
> >> update_tg_load_avg() will then always be called on those non-root
> >> cfs_rq's all the time.
> >
> > We could add a check to update only the root cfs_rq. But what do we gain? Or
> > IOW, what is the harm of unconditionally updating cfs_rq->decayed given that we
> > only care about the root cfs_rq? I see more if conditions and branches which
> > I am trying to avoid.
>
> Yes, keep 'decayed' local and add a:
>
>	if (cfs_rq == &rq_of(cfs_rq)->cfs)
>		cfs_rq->decayed = decayed;
>
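
The difference between the two variants can be seen with a small
user-space model. This is a sketch only: struct toy_cfs_rq and both
functions are hypothetical stand-ins, UPDATE_TG is assumed to be always
set, and tg_updates simply counts how often update_tg_load_avg() would
have been called.

```c
#include <stdbool.h>

/* Toy stand-in for struct cfs_rq; only the fields needed for the model. */
struct toy_cfs_rq {
	bool is_root;		/* models cfs_rq == &rq_of(cfs_rq)->cfs */
	bool decayed;		/* the persistent flag added by the patch */
	int tg_updates;		/* times update_tg_load_avg() would run */
};

/*
 * Variant described for v6: the result is OR-ed into the persistent
 * flag, and nothing in this path clears it for a non-root cfs_rq.
 */
static void toy_update_sticky(struct toy_cfs_rq *cfs_rq, bool decayed_now)
{
	cfs_rq->decayed |= decayed_now;

	if (cfs_rq->decayed)
		cfs_rq->tg_updates++;
}

/*
 * Suggested variant: decide on a local 'decayed' and store it in the
 * cfs_rq only when this is the root cfs_rq.
 */
static void toy_update_local(struct toy_cfs_rq *cfs_rq, bool decayed_now)
{
	bool decayed = decayed_now;

	if (cfs_rq->is_root)
		cfs_rq->decayed = decayed;

	if (decayed)
		cfs_rq->tg_updates++;
}
```

Feeding a non-root toy_cfs_rq one decayed update followed by three
non-decayed ones, the sticky variant counts a tg update on every call,
while the local variant counts only the first.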