Re: [PATCH] cpufreq: schedutil: Don't skip freq update when limits change

From: Viresh Kumar
Date: Thu Aug 01 2019 - 03:55:59 EST


On 01-08-19, 09:47, Rafael J. Wysocki wrote:
> On Thu, Aug 1, 2019 at 8:17 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> >
> > On 31-07-19, 17:20, Doug Smythies wrote:
> > > Hi Viresh,
> > >
> > > Summary:
> > >
> > > The old way, using UINT_MAX had two purposes: first,
> > > as a "need to do a frequency update" flag; but also second, to
> > > force any subsequent old/new frequency comparison to NOT be "the same,
> > > so why bother actually updating" (see: sugov_update_next_freq). All
> > > patches so far have been dealing with the flag, but only partially
> > > the comparisons. In a busy system, and when schedutil.c doesn't actually
> > > know the currently set system limits, the new frequency is dominated by
> > > values the same as the old frequency. So, when sugov_fast_switch calls
> > > sugov_update_next_freq, false is usually returned.
> >
> > And finally we know "Why" :)
> >
> > Good work Doug. Thanks for taking it to the end.
> >
> > > However, if we move the resetting of the flag and add another condition
> > > to the "no need to actually update" decision, then perhaps this patch
> > > version 1 will be O.K. It seems to be. (see way later in this e-mail).
> >
> > > With all this new knowledge, how about going back to
> > > version 1 of this patch, and then adding this:
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 808d32b..f9156db 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -100,7 +100,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > >                                     unsigned int next_freq)
> > >  {
> > > -       if (sg_policy->next_freq == next_freq)
> > > +       /*
> > > +        * Always force an update if the flag is set, regardless.
> > > +        * In some implementations (intel_cpufreq) the frequency is clamped
> > > +        * further downstream, and might not actually be different here.
> > > +        */
> > > +       if (sg_policy->next_freq == next_freq && !sg_policy->need_freq_update)
> > >                 return false;
> >
> > This is not correct because this is an optimization we have in place
> > to make things more efficient. And it was working by luck earlier and
> > my patch broke it for good :)
>
> OK, so since we know why it was wrong now, why don't we just revert
> it? Plus maybe add some comment explaining the rationale in there?

Because the patch [1] that caused these issues was almost correct; it
only missed the busy accounting for the single-CPU case.

The main idea behind the original patch [1] was to avoid any
unwanted/hidden side effects of overriding the value of next_freq.
What we see above is exactly such a case. Because we overrode the
value of next_freq, we made intel-pstate work by chance,
unintentionally, which is wrong. And who knows what other side effects
it had; we have already found two (this one and the one fixed by [1]).

I would strongly suggest that we don't override the value of next_freq
with a special meaning, as it is used in so many places that we don't
know what that may result in.
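
To illustrate the concern, here is a minimal user-space sketch, not
kernel code. struct policy_model and every helper below are
hypothetical stand-ins invented for this example; only the
"next_freq == next_freq" comparison is taken from the diff quoted
above. It contrasts storing UINT_MAX in next_freq with keeping a
separate flag, and shows how each one affects code that reads
next_freq.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the governor's per-policy state. */
struct policy_model {
	unsigned int next_freq;	/* last frequency that was requested */
	bool need_freq_update;	/* set when the policy limits change */
};

/* Sentinel style: a limits change overwrites the cached frequency. */
static void limits_changed_sentinel(struct policy_model *p)
{
	assert(p);
	p->next_freq = UINT_MAX;
}

/* Flag style: a limits change leaves the cached frequency untouched. */
static void limits_changed_flag(struct policy_model *p)
{
	assert(p);
	p->need_freq_update = true;
}

/* The "same frequency, nothing to do" optimization from the quoted diff. */
static bool update_next_freq(struct policy_model *p, unsigned int next_freq)
{
	assert(p);
	if (p->next_freq == next_freq)
		return false;

	p->next_freq = next_freq;
	return true;
}

/*
 * Hypothetical second reader of next_freq: "is the new request lower
 * than what we asked for last time?". With the sentinel stored, this is
 * true for every real frequency, which is the kind of hidden side
 * effect described above.
 */
static bool is_lower_than_last_request(const struct policy_model *p,
				       unsigned int next_f)
{
	assert(p);
	return next_f < p->next_freq;
}
```

With the sentinel, update_next_freq() reports a change even when the
same frequency is requested again, so the driver gets called as a side
effect rather than by design. With the flag, next_freq keeps a real
value for all its readers, and the decision to force an update has to
be made explicitly by whoever checks need_freq_update.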

--
viresh

[1] ecd288429126 cpufreq: schedutil: Don't set next_freq to UINT_MAX