Re: [PATCH RFC v5] cpufreq: schedutil: Make iowait boost more energy efficient
From: Juri Lelli
Date: Tue Jul 18 2017 - 10:03:04 EST
Hi,
On 18/07/17 11:15, Viresh Kumar wrote:
> On 17-07-17, 10:35, Joel Fernandes wrote:
> > On Mon, Jul 17, 2017 at 1:04 AM, Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> > > On 16-07-17, 01:04, Joel Fernandes wrote:
>
> > >> + if (sg_cpu->iowait_boost_pending) {
> > >> + sg_cpu->iowait_boost_pending = false;
> > >> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> > >> + sg_cpu->iowait_boost_max);
> > >
> > > Now this has a problem. We will also boost after waiting for
>
> s/also/always/
>
> > > rate_limit_us. And that's why I had proposed the tricky solution in
> >
> > Not really unless rate_limit_us is < TICK_NSEC? Once TICK_NSEC
> > elapses, we would clear the boost in sugov_set_iowait_boost and in
> > sugov_next_freq_shared.
>
> You misread it and I know why it happened. And so I have sent a small
> patch to make it a bit more readable.
>
> rate_limit_us is associated with "last_freq_update_time", while
> iowait-boost is associated with "last_update".
>
> And last_update gets updated way too often.
>
> > > the first place. I thought we wanted to avoid instant boost only for
> > > the first iteration, but after that we wanted to do it ASAP. Isn't it?
> > >
> > > Now that you are using policy->min instead of policy->cur, we can
> > > simplify the solution I proposed and always do 2 * iowait_boost before
> >
> > No, doubling on the first boost was never discussed or intended in my
> > earlier patches. I thought even your patch never did, you were
> > dividing by 2, and then scaling it back up by 2 before consuming it to
> > preserve the initial boost.
> >
> > > getting current util/max in above if loop. i.e. we will start iowait
> > > boost with min * 2 instead of min and that should be fine.
> >
> > Hmm, but why start from double of min? Why not just min? It doesn't
> > make any difference to the intended behavior itself and is also
> > consistent with my proposal in RFC v4. Also I feel what you're
> > suggesting is more spike prone as well, the idea was to start from the
> > minimum and double it as we go, not to double the min the first go.
> > That was never intended.
> >
> > Also I would rather keep the "set and use and set and use" pattern to
> > keep the logic less confusing and clean IMO.
> > So we set initial boost in sugov_set_iowait_boost, and then in
> > sugov_iowait_boost we use it, and then set the boost for the next time
> > around at the end of sugov_iowait_boost (that is we double it). Next
> > time sugov_set_iowait_boost wouldn't touch the boost whether iowait
> > flag is set or not and we would continue into sugov_iowait_boost to
> > consume the boost. This would have a small delay in reducing the
> > boost, but that's OK since it's only one cycle of delay, and keeps the
> > code clean. I assume the last part is not an issue considering you're
> > proposing double of the initial boost anyway ;-)
>
> Okay, let me try to explain the problem first and then you can propose
> a solution if required.
>
> Expected Behavior:
>
> (Window refers to a time window of rate_limit_us here)
>
> A. The first window where IOWAIT flag is set, we set boost to min-freq
> and that shall be used for next freq update in
> sugov_iowait_boost(). Any more calls to sugov_set_iowait_boost()
> within this window shouldn't change the behavior.
>
> B. If the next window also has IOWAIT flag set, then
> sugov_iowait_boost() should use iowait*2 for freq update.
>
> C. If a window doesn't have IOWAIT flag set, then sugov_iowait_boost()
> should use iowait/2 in it.
>
>
> Do they look fine to you?
>
> Now coming to how will system behave with your patch:
>
> A. would be fine. We will follow things properly.
>
> But B. and C. aren't true anymore.
>
> This happened because after the first window we updated iowait_boost
> as 2*min unconditionally and the next window will *always* use that,
> even if the flag isn't set. And we may end up increasing the frequency
> unnecessarily, i.e. the spike where this discussion started.
>
Mmm, seems to make sense to me. :/
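To check my reading of Viresh's point, here is a rough user-space model of
the v5 producer/consumer pair, reconstructed from the lines the patch below
removes. The struct, the v5_* function names and the numeric values are
made up for illustration. The TICK_NSEC reset path and the util/max
comparison are left out, and it assumes one producer call per window.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the iowait fields of struct sugov_cpu. */
struct v5_model {
	unsigned int iowait_boost;
	unsigned int iowait_boost_max;
	unsigned int policy_min;	/* stands in for sg_policy->policy->min */
	bool iowait_boost_pending;
};

/* v5 producer: on IOWAIT, mark pending and make sure boost >= policy min. */
static void v5_set_iowait_boost(struct v5_model *m, bool iowait)
{
	if (!iowait)
		return;		/* TICK_NSEC reset path not modelled */

	m->iowait_boost_pending = true;
	if (m->iowait_boost < m->policy_min)
		m->iowait_boost = m->policy_min;
}

/*
 * v5 consumer: returns the boost used for this update, then prepares the
 * value for the next window (double if pending, halve otherwise).
 */
static unsigned int v5_iowait_boost(struct v5_model *m)
{
	unsigned int used = m->iowait_boost;

	if (!used)
		return 0;

	if (m->iowait_boost_pending) {
		m->iowait_boost_pending = false;
		m->iowait_boost <<= 1;
		if (m->iowait_boost > m->iowait_boost_max)
			m->iowait_boost = m->iowait_boost_max;
	} else {
		m->iowait_boost >>= 1;
	}

	return used;
}
```

With policy_min = 200 and iowait_boost_max = 1600, a window with the flag
followed by a window without it consumes 200 and then 400 in this model.
The second window still sees the doubled value even though nothing was
waiting on I/O.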
Would the following work (on top of Joel's v5)? The rationale is that
only sugov_set_iowait_boost may raise the boost: it doubles an existing
iowait_boost, or starts from policy->min if none was set. sugov_iowait_boost
(the consumer) instead does the decay (if no boost was pending).
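As a rough user-space model of the logic in the patch below (not the kernel
code itself): the struct, the model_* names and the numbers are assumptions
for illustration. The doubling is written as clamped at iowait_boost_max,
which is what the "until hitting max" comment intends. The TICK_NSEC reset
path and the util/max comparison are left out, and it assumes a single
producer call per window.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the iowait fields of struct sugov_cpu. */
struct boost_model {
	unsigned int iowait_boost;
	unsigned int iowait_boost_max;
	unsigned int policy_min;	/* stands in for sg_policy->policy->min */
	bool iowait_boost_pending;
};

/* Producer: double an existing boost (clamped), else start at policy min. */
static void model_set_iowait_boost(struct boost_model *m, bool iowait)
{
	if (!iowait)
		return;		/* TICK_NSEC reset path not modelled */

	m->iowait_boost_pending = true;
	if (m->iowait_boost) {
		m->iowait_boost <<= 1;
		if (m->iowait_boost > m->iowait_boost_max)
			m->iowait_boost = m->iowait_boost_max;
	} else {
		m->iowait_boost = m->policy_min;
	}
}

/* Consumer: decay only if no boost was pending; returns the boost used. */
static unsigned int model_iowait_boost(struct boost_model *m)
{
	if (!m->iowait_boost)
		return 0;

	if (m->iowait_boost_pending)
		m->iowait_boost_pending = false;	/* value consumed as set */
	else
		m->iowait_boost >>= 1;			/* decay */

	return m->iowait_boost;
}

/* Run one producer call and one consumer call per window. */
static void model_run(struct boost_model *m, const bool *flags,
		      unsigned int *used, int n)
{
	for (int i = 0; i < n; i++) {
		model_set_iowait_boost(m, flags[i]);
		used[i] = model_iowait_boost(m);
	}
}
```

With policy_min = 200 and iowait_boost_max = 1600, the window sequence
IOWAIT, IOWAIT, none, none consumes 200, 400, 200, 100 in this model, which
lines up with A, B and C above. Note that several producer calls inside one
window would double more than once here. The model does not guard against
that.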
---
kernel/sched/cpufreq_schedutil.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 46b2479641cc..b270563c15a5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -171,8 +171,14 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
{
if (flags & SCHED_CPUFREQ_IOWAIT) {
sg_cpu->iowait_boost_pending = true;
- sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
- sg_cpu->sg_policy->policy->min);
+ if (sg_cpu->iowait_boost) {
+ /* Bump up 2*current_boost until hitting max */
+ sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
+ sg_cpu->iowait_boost_max);
+ } else {
+ /* Start from policy->min */
+ sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
+ }
} else if (sg_cpu->iowait_boost) {
s64 delta_ns = time - sg_cpu->last_update;
@@ -192,6 +198,17 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
if (!sg_cpu->iowait_boost)
return;
+ if (sg_cpu->iowait_boost_pending) {
+ /*
+ * Record consumption of current boost value
+ * (set by sugov_set_iowait_boost).
+ */
+ sg_cpu->iowait_boost_pending = false;
+ } else {
+ /* Decay boost */
+ sg_cpu->iowait_boost >>= 1;
+ }
+
boost_util = sg_cpu->iowait_boost;
boost_max = sg_cpu->iowait_boost_max;
@@ -199,14 +216,6 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
*util = boost_util;
*max = boost_max;
}
-
- if (sg_cpu->iowait_boost_pending) {
- sg_cpu->iowait_boost_pending = false;
- sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
- sg_cpu->iowait_boost_max);
- } else {
- sg_cpu->iowait_boost >>= 1;
- }
}
#ifdef CONFIG_NO_HZ_COMMON
--
2.11.0