Re: [PATCH] cpufreq: schedutil: Don't skip freq update when limits change

From: Viresh Kumar
Date: Wed Jul 24 2019 - 07:43:34 EST


On 23-07-19, 12:27, Rafael J. Wysocki wrote:
> On Tue, Jul 23, 2019 at 11:15 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> > Though there is one difference between intel_cpufreq and acpi_cpufreq,
> > intel_cpufreq has fast_switch_possible=true and so it uses slightly
> > different path in schedutil. I tried to look from that perspective as
> > well but couldn't find anything wrong.
>
> acpi-cpufreq should use fast switching on the Doug's system too.

Ah okay.

> > If you still find intel_cpufreq to be broken, even with this patch,
> > please set fast_switch_possible=false instead of true in
> > __intel_pstate_cpu_init() and try tests again. That shall make it very
> > much similar to acpi-cpufreq driver.
>
> I wonder if this helps. Even so, we want fast switching to be used by
> intel_cpufreq.

With both using fast switching it shouldn't make any difference.

> Anyway, it looks like the change reverted by the Doug's patch
> introduced a race condition that had not been present before. Namely,
> need_freq_update is cleared in get_next_freq() when it is set _or_
> when the new freq is different from the cached one, so in the latter
> case if it happens to be set by sugov_limits() after evaluating
> sugov_should_update_freq() (which returned 'true' for timing reasons),
> that update will be lost now. [Previously the update would not be
> lost, because the clearing of need_freq_update depended only on its
> current value.] Where it matters is that in the "need_freq_update set"
> case, the "premature frequency reduction avoidance" should not be
> applied (as you noticed and hence the $subject patch).
>
> However, even with the $subject patch, need_freq_update may still be
> set by sugov_limits() after the check added by it and then cleared by
> get_next_freq(), so it doesn't really eliminate the problem.
>
> IMO eliminating would require invalidating next_freq this way or
> another when need_freq_update is set in sugov_should_update_freq(),
> which was done before commit ecd2884291261e3fddbc7651ee11a20d596bb514.

Hmm, so to avoid locking in fast path we need two variable group to
protect against this kind of issues. I still don't want to override
next_freq with a special meaning as it can cause hidden bugs, we have
seen that earlier.

What about something like this then ?

--
viresh

-------------------------8<-------------------------
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 636ca6f88c8e..2f382b0959e5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -40,6 +40,7 @@ struct sugov_policy {
struct task_struct *thread;
bool work_in_progress;

+ bool limits_changed;
bool need_freq_update;
};

@@ -89,8 +90,11 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
!cpufreq_this_cpu_can_update(sg_policy->policy))
return false;

- if (unlikely(sg_policy->need_freq_update))
+ if (unlikely(sg_policy->limits_changed)) {
+ sg_policy->limits_changed = false;
+ sg_policy->need_freq_update = true;
return true;
+ }

delta_ns = time - sg_policy->last_freq_update_time;

@@ -437,7 +441,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
{
if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
- sg_policy->need_freq_update = true;
+ sg_policy->limits_changed = true;
}

static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -447,7 +451,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
struct sugov_policy *sg_policy = sg_cpu->sg_policy;
unsigned long util, max;
unsigned int next_f;
- bool busy;
+ bool busy = false;

sugov_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
@@ -457,7 +461,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
if (!sugov_should_update_freq(sg_policy, time))
return;

- busy = sugov_cpu_is_busy(sg_cpu);
+ /* Limits may have changed, don't skip frequency update */
+ if (!sg_policy->need_freq_update)
+ busy = sugov_cpu_is_busy(sg_cpu);

util = sugov_get_util(sg_cpu);
max = sg_cpu->max;
@@ -831,6 +837,7 @@ static int sugov_start(struct cpufreq_policy *policy)
sg_policy->last_freq_update_time = 0;
sg_policy->next_freq = 0;
sg_policy->work_in_progress = false;
+ sg_policy->limits_changed = false;
sg_policy->need_freq_update = false;
sg_policy->cached_raw_freq = 0;

@@ -879,7 +886,7 @@ static void sugov_limits(struct cpufreq_policy *policy)
mutex_unlock(&sg_policy->work_lock);
}

- sg_policy->need_freq_update = true;
+ sg_policy->limits_changed = true;
}

struct cpufreq_governor schedutil_gov = {