Re: [GIT PULL] Scheduler changes for v6.8

From: Vincent Guittot
Date: Sun Jan 14 2024 - 08:03:40 EST


On Sun, 14 Jan 2024 at 13:38, Wyes Karny <wkarny@xxxxxxxxx> wrote:
>
> On Sun, Jan 14, 2024 at 12:18:06PM +0100, Vincent Guittot wrote:
> > Hi Wyes,
> >
> > On Sunday 14 Jan 2024 at 14:42:40 (+0530), Wyes Karny wrote:
> > > On Wed, Jan 10, 2024 at 02:57:14PM -0800, Linus Torvalds wrote:
> > > > On Wed, 10 Jan 2024 at 14:41, Linus Torvalds
> > > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > It's one of these two:
> > > > >
> > > > > f12560779f9d sched/cpufreq: Rework iowait boost
> > > > > 9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation
> > > > >
> > > > > one more boot to go, then I'll try to revert whichever causes my
> > > > > machine to perform horribly much worse.
> > > >
> > > > I guess it should come as no surprise that the result is
> > > >
> > > > 9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d is the first bad commit
> > > >
> > > > but to revert cleanly I will have to revert all of
> > > >
> > > > b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> > > > f12560779f9d ("sched/cpufreq: Rework iowait boost")
> > > > 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
> > > > performance estimation")
> > > >
> > > > This is on a 32-core (64-thread) AMD Ryzen Threadripper 3970X, fwiw.
> > > >
> > > > I'll keep that revert in my private test-tree for now (so that I have
> > > > a working machine again), but I'll move it to my main branch soon
> > > > unless somebody has a quick fix for this problem.
> > >
> > > Hi Linus,
> > >
> > > I'm able to reproduce this issue on my AMD Ryzen 5600G system, but
> > > only if I disable CPPC in the BIOS and boot with acpi-cpufreq + schedutil.
> > > (I believe CPPC is also disabled in your case, since the log shows "_CPC
> > > object is not present".) With CPPC enabled in the BIOS, I don't see the
> > > issue on my system. On AMD, acpi-cpufreq also uses the _CPC object to
> > > determine the boost ratio. When CPPC is disabled in the BIOS, something
> > > goes wrong and max capacity becomes zero.
> > >
> > > Hi Vincent, Qais,
> > >

..

> >
> > There is something strange that I don't understand
> >
> > Could you trace the value of sg_cpu->util on return from
> > sugov_get_util()?
>
> Yeah, you're right: something was wrong in the bpftrace readings, and
> max_cap is not zero in the traces.
>
> git-5511 [001] d.h1. 427.159763: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> git-5511 [001] d.h1. 427.163733: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> git-5511 [001] d.h1. 427.163735: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> git-5511 [001] d.h1. 427.167706: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> git-5511 [001] d.h1. 427.167708: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> git-5511 [001] d.h1. 427.171678: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> git-5511 [001] d.h1. 427.171679: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> git-5511 [001] d.h1. 427.175653: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> git-5511 [001] d.h1. 427.175655: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> git-5511 [001] d.s1. 427.175665: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> git-5511 [001] d.s1. 427.175665: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
>
> Debug patch applied:
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 95c3c097083e..5c9b3e1de7a0 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -166,6 +166,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>
> freq = get_capacity_ref_freq(policy);
> freq = map_util_freq(util, freq, max);
> + trace_printk("[DEBUG] : freq %u, util %lu, max %lu\n", freq, util, max);
>
> if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> return sg_policy->next_freq;
> @@ -199,6 +200,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
> util = max(util, boost);
> sg_cpu->bw_min = min;
> sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
> + trace_printk("[DEBUG] : util %lu, sg_cpu->util %lu\n", util, sg_cpu->util);
> }
>
> /**
>
>
> So I guess map_util_freq is going wrong somewhere.

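Thanks for the trace. It was really helpful and I think I have found
the root cause.

The problem comes from get_capacity_ref_freq(), which returns the
current freq when arch_scale_freq_invariant() is not enabled, combined
with the fact that we now apply map_util_perf() earlier in the path, so
its result is then capped by max capacity. As your trace shows, util is
already capped at max (1024), so map_util_freq() can never ask for more
than the current freq (1400000) and the CPU never ramps up.

Could you try the patch below?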

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e420e2ee1a10..611c621543f4 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -133,7 +133,7 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
if (arch_scale_freq_invariant())
return policy->cpuinfo.max_freq;

- return policy->cur;
+ return policy->cur + (policy->cur >> 2);
}

/**
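
To make the arithmetic concrete, here is a rough userspace sketch of the
mapping (not kernel code). The function names are made up for
illustration, and the frequency scaling is assumed to be a plain
ref_freq * util / max, which is how I read map_util_freq():

```c
#include <stdint.h>

/*
 * Hypothetical model of the util -> frequency mapping: scale a
 * reference frequency by util / max_cap. Assumed to mirror
 * map_util_freq(); it is not the kernel implementation.
 */
static uint64_t model_map_util_freq(uint64_t util, uint64_t ref_freq,
				    uint64_t max_cap)
{
	return ref_freq * util / max_cap;
}

/* Reference frequency without headroom: just the current frequency. */
static uint64_t model_ref_freq_plain(uint64_t cur)
{
	return cur;
}

/*
 * Reference frequency with 25% headroom on top of the current one.
 * The parentheses matter: in C, '+' binds tighter than '>>'.
 */
static uint64_t model_ref_freq_headroom(uint64_t cur)
{
	return cur + (cur >> 2);
}
```

With the values from the trace (util 1024, max 1024, current freq
1400000), the plain reference gives back 1400000, so the request can
never exceed the current frequency. With the headroom, the same inputs
give 1750000, which lets the governor step up.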



>
> Thanks,
> Wyes
> >
> > Thanks for your help
> > Vincent
> >
> > >
> > > Thanks,
> > > Wyes
> > >