Re: [PATCH v2] cpufreq: ACPI: add ITMT support when CPPC enabled
From: Rafael J. Wysocki
Date: Thu Aug 31 2023 - 09:03:16 EST
On Thu, Aug 31, 2023 at 12:19 PM Tony W Wang-oc
<TonyWWang-oc@xxxxxxxxxxx> wrote:
>
>
> On 8/23/23 04:01, Rafael J. Wysocki wrote:
> > On Tue, Aug 8, 2023 at 1:13 PM Tony W Wang-oc <TonyWWang-oc@xxxxxxxxxxx> wrote:
> >>
> >> The _CPC method can get per-core highest frequency.
> >
> > Well, not exactly. A more precise way to say this would be "The
> > per-core highest frequency can be obtained via CPPC."
> >
>
> Thanks for your reply, will rewrite the commit in next version.
>
> >> The highest frequency may varies between cores which mean cores can
> >
> > "may vary" and "which means"
> >
> >> running at different max frequency, so can use it as a core priority
> >
> > "can run", but it would be better to say "may run".
> >
> >> and give a hint to scheduler in order to put critical task to the
> >> higher priority core.
> >
> > Well, roughly speaking ...
> >
> > You should really talk about ITMT and how it can be hooked up to this.
> >
>
> Ok, Got it.
>
> >> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@xxxxxxxxxxx>
> >> ---
> >> v1->v2: Fix build errors reported by kernel test robot
> >>
> >> arch/x86/kernel/itmt.c | 2 ++
> >> drivers/cpufreq/acpi-cpufreq.c | 59 ++++++++++++++++++++++++++++++----
> >> 2 files changed, 54 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
> >> index ee4fe8cdb857..b49ac8ecbbd6 100644
> >> --- a/arch/x86/kernel/itmt.c
> >> +++ b/arch/x86/kernel/itmt.c
> >> @@ -122,6 +122,7 @@ int sched_set_itmt_support(void)
> >>
> >> return 0;
> >> }
> >> +EXPORT_SYMBOL_GPL(sched_set_itmt_support);
> >
> > This requires an ACK from the x86 maintainers.
> >
> >>
> >> /**
> >> * sched_clear_itmt_support() - Revoke platform's support of ITMT
> >> @@ -181,3 +182,4 @@ void sched_set_itmt_core_prio(int prio, int cpu)
> >> {
> >> per_cpu(sched_core_priority, cpu) = prio;
> >> }
> >> +EXPORT_SYMBOL_GPL(sched_set_itmt_core_prio);
> >
> > And same here.
> >
> >> diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> >> index b2f05d27167e..5733323e04ac 100644
> >> --- a/drivers/cpufreq/acpi-cpufreq.c
> >> +++ b/drivers/cpufreq/acpi-cpufreq.c
> >> @@ -628,28 +628,35 @@ static int acpi_cpufreq_blacklist(struct cpuinfo_x86 *c)
> >> #endif
> >>
> >> #ifdef CONFIG_ACPI_CPPC_LIB
> >> -static u64 get_max_boost_ratio(unsigned int cpu)
> >> +static void cpufreq_get_core_perf(int cpu, u64 *highest_perf, u64 *nominal_perf)
> >
> > This is not a cpufreq core function, so please use a different prefix
> > in its name.
> >
>
> Ok. Will remove the prefix of "cpufreq_".
>
> >> {
> >> struct cppc_perf_caps perf_caps;
> >> - u64 highest_perf, nominal_perf;
> >> int ret;
> >>
> >> if (acpi_pstate_strict)
> >> - return 0;
> >> + return;
> >>
> >> ret = cppc_get_perf_caps(cpu, &perf_caps);
> >> if (ret) {
> >> pr_debug("CPU%d: Unable to get performance capabilities (%d)\n",
> >> cpu, ret);
> >> - return 0;
> >> + return;
> >> }
> >>
> >> if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> >> - highest_perf = amd_get_highest_perf();
> >> + *highest_perf = amd_get_highest_perf();
> >> else
> >> - highest_perf = perf_caps.highest_perf;
> >> + *highest_perf = perf_caps.highest_perf;
> >> +
> >> + *nominal_perf = perf_caps.nominal_perf;
> >> + return;
> >> +}
> >>
> >> - nominal_perf = perf_caps.nominal_perf;
> >> +static u64 get_max_boost_ratio(unsigned int cpu)
> >> +{
> >> + u64 highest_perf, nominal_perf;
> >> +
> >> + cpufreq_get_core_perf(cpu, &highest_perf, &nominal_perf);
> >>
> >> if (!highest_perf || !nominal_perf) {
> >> pr_debug("CPU%d: highest or nominal performance missing\n", cpu);
> >> @@ -663,8 +670,44 @@ static u64 get_max_boost_ratio(unsigned int cpu)
> >>
> >> return div_u64(highest_perf << SCHED_CAPACITY_SHIFT, nominal_perf);
> >> }
> >> +
> >> +static void cpufreq_sched_itmt_work_fn(struct work_struct *work)
> >
> > A similar comment applies here.
> >
> >> +{
> >> + sched_set_itmt_support();
> >> +}
> >> +
> >> +static DECLARE_WORK(sched_itmt_work, cpufreq_sched_itmt_work_fn);
> >> +
> >> +static void cpufreq_set_itmt_prio(int cpu)
> >> +{
> >> + u64 highest_perf, nominal_perf;
> >> + static u32 max_highest_perf = 0, min_highest_perf = U32_MAX;
> >> +
> >> + cpufreq_get_core_perf(cpu, &highest_perf, &nominal_perf);
> >> +
> >> + sched_set_itmt_core_prio(highest_perf, cpu);
> >> +
> >> + if (max_highest_perf <= min_highest_perf) {
> >> + if (highest_perf > max_highest_perf)
> >> + max_highest_perf = highest_perf;
> >> +
> >> + if (highest_perf < min_highest_perf)
> >> + min_highest_perf = highest_perf;
> >> +
> >> + if (max_highest_perf > min_highest_perf) {
> >> + /*
> >> + * This code can be run during CPU online under the
> >> + * CPU hotplug locks, so sched_set_itmt_support()
> >> + * cannot be called from here. Queue up a work item
> >> + * to invoke it.
> >> + */
> >> + schedule_work(&sched_itmt_work);
> >> + }
> >
> > This potentially runs before ITMT priorities are set for all CPUs.
> > Isn't it a problem?
> >
>
> Yes, you are right.
> Will use schedule_delayed_work(&sched_itmt_work, msecs_to_jiffies(500))
> to fix this.

If the ordering matters, it is better to enforce it directly (through
an explicit code dependency, for example) than to rely on timing to do
the right thing.

If you do the above, it will not be clear why it is done (a comment
may help to address that, though) or why the delay is 500 ms in
particular.
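
To illustrate what an explicit dependency could look like, here is a
rough user-space sketch, not kernel code and not a proposed patch. All
of the sketch_* names and NR_CPUS_SKETCH are hypothetical stand-ins:
sketch_set_core_prio() models sched_set_itmt_core_prio(), and
sketch_enable_itmt() models queuing the work item that calls
sched_set_itmt_support(). The idea is to count the CPUs whose priority
has been recorded and enable ITMT only after the last one has been
seen, so no delay is needed. Which set of CPUs counts as "all" in the
real driver is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the sketch */

static uint64_t core_prio[NR_CPUS_SKETCH];
static bool prio_set[NR_CPUS_SKETCH];
static unsigned int nr_prio_set;
static bool itmt_enabled;

/* Hypothetical stand-in for sched_set_itmt_core_prio(). */
static void sketch_set_core_prio(uint64_t prio, int cpu)
{
	core_prio[cpu] = prio;
}

/* Hypothetical stand-in for queuing work that enables ITMT. */
static void sketch_enable_itmt(void)
{
	itmt_enabled = true;
}

static void sketch_set_itmt_prio(int cpu, uint64_t highest_perf)
{
	static uint64_t max_perf, min_perf = UINT64_MAX;

	sketch_set_core_prio(highest_perf, cpu);

	/* Count each CPU once, even if it goes online repeatedly. */
	if (!prio_set[cpu]) {
		prio_set[cpu] = true;
		nr_prio_set++;
	}

	if (highest_perf > max_perf)
		max_perf = highest_perf;
	if (highest_perf < min_perf)
		min_perf = highest_perf;

	/*
	 * Explicit dependency: enable ITMT only after every CPU has a
	 * priority and the priorities actually differ.
	 */
	if (!itmt_enabled && nr_prio_set == NR_CPUS_SKETCH &&
	    max_perf > min_perf)
		sketch_enable_itmt();
}
```

With this shape, calling sketch_set_itmt_prio() for CPUs 0 to 2 leaves
ITMT disabled even if their values differ, and the call for CPU 3 is
the one that enables it.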