Re: [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
From: Tim Chen
Date: Thu Sep 22 2016 - 14:50:41 EST
On Wed, 2016-09-21 at 22:30 +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 21, 2016 at 9:19 PM, Srinivas Pandruvada
> <srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
> >
> >
> > +
> > +static void intel_pstate_check_and_enable_itmt(int cpu)
> > +{
> > +	/*
> > +	 * For checking whether there is any difference in the maximum
> > +	 * performance for each CPU, need to wait till we have CPPC
> > +	 * data from all CPUs called from the cpufreq core. If there is a
> > +	 * difference in the maximum performance, then we have ITMT support.
> > +	 * If ITMT is supported, update the scheduler core priority for each
> > +	 * CPU and call to enable the ITMT feature.
> > +	 */
> > +	if (cpumask_subset(topology_core_cpumask(cpu), &cppc_read_cpu_mask)) {
> > +		int cpu_index;
> > +		int max_prio;
> > +		struct cpudata *cpu;
> > +		bool itmt_support = false;
> > +
> > +		cpu = all_cpu_data[cpumask_first(&cppc_read_cpu_mask)];
> > +		max_prio = cpu->cppc_perf->highest_perf;
> > +		for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
> > +			cpu = all_cpu_data[cpu_index];
> > +			if (max_prio != cpu->cppc_perf->highest_perf) {
> > +				itmt_support = true;
> > +				break;
> > +			}
> > +		}
> > +
> > +		if (!itmt_support)
> > +			return;
> > +
> > +		for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
> > +			cpu = all_cpu_data[cpu_index];
> > +			sched_set_itmt_core_prio(cpu->cppc_perf->highest_perf,
> > +						 cpu_index);
> > +		}
> My current understanding is that we need to rebuild sched domains
> after setting the priorities,
No, that's not true. We need to rebuild the sched domains only
when the sched domain flags change, not when we change the
priorities. Only the sched domain flags are properties of
the sched domain; the CPU priority values are not part of it.
Morten had a similar question about whether we need to rebuild the sched
domains when we change cpu priorities when we first posted the patches.
Peter explained that it wasn't necessary.
http://lkml.iu.edu/hypermail/linux/kernel/1608.3/01753.html
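
For reference, the priority update amounts to writing a per-cpu value and
does not touch any sched domain structure. Roughly (a simplified sketch,
not the exact code in the series):

	static DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);

	void sched_set_itmt_core_prio(int prio, int core_cpu)
	{
		int cpu;

		/* Record the priority for the core and its SMT siblings. */
		for_each_cpu(cpu, topology_sibling_cpumask(core_cpu))
			per_cpu(sched_core_priority, cpu) = prio;
	}

Since asym_packing only reads these values when it compares CPUs, changing
them does not require a sched domain rebuild.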
> so what if there are two CPU packages
> and there are highest_perf differences in both, and we first enumerate
> the first package entirely before getting to the second one?
>
> In that case we'll schedule the work item after enumerating the first
> package and it may rebuild the sched domains before all priorities are
> set for the second package, may it not?
That is not a problem. For the second package, all the cpu priorities
are initialized to the same value. So even if we start doing
asym_packing in the scheduler for the whole system, all the cpus on the
second package are treated equally by the scheduler.
We will operate as if there is no favored core until we update the
priorities of the cpus on the second package.
That said, we don't enable ITMT automatically for a 2-package system.
So the explicit sysctl command to enable ITMT and cause the sched domain
rebuild for a 2-package system is most likely to come after
we have discovered and set all the cpu priorities.
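
To illustrate, the asym_packing preference check in the scheduler boils
down to something like the following (simplified sketch), so CPUs with
equal priority values never look preferable to one another:

	static inline bool sched_asym_prefer(int a, int b)
	{
		/* Prefer cpu a over cpu b only if a has strictly higher priority. */
		return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
	}

With all priorities on the second package still at the same initial value,
this never prefers one of its cpus over another, so load balancing there
behaves as before.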
>
> This seems to require some more consideration.
>
> >
> > +		/*
> > +		 * Since this function is in the hotcpu notifier callback
> > +		 * path, submit a task to workqueue to call
> > +		 * sched_set_itmt_support().
> > +		 */
> > +		schedule_work(&sched_itmt_work);
> It doesn't make sense to do this more than once IMO and what if we
> attempt to schedule the work item again when it has been scheduled
> once already? Don't we need any protection here?
It is not a problem for sched_set_itmt_support() to be called more than
once.
First, we will ignore the second call if sched_itmt_capable has already
been set to the same value by the previous sched_set_itmt_support() call.
Second, the update of sched_itmt_capable
is protected by the itmt_update_mutex.
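
In other words, the enabling path is serialized and idempotent, roughly
along these lines (a simplified sketch; the actual code in the series may
differ in detail):

	static DEFINE_MUTEX(itmt_update_mutex);
	static bool sched_itmt_capable;

	void sched_set_itmt_support(bool itmt_supported)
	{
		mutex_lock(&itmt_update_mutex);

		/* Nothing to do if the capability is already in this state. */
		if (itmt_supported == sched_itmt_capable) {
			mutex_unlock(&itmt_update_mutex);
			return;
		}

		sched_itmt_capable = itmt_supported;
		/* ... update the ITMT sysctl default and rebuild sched domains ... */

		mutex_unlock(&itmt_update_mutex);
	}

So a duplicate schedule_work() at worst results in a second call that takes
the mutex and returns without doing anything.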
Thanks.
Tim