Re: [PATCH v2 2/2] [RFC] CPUFreq: Add support for cpu-perf-dependencies
From: Rafael J. Wysocki
Date: Thu Oct 15 2020 - 11:57:13 EST

On Tue, Oct 13, 2020 at 2:39 PM Ionela Voinescu <ionela.voinescu@xxxxxxx> wrote:
>
> Hi Rafael,
>
> On Tuesday 13 Oct 2020 at 13:53:37 (+0200), Rafael J. Wysocki wrote:
> > On Tue, Oct 13, 2020 at 12:01 AM Ionela Voinescu
> > <ionela.voinescu@xxxxxxx> wrote:
> > >
> > > Hey Lukasz,
> > >
> > > I think after all this discussion (in our own way of describing things)
> > > we agree on how the current cpufreq based FIE implementation is affected
> > > in systems that use hardware coordination.
> > >
> > > What we don't agree on is the location where that implementation (that
> > > uses the new mask and aggregation) should be.
> > >
> > > On Monday 12 Oct 2020 at 19:19:29 (+0100), Lukasz Luba wrote:
> > > [..]
> > > > The previous FIE implementation where arch_set_freq_scale()
> > > > was called from the drivers, was better suited for this issue.
> > > > Driver could just use internal dependency cpumask or even
> > > > do the aggregation to figure out the max freq for cluster
> > > > if there is a need, before calling arch_set_freq_scale().
> > > >
> > > > It is not perfect solution for software FIE, but one of possible
> > > > when there is no hw counters.
> > > >
> > > [..]
> > >
> > > > Difference between new FIE and old FIE (from v5.8) is that the new one
> > > > purely relies on schedutil max freq value (which will now be missing),
> > > > while the old FIE was called by the driver and thus it was an option to
> > > > fix only the affected cpufreq driver [1][2].
> > > >
> > >
> > > My final argument is that now you have 2 drivers that would need this
> > > support, next you'll have 3 (the new mediatek driver), and in the future
> > > there will be more. So why limit and duplicate this functionality in the
> > > drivers? Why not make it generic for all drivers to use if the system
> > > is using hardware coordination?
> > >
> > > Additionally, I don't think drivers should not even need to know about
> > > these dependency/clock domains. They should act at the level of the
> > > policy, which in this case will be at the level of each CPU.
> >
> > The policies come from the driver, though.
> >
> > The driver decides how many CPUs will be there in a policy and how to
> > handle them at the initialization time.
>
> Yes, policies are built based on information populated from the drivers
> at .init(): what CPUs will belong to a policy, what methods to use for
> setting and getting frequency, etc.
>
> So they do pass this information to the cpufreq core to be stored at the
> level of the policy, but later drivers (in the majority of cases) will
> not need to store on their own information on what CPUs belong to a
> frequency domain, they rely on having passed that information to the
> core, and the core mechanisms hold this information on the clock domains
> (currently through policy->cpus and policy->related_cpus).

Strictly speaking, not quite.

In fact policy->related_cpus is a set of CPUs that share a common perf
control HW/FW interface which may or may not match the boundaries of
clock domains etc. That's what the entire cpufreq needs to know and
cares about.

AFAICS your scale invariance rework patches were based on the
assumption that CPUs sharing an interface like that should also belong
to the same frequency domain, which is reasonable and that's why I
didn't have a problem with it, but if you were really assuming that
policy->related_cpus must always reflect a frequency domain, then I'm
afraid that you were not going in the right direction (the
one-CPU-per-policy with HW coordination example should be sufficient
to illustrate that).
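
As a rough illustration of the one-CPU-per-policy case, here is a
minimal userspace sketch. It is not kernel code: the toy_* names are
made up for the example, and the rule that the hardware settles on the
highest request in the clock domain is an assumption, since real
platforms may coordinate differently.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical, simplified stand-in for a cpufreq policy. */
struct toy_policy {
	unsigned long related_cpus;	/* CPUs sharing one perf control interface */
	unsigned int requested_khz;	/* last request made through that interface */
};

/* Assumed for the example: all four CPUs share one clock, known to HW/FW only. */
static const unsigned long toy_clock_domain = 0xFUL;

/* One policy per CPU: each CPU has its own control interface. */
static void toy_init_per_cpu_policies(struct toy_policy p[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		p[cpu].related_cpus = 1UL << cpu;
		p[cpu].requested_khz = 0;
	}
}

/* Assumed coordination rule: the domain runs at the highest request in it. */
static unsigned int toy_hw_coordinated_khz(const struct toy_policy p[NR_CPUS],
					   unsigned long domain)
{
	unsigned int max_khz = 0;

	assert(domain < (1UL << NR_CPUS));

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if ((domain & (1UL << cpu)) && p[cpu].requested_khz > max_khz)
			max_khz = p[cpu].requested_khz;
	}
	return max_khz;
}
```

With CPU0 asking for 1000000 kHz and CPU2 asking for 1800000 kHz, every
toy policy still has exactly one CPU in related_cpus, while the whole
domain ends up at 1800000 kHz under the assumed rule. The policy mask
describes the control interface, not the clock domain.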

It is correct that drivers generally don't need to know about the HW
clock (or voltage for that matter) coordination dependencies, but the
rest of cpufreq doesn't need to know about them either. If that
information is needed for something else, I don't see a reason to put
it into cpufreq.