Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
From: Rafael J. Wysocki
Date: Wed Nov 18 2020 - 07:01:02 EST
On Wed, Nov 18, 2020 at 5:42 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>
> On 17-11-20, 14:06, Rafael J. Wysocki wrote:
> > Is this really a cpufreq thing, though, or is it arch stuff? I think
> > the latter, because it is not necessary for anything in cpufreq.
> >
> > Yes, acpi-cpufreq happens to know this information, because it uses
> > processor_perflib, but the latter may as well be used by the arch
> > enumeration of CPUs and the freqdomain_cpus mask may be populated from
> > there.
> >
> > As far as cpufreq is concerned, if the interface to the hardware is
> > per-CPU, there is one CPU per policy and cpufreq has no business
> > knowing anything about the underlying hardware coordination.
>
> It won't be used by cpufreq for now at least and yes I understand your
> concern. I opted for this because we already have a cpufreq
> implementation for the same thing and it is usually better to reuse
> this kind of stuff instead of inventing it over.

Do you mean related_cpus and real_cpus?

That's the granularity of the interface to the hardware I'm talking about.

Strictly speaking, it means "these CPUs share a HW interface for perf
control" and it need not mean "these CPUs are in the same
clock/voltage domain". Specifically, it need not mean "these CPUs are
the only CPUs in the given clock/voltage domain". That's what it
means when the control is exercised by manipulating OPPs directly, but
not in general.

In the ACPI case, for example, what the firmware tells you need not
reflect the HW topology in principle. It only tells you whether or
not it wants you to coordinate a given group of CPUs and in what way,
but this may not be the whole picture from the HW perspective. If you
need the latter, you need more information in general (at least you
need to assume that what the firmware tells you actually does reflect
the HW topology on the given SoC).
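
To make the distinction concrete, here is a minimal userspace sketch,
not kernel code. It assumes a hypothetical 4-CPU SoC where the firmware
exposes a per-CPU interface for perf control while all four CPUs share
one clock/voltage domain. The names cpu_mask_t, related_cpus_of(),
clock_domain_of() and related_cpus_is_whole_domain() are made up for
illustration; only the related_cpus concept comes from cpufreq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4u /* assumed: a 4-CPU SoC, for illustration only */

/* One bit per CPU; a userspace stand-in for the kernel's struct cpumask. */
typedef uint64_t cpu_mask_t;

/*
 * Assumed firmware behaviour: a per-CPU interface for perf control, so
 * each policy's related_cpus holds exactly one CPU.
 */
cpu_mask_t related_cpus_of(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (cpu_mask_t)1 << cpu;
}

/*
 * Assumed hardware: all four CPUs share one clock/voltage domain.  In
 * this model only arch/platform code would know that.
 */
cpu_mask_t clock_domain_of(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return ((cpu_mask_t)1 << NR_CPUS) - 1;
}

/* True only when the perf-control interface covers the whole domain. */
bool related_cpus_is_whole_domain(unsigned int cpu)
{
	return related_cpus_of(cpu) == clock_domain_of(cpu);
}
```

Under these assumptions related_cpus_is_whole_domain() is false for
every CPU, which is the case where related_cpus alone cannot answer the
topology question.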

So yes, in the particular case of OPP-based perf control, cpufreq
happens to have the same information that is needed by the other
subsystems, but otherwise it may not and what I'm saying is that it
generally is a mistake to expect cpufreq to have that information or
to be able to obtain it without the help of the arch/platform code.
Hence, it would be a mistake to design an interface based on that
expectation.

Or looking at it from a different angle, today a cpufreq driver is
only required to specify the granularity of the HW interface for perf
control via related_cpus. It is not required to obtain extra
information beyond that. If a new mask to be populated by it is
added, the driver may need to do more work which is not necessary from
the perf control perspective. That doesn't look particularly clean to
me.

Moreover, adding such a mask to cpufreq_policy would make its users
depend on cpufreq somewhat artificially, which need not even be
useful.

IMO, the information needed by all of the subsystems in question
should be obtained and made available at the arch/platform level and
everyone who needs it should be able to access it from there,
including the cpufreq driver for the given platform if that's what it
needs to do.

BTW, cpuidle may need the information in question too, so why should
it be provided via cpufreq rather than via cpuidle?
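
As a rough illustration of what I mean by the arch/platform level, here
is a userspace sketch, not a proposal for an actual interface. It
assumes the arch/platform enumeration code fills in a per-CPU mask
once, and that both a cpufreq-side and a cpuidle-side user read it from
there. Every name in it (arch_set_perf_dependent_cpus(),
arch_perf_dependent_cpus(), cpufreq_side_count_dependents(),
cpuidle_side_cpus_dependent()) is hypothetical and does not exist in
the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4u /* assumed: a 4-CPU SoC, for illustration only */

/* One bit per CPU; a userspace stand-in for the kernel's struct cpumask. */
typedef uint64_t cpu_mask_t;

/* Owned by the (hypothetical) arch/platform code, not by cpufreq. */
static cpu_mask_t perf_dependent_cpus[NR_CPUS];

/* Called by arch/platform CPU enumeration, e.g. while parsing firmware data. */
void arch_set_perf_dependent_cpus(unsigned int cpu, cpu_mask_t mask)
{
	assert(cpu < NR_CPUS);
	perf_dependent_cpus[cpu] = mask;
}

/* Single accessor shared by every subsystem that needs the information. */
cpu_mask_t arch_perf_dependent_cpus(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return perf_dependent_cpus[cpu];
}

/* A cpufreq-side user: how many CPUs does this CPU's perf depend on? */
unsigned int cpufreq_side_count_dependents(unsigned int cpu)
{
	cpu_mask_t mask = arch_perf_dependent_cpus(cpu);
	unsigned int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* A cpuidle-side user: same data, no dependency on cpufreq at all. */
bool cpuidle_side_cpus_dependent(unsigned int a, unsigned int b)
{
	assert(b < NR_CPUS);
	return (arch_perf_dependent_cpus(a) >> b) & 1;
}
```

Neither user goes through cpufreq_policy here, so neither depends on a
cpufreq driver being present or on that driver doing extra work.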