Re: [PATCH v4 3/4] scmi-cpufreq: get opp_shared_cpus from opp-v2 for EM

From: Sudeep Holla
Date: Tue Dec 08 2020 - 06:21:06 EST


On Tue, Dec 08, 2020 at 12:56:11PM +0530, Viresh Kumar wrote:
> On 08-12-20, 07:22, Nicola Mazzucato wrote:
> > On 12/8/20 5:50 AM, Viresh Kumar wrote:
> > > On 02-12-20, 17:23, Nicola Mazzucato wrote:
> > >> nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> > >> if (nr_opp <= 0) {
> > >> - dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n");
> > >> - ret = -EPROBE_DEFER;
> > >> - goto out_free_opp;
> > >> + ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
> > >> + if (ret) {
> > >> + dev_warn(cpu_dev, "failed to add opps to the device\n");
> > >> + goto out_free_cpumask;
> > >> + }
> > >> +
> > >> + ret = dev_pm_opp_set_sharing_cpus(cpu_dev, opp_shared_cpus);
> > >> + if (ret) {
> > >> + dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
> > >> + __func__, ret);
> > >> + goto out_free_cpumask;
> > >> + }
> > >> +
> > >
> > > Why do we need to call above two after calling
> > > dev_pm_opp_get_opp_count() ?
> >
> > Sorry, I am not sure I understand your question here. If there are no OPPs for
> > a device, we want to add them to it
>
> Earlier we used to call handle->perf_ops->device_opps_add() and
> dev_pm_opp_set_sharing_cpus() before calling dev_pm_opp_get_opp_count(), why is
> the order changed now ?
>
>
> I am not sure why they would be duplicated in your case. I thought
> device_opps_add() is responsible for dynamically adding the OPPs here.
>

It is because of the per-CPU vs per-domain drama here. Imagine a system with
4 CPUs which the firmware puts in individual domains, while they are all
in the same perf domain and hence the OPP table is marked shared in DT.

Since this probe gets called for all the CPUs, we need to skip adding
OPPs for the last 3 (add them only for the first one and mark the others as
shared). If we attempt to add OPPs on the second CPU's probe, it *will*
complain about duplicate OPPs, as we would have already marked the table as
shared with the first CPU. Am I missing anything? I suggested this as Nicola
saw OPP duplicate warnings when he was hacking up this patch.
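
To make the ordering argument concrete, here is a small userspace model (not
kernel code) of four probes hitting one shared OPP table. Every name in it
(model_opp_table, model_probe() and friends) is hypothetical, and it assumes
the table is already shared by all four CPUs and that re-adding an existing
level only produces a warning. It is a sketch of the idea, not of what the
OPP core actually does internally.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS   4	/* CPUs in the single perf domain */
#define MODEL_NR_LEVELS 3	/* perf levels reported by the firmware */

/* Hypothetical stand-in for the one OPP table shared by all CPUs. */
struct model_opp_table {
	int nr_opp;		/* OPPs currently in the table */
	int duplicate_warnings;	/* attempts to add an OPP that already exists */
};

static struct model_opp_table model_table;

static void model_reset(void)
{
	model_table.nr_opp = 0;
	model_table.duplicate_warnings = 0;
}

/* Models the count check: every CPU sees the same shared table. */
static int model_get_opp_count(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_table.nr_opp;
}

/* Models adding the OPPs: existing levels are counted as duplicates. */
static void model_opps_add(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	for (int level = 0; level < MODEL_NR_LEVELS; level++) {
		if (level < model_table.nr_opp)
			model_table.duplicate_warnings++;
		else
			model_table.nr_opp++;
	}
}

/* One probe call; with check_count_first set, a populated table is left alone. */
static void model_probe(int cpu, bool check_count_first)
{
	if (check_count_first && model_get_opp_count(cpu) > 0)
		return;
	model_opps_add(cpu);
}

/* Probe every CPU in order, starting from an empty table. */
static void model_probe_all(bool check_count_first)
{
	model_reset();
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_probe(cpu, check_count_first);
}
```

With the count checked first, only the first probe populates the table and
the other three find it already filled. Without the check, each of the
remaining three probes re-adds every level and trips the duplicate path.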

> > otherwise no need as they would be duplicated.
> > > And we don't check the return value of
> > > the below call anymore, moreover we have to call it twice now.

Yes, that looks wrong, we need to add the check for non-zero values, but ...

> >
> > This second get_opp_count is required such that we register em with the correct
> > opp number after having added them. Without this the opp_count would not be correct.
>

... I have a question here. Why do you need to call

em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, opp_shared_cpus..)

on each CPU ? Why can't that be done once for unique opp_shared_cpus ?

The whole drama of per-CPU vs perf domain is to have the energy model, and
if feeding it opp_shared_cpus once is not sufficient, then something is
wrong or simply duplicated or just not necessary IMO.
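
As a rough illustration of "once per unique opp_shared_cpus", here is a
userspace sketch (again not kernel code). It assumes a cpumask can be modelled
as a plain bitmask and that a single registration covers every CPU in the
mask. The em_model names are made up for the example and do not correspond to
anything in the EM framework.

```c
#include <assert.h>

/* Hypothetical record of which CPUs already have an energy model. */
struct em_model {
	unsigned int covered_cpus;	/* bit N set: CPU N is covered */
	int nr_registrations;		/* registrations actually performed */
};

/*
 * Called from each CPU's probe. Registers only when this CPU is not yet
 * covered by an earlier registration for the same opp_shared_cpus mask.
 */
static void em_model_probe_cpu(struct em_model *em, int cpu,
			       unsigned int opp_shared_cpus)
{
	unsigned int self = 1u << cpu;

	/* A CPU is always part of its own sharing mask. */
	assert(opp_shared_cpus & self);

	if (em->covered_cpus & self)
		return;

	em->covered_cpus |= opp_shared_cpus;
	em->nr_registrations++;
}

/* Probe CPUs 0..nr_cpus-1; masks[cpu] is that CPU's opp_shared_cpus. */
static int em_model_probe_all(const unsigned int *masks, int nr_cpus)
{
	struct em_model em = { 0, 0 };

	for (int cpu = 0; cpu < nr_cpus; cpu++)
		em_model_probe_cpu(&em, cpu, masks[cpu]);

	return em.nr_registrations;
}
```

Four CPUs that all report the mask 0xF end up with one registration, and two
pairs reporting 0x3 and 0xC end up with two, regardless of how many times the
probe runs.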

> What if the count is still 0 ? What about deferred probe we were doing earlier ?

OK, you made me think with that question. I think the check was originally
added for deferred probe, but then the scmi core was changed to add the cpufreq
device only after everything needed is ready. So the condition must never
occur now.

--
Regards,
Sudeep