Re: [PATCH v7 02/37] soc/tegra: pmc: Implement attach_dev() of power domain drivers

From: Ulf Hansson
Date: Tue Aug 10 2021 - 06:52:43 EST


On Tue, 10 Aug 2021 at 01:56, Dmitry Osipenko <digetx@xxxxxxxxx> wrote:
>
> 09.08.2021 17:15, Ulf Hansson wrote:
> >> We did that in previous versions of this series, where drivers were
> >> calling devm_tegra_core_dev_init_opp_table() helper during the probe to
> >> initialize performance state of the domain. Moving OPP state
> >> initialization into central place made drivers cleaner by removing the
> >> boilerplate code.
> > I am not against doing this in a central place, like $subject patch
> > suggests. As a matter of fact, it makes perfect sense to me.
> >
> > However, what I am concerned about is that doing so requires using
> > genpd-internal data structures. I think we should try to avoid
> > that.
>
> Alright, what do you think about this:
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index a934c679e6ce..5faed62075e9 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -2669,12 +2669,37 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> dev->pm_domain->detach = genpd_dev_pm_detach;
> dev->pm_domain->sync = genpd_dev_pm_sync;
>
> + if (pd->default_performance_state) {
> + unsigned int default_pstate;
> +
> + ret = pd->default_performance_state(pd, dev);
> + if (ret < 0) {
> + dev_err(dev, "failed to get default performance state for PM domain %s: %d\n",
> + pd->name, ret);
> + goto out;
> + }

Adding a new callback seems reasonable to support this.

> +
> + default_pstate = ret;
> +
> + if (power_on) {
> + ret = dev_pm_genpd_set_performance_state(dev, default_pstate);

However, this is more questionable to me.

First, I don't think we should care about whether this is "power_on"
or not. At this point, performance states are treated as orthogonal to
idle states in genpd. We may decide to change that in some way, but
that deserves a separate change.

Second, I don't think we should call
dev_pm_genpd_set_performance_state() from here. It's probably better
handled from the genpd callback itself, if/when needed.

That said, perhaps the new callback should just return a regular error
code and zero on success, rather than the current performance state.
See more below.
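
To make the suggestion concrete, here is a minimal userspace sketch of a
callback that returns 0 or a negative errno and casts the vote itself.
Everything prefixed with mock_ is a hypothetical stand-in and not a genpd
API; the callback name init_performance_state and the rate thresholds are
assumptions made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel's structures. */
struct mock_device {
	unsigned long clk_rate_hz;	/* rate left by bootloader/clk driver */
	unsigned int voted_pstate;	/* vote currently held for the device */
};

struct mock_domain {
	/* Returns 0 on success or a negative errno, never a state. */
	int (*init_performance_state)(struct mock_domain *pd,
				      struct mock_device *dev);
};

/* Stand-in for casting a performance state vote. */
static int mock_set_performance_state(struct mock_device *dev,
				      unsigned int state)
{
	assert(dev != NULL);
	dev->voted_pstate = state;
	return 0;
}

/* Arbitrary rate-to-state mapping, assumed only for this sketch. */
static int mock_rate_to_pstate(unsigned long rate_hz, unsigned int *state)
{
	if (rate_hz == 0)
		return -EINVAL;

	*state = rate_hz <= 100000000UL ? 1 : 2;
	return 0;
}

/* Provider callback: works out the state and votes for it itself. */
static int mock_pd_init_performance_state(struct mock_domain *pd,
					   struct mock_device *dev)
{
	unsigned int state;
	int ret;

	(void)pd;

	ret = mock_rate_to_pstate(dev->clk_rate_hz, &state);
	if (ret)
		return ret;

	return mock_set_performance_state(dev, state);
}

/* Core attach path: only checks for an error, regardless of power_on. */
static int mock_attach(struct mock_domain *pd, struct mock_device *dev)
{
	if (!pd->init_performance_state)
		return 0;

	return pd->init_performance_state(pd, dev);
}
```

With this shape the core never has to interpret a returned state, and it
does not need to know whether the domain is being powered on.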

> + if (ret) {
> + dev_err(dev, "failed to set default performance state %u for PM domain %s: %d\n",
> + default_pstate, pd->name, ret);
> + goto out;
> + }
> + } else {
> + dev_gpd_data(dev)->rpm_pstate = default_pstate;

No, this isn't the right thing to do.

It looks like you are trying to use the ->rpm_pstate for
synchronization with runtime PM for consumer drivers. This is fragile
as it depends on the runtime PM deployment in the consumer driver. I
think you should look at ->rpm_pstate as a variable solely for
managing save/restore of the performance state for the device, during
runtime suspend/resume in genpd.

Synchronization of a vote for a performance state for a device needs
to be managed by calling dev_pm_genpd_set_performance_state(), or by
calling an OPP function that calls it, such as dev_pm_opp_set_rate().
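
As a rough illustration of why preloading ->rpm_pstate is fragile, consider
the userspace model below. It is only a sketch: the mock_ names and the
uses_runtime_pm flag are hypothetical, and the save/restore logic is a
simplified assumption about how genpd treats the field across runtime
suspend/resume, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one device in a PM domain. */
struct mock_dev {
	unsigned int performance_state;	/* active vote seen by the domain */
	unsigned int rpm_pstate;	/* vote saved across runtime suspend */
	bool uses_runtime_pm;		/* does the consumer driver deploy RPM? */
};

/* Explicit vote: takes effect immediately in this model. */
static void mock_set_performance_state(struct mock_dev *d, unsigned int state)
{
	assert(d != NULL);
	d->performance_state = state;
}

/* Runtime suspend: save the vote, then drop it. */
static void mock_runtime_suspend(struct mock_dev *d)
{
	d->rpm_pstate = d->performance_state;
	d->performance_state = 0;
}

/* Runtime resume: restore whatever was saved. */
static void mock_runtime_resume(struct mock_dev *d)
{
	d->performance_state = d->rpm_pstate;
	d->rpm_pstate = 0;
}

/*
 * Consumer probe: a driver without runtime PM never triggers a resume,
 * so a value that was only preloaded into rpm_pstate is never applied.
 */
static void mock_driver_probe(struct mock_dev *d)
{
	if (d->uses_runtime_pm)
		mock_runtime_resume(d);
}
```

In this model a preloaded rpm_pstate becomes a real vote only if the
consumer happens to runtime-resume, while an explicit call to the vote
function holds in both cases.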

> + }
> + }
> +
> if (power_on) {
> genpd_lock(pd);
> ret = genpd_power_on(pd, 0);
> genpd_unlock(pd);
> }
>
> +out:
> if (ret)
> genpd_remove_device(pd, dev);
>
> diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
> index 81d1f019fa0c..9efb55f52462 100644
> --- a/drivers/soc/tegra/pmc.c
> +++ b/drivers/soc/tegra/pmc.c
> @@ -518,15 +518,14 @@ static const char * const tegra_emc_compats[] = {
> * We retrieve clock rate of the attached device and initialize domain's
> * performance state in accordance to the clock rate.
> */
> -static int tegra_pmc_pd_attach_dev(struct generic_pm_domain *genpd,
> - struct device *dev)
> +static int tegra_pmc_genpd_default_perf_state(struct generic_pm_domain *genpd,
> + struct device *dev)
> {
> - struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
> struct opp_table *opp_table, *pd_opp_table;
> struct generic_pm_domain *core_genpd;
> struct dev_pm_opp *opp, *pd_opp;
> - unsigned long rate, state;
> struct gpd_link *link;
> + unsigned long rate;
> struct clk *clk;
> u32 hw_version;
> int ret;
> @@ -633,8 +632,7 @@ static int tegra_pmc_pd_attach_dev(struct generic_pm_domain *genpd,
> * RPM-resume of the device. This means that drivers don't need to
> * explicitly initialize performance state.
> */
> - state = pm_genpd_opp_to_performance_state(&core_genpd->dev, pd_opp);
> - gpd_data->rpm_pstate = state;
> + ret = pm_genpd_opp_to_performance_state(&core_genpd->dev, pd_opp);

I don't see how this saves tegra_pmc_genpd_default_perf_state() from
having to walk &genpd->child_links.

That's still an issue, right?

> dev_pm_opp_put(pd_opp);
>
> put_pd_opp_table:
> @@ -1383,7 +1381,7 @@ static int tegra_powergate_add(struct tegra_pmc *pmc, struct device_node *np)
>
> pg->id = id;
> pg->genpd.name = np->name;
> - pg->genpd.attach_dev = tegra_pmc_pd_attach_dev;
> + pg->genpd.default_performance_state = tegra_pmc_genpd_default_perf_state;
> pg->genpd.power_off = tegra_genpd_power_off;
> pg->genpd.power_on = tegra_genpd_power_on;
> pg->pmc = pmc;
> @@ -1500,7 +1498,7 @@ static int tegra_pmc_core_pd_add(struct tegra_pmc *pmc, struct device_node *np)
> return -ENOMEM;
>
> genpd->name = np->name;
> - genpd->attach_dev = tegra_pmc_pd_attach_dev;
> + genpd->default_performance_state = tegra_pmc_genpd_default_perf_state;
> genpd->set_performance_state = tegra_pmc_core_pd_set_performance_state;
> genpd->opp_to_performance_state = tegra_pmc_core_pd_opp_to_performance_state;
>
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index 21a0577305ef..cd4867817ca5 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -143,6 +143,8 @@ struct generic_pm_domain {
> struct device *dev);
> void (*detach_dev)(struct generic_pm_domain *domain,
> struct device *dev);
> + int (*default_performance_state)(struct generic_pm_domain *domain,
> + struct device *dev);
> unsigned int flags; /* Bit field of configs for genpd */
> struct genpd_power_state *states;
> void (*free_states)(struct genpd_power_state *states,
>
> >> I can revert back to the previous variant, although this variant works
> >> well too.
> > I looked at that code and in that path we end up calling
> > dev_pm_opp_set_rate(), after it has initialized the opp table for the
> > device.
> >
> > Rather than doing the OF parsing above to find out the current state
> > for the device, why can't you just call dev_pm_opp_set_rate() to
> > initialize a proper vote instead?
> >
>
> For some devices clock rate is either preset by bootloader, or by clk driver, or by assigned-clocks in a device-tree. And then I don't see what's the difference in comparison to initialization for the current rate.
>
> For some devices, like memory controller, we can't just change the clock rate because it's a complex procedure and some boards will use fixed rate, but the power vote still must be initialized.

I am not saying you should change the clock rate. The current code
path that runs via devm_tegra_core_dev_init_opp_table() just calls
clk_get_rate() and then dev_pm_opp_set_rate() with the current rate to
vote for the corresponding OPP level. Right?

Isn't this exactly what you want? No?
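
The flow I have in mind can be modelled in userspace roughly as below. This
is a sketch under assumptions, not the Tegra helper: the mock_ functions
stand in for clk_get_rate() and dev_pm_opp_set_rate(), the OPP table layout
is hypothetical, and the mock only models the vote and never touches the
clock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical OPP entry: a rate and the domain state it requires. */
struct mock_opp {
	unsigned long rate_hz;
	unsigned int required_pstate;
};

struct mock_dev {
	unsigned long clk_rate_hz;	/* preset by bootloader/clk driver/DT */
	const struct mock_opp *opps;	/* sorted by ascending rate */
	size_t num_opps;
	unsigned int voted_pstate;
};

/* Stand-in for clk_get_rate(). */
static unsigned long mock_clk_get_rate(const struct mock_dev *dev)
{
	assert(dev != NULL);
	return dev->clk_rate_hz;
}

/*
 * Stand-in for dev_pm_opp_set_rate(): pick the lowest OPP that covers
 * the target rate and vote for its required performance state.
 */
static int mock_opp_set_rate(struct mock_dev *dev, unsigned long target_hz)
{
	size_t i;

	for (i = 0; i < dev->num_opps; i++) {
		if (dev->opps[i].rate_hz >= target_hz) {
			dev->voted_pstate = dev->opps[i].required_pstate;
			return 0;
		}
	}

	return -ERANGE;
}

/* Probe-time init: vote for the OPP matching the *current* rate. */
static int mock_init_opp_vote(struct mock_dev *dev)
{
	unsigned long rate = mock_clk_get_rate(dev);

	if (!rate)
		return -EINVAL;

	return mock_opp_set_rate(dev, rate);
}
```

Since the target passed in is the rate the clock already runs at, the
request does not ask for a different rate, yet the power vote still gets
initialized.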

Kind regards
Uffe