Re: [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor

From: Chanwoo Choi
Date: Mon May 27 2019 - 21:28:58 EST


Hi Sibi,

On 19. 5. 27. ìí 5:23, Sibi Sankar wrote:
> Hey Chanwoo,
>
> Thanks a lot for reviewing the patch. Like I
> had indicated earlier we decided to go with
> a simpler approach instead on qualcomm SoCs.
> I am happy to re-spin this patch with your
> comments addressed if we do find other users
> for this feature.

Actually, I think that this approach is necessary.
On many real released product like smartphone
have the dependency between cpufreq and devfreq
for memory bus bandwidth. For example, when playing
the video or get the 60fps without loss.

If possible, this patch should be ongoing on either
now or future. Thanks.

>
> On 2019-04-12 13:09, Chanwoo Choi wrote:
>> Hi,
>>
>> I agree this approach absolutely.
>> Just I add some comments. Please check it.
>>
>> On 19. 3. 29. ìì 12:28, Sibi Sankar wrote:
>>> From: Saravana Kannan <skannan@xxxxxxxxxxxxxx>
>>>
>>> Many CPU architectures have caches that can scale independent of the
>>> CPUs. Frequency scaling of the caches is necessary to make sure the cache
>>> is not a performance bottleneck that leads to poor performance and
>>> power. The same idea applies for RAM/DDR.
>>>
>>> To achieve this, this patch add support for cpu based scaling to the
>>> passive governor. This is accomplished by taking the current frequency
>>> of each CPU frequency domain and then adjusts the frequency of the cache
>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>> to CPU frequency transition notifiers to keep itself up to date on the
>>> current CPU frequency.
>>>
>>> To decide the frequency of the device, the governor does one of the
>>> following:
>>> * Constructs a CPU frequency to device frequency mapping table from
>>> Â required-opps property of the devfreq device's opp_table
>>>
>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>> Â the CPUs are running at their max frequency, the device runs at its
>>> Â max frequency. If the CPUs are running at their min frequency, the
>>> Â device runs at its min frequency. It is interpolated for frequencies
>>> Â in between.
>>>
>>> Signed-off-by: Saravana Kannan <skannan@xxxxxxxxxxxxxx>
>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>> Signed-off-by: Sibi Sankar <sibis@xxxxxxxxxxxxxx>
>>> ---
>>> Âdrivers/devfreq/KconfigÂÂÂÂÂÂÂÂÂÂÂ |ÂÂ 4 +
>>> Âdrivers/devfreq/governor_passive.c | 276 ++++++++++++++++++++++++++++-
>>> Âinclude/linux/devfreq.hÂÂÂÂÂÂÂÂÂÂÂ |Â 43 ++++-
>>> Â3 files changed, 315 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>> index 6a172d338f6d..9a45f464a56b 100644
>>> --- a/drivers/devfreq/Kconfig
>>> +++ b/drivers/devfreq/Kconfig
>>> @@ -72,6 +72,10 @@ config DEVFREQ_GOV_PASSIVE
>>> ÂÂÂÂÂÂ device. This governor does not change the frequency by itself
>>> ÂÂÂÂÂÂ through sysfs entries. The passive governor recommends that
>>> ÂÂÂÂÂÂ devfreq device uses the OPP table to get the frequency/voltage.
>>> +ÂÂÂÂÂ Alternatively the governor can also be chosen to scale based on
>>> +ÂÂÂÂÂ the online CPUs current frequency. A CPU frequency to device
>>> +ÂÂÂÂÂ frequency mapping table(s) is auto-populated by the governor
>>> +ÂÂÂÂÂ for this purpose.
>>>
>>> Âcomment "DEVFREQ Drivers"
>>>
>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>> index 3bc29acbd54e..2506682b233b 100644
>>> --- a/drivers/devfreq/governor_passive.c
>>> +++ b/drivers/devfreq/governor_passive.c
>>> @@ -11,10 +11,63 @@
>>> Â */
>>>
>>> Â#include <linux/module.h>
>>> +#include <linux/cpu.h>
>>> +#include <linux/cpufreq.h>
>>> +#include <linux/cpumask.h>
>>> Â#include <linux/device.h>
>>> Â#include <linux/devfreq.h>
>>> +#include <linux/of.h>
>>> +#include <linux/slab.h>
>>> Â#include "governor.h"
>>>
>>> +static unsigned int xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned int cpu)
>>> +{
>>> +ÂÂÂ unsigned int cpu_min, cpu_max;
>>> +ÂÂÂ struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +ÂÂÂ unsigned int dev_min, dev_max, cpu_percent, cpu_freq = 0, freq = 0;
>>> +ÂÂÂ unsigned long *freq_table = devfreq->profile->freq_table;
>>> +ÂÂÂ struct device *dev = devfreq->dev.parent;
>>> +ÂÂÂ struct devfreq_map *map;
>>> +ÂÂÂ int opp_cnt, i;
>>> +
>>> +ÂÂÂ if (!data->state[cpu] || data->state[cpu]->first_cpu != cpu) {
>>> +ÂÂÂÂÂÂÂ freq = 0;
>>> +ÂÂÂÂÂÂÂ goto out;
>>
>> goto out -> return 0;
>>
>>> +ÂÂÂ }
>>> +
>>> +ÂÂÂ /* Use Interpolation if map is not available */
>>> +ÂÂÂ cpu_freq = data->state[cpu]->freq;
>>> +ÂÂÂ if (!data->map) {
>>> +ÂÂÂÂÂÂÂ cpu_min = data->state[cpu]->min_freq;
>>> +ÂÂÂÂÂÂÂ cpu_max = data->state[cpu]->max_freq;
>>> +ÂÂÂÂÂÂÂ if (freq_table) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ dev_min = freq_table[0];
>>> +ÂÂÂÂÂÂÂÂÂÂÂ dev_max = freq_table[devfreq->profile->max_state - 1];
>>
>> Actually, it is not always true. The devfreq recommend the ascending order for
>> 'freq_table'. But, it is not mandatory. Also, some devfreq device uses the
>> decending order for 'freq_table'. So, a patch[1] was considering the order
>> when getting the minimum/maximum frequency from freq_table.
>>
>> If you want to get the minimum/maximum frequency, you have to consider the order
>> of 'freq_table' as the patch[1].
>>
>> [1] df5cf4a36178 ("PM / devfreq: Fix handling of min/max_freq == 0")
>>
>> ÂÂÂÂÂÂÂÂÂÂÂÂ /* Get minimum frequency according to sorting order */
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (freq_table[0] < freq_table[df->profile->max_state - 1])
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ value = freq_table[0];
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ else
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ value = freq_table[df->profile->max_state - 1];
>>
>>
>>> +ÂÂÂÂÂÂÂ } else {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ if (devfreq->max_freq <= devfreq->min_freq)
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return 0;
>>> +ÂÂÂÂÂÂÂÂÂÂÂ dev_min = devfreq->min_freq;
>>> +ÂÂÂÂÂÂÂÂÂÂÂ dev_max = devfreq->max_freq;
>>> +ÂÂÂÂÂÂÂ }
>>> +
>>> +ÂÂÂÂÂÂÂ cpu_percent = ((cpu_freq - cpu_min) * 100) / cpu_max - cpu_min;
>>> +ÂÂÂÂÂÂÂ freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>> +ÂÂÂÂÂÂÂ goto out;
>>
>> You don't need to jump 'out'. Instead, you better to use the 'else' statement
>> for if data->map is not NULL. I think that almost case when using this patch
>> will be available of data->map. In order to skip the likely 'false' statement,
>> I recommend the following sequence.
>>
>> ÂÂÂÂif (data->map) {
>> ÂÂÂÂÂÂÂ map = data->map[cpu];
>> ÂÂÂÂÂÂÂ ...
>> ÂÂÂÂ} else {
>> ÂÂÂÂÂÂÂ /* Use Interpolation if map is not available */
>> ÂÂÂÂ}
>>
>>
>>> +ÂÂÂ }
>>> +
>>> +ÂÂÂ map = data->map[cpu];
>>> +ÂÂÂ opp_cnt = dev_pm_opp_get_opp_count(dev);
>>> +ÂÂÂ for (i = 0; i < opp_cnt; i++) {
>>> +ÂÂÂÂÂÂÂ freq = max(freq, map[i].dev_hz);
>>> +ÂÂÂÂÂÂÂ if (map[i].cpu_khz >= cpu_freq)
>>> +ÂÂÂÂÂÂÂÂÂÂÂ break;
>>> +ÂÂÂ }
>>> +out:
>>> +ÂÂÂ dev_dbg(dev, "CPU%u: %d -> dev: %u\n", cpu, cpu_freq, freq);
>>
>> IMO, I think it is not necessary. If you want to print log, you better to print
>> it on device driver instead of governor.
>>
>>> +ÂÂÂ return freq;
>>> +}
>>> +
>>> Âstatic int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned long *freq)
>>> Â{
>>> @@ -23,6 +76,7 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>> ÂÂÂÂ struct devfreq *parent_devfreq = (struct devfreq *)p_data->parent;
>>> ÂÂÂÂ unsigned long child_freq = ULONG_MAX;
>>> ÂÂÂÂ struct dev_pm_opp *opp;
>>> +ÂÂÂ unsigned int cpu, tgt_freq = 0;
>>
>> tgt means 'target'? If right, just use target_freq intead of 'tgt_freq'
>> for the readability.
>>
>>> ÂÂÂÂ int i, count, ret = 0;
>>>
>>> ÂÂÂÂ /*
>>> @@ -35,6 +89,14 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>> ÂÂÂÂÂÂÂÂ goto out;
>>> ÂÂÂÂ }
>>>
>>> +ÂÂÂ if (p_data->cpufreq_type) {
>>> +ÂÂÂÂÂÂÂ for_each_possible_cpu(cpu)
>>> +ÂÂÂÂÂÂÂÂÂÂÂ tgt_freq = max(tgt_freq,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ xlate_cpufreq_to_devfreq(p_data, cpu));
>>> +ÂÂÂÂÂÂÂ *freq = tgt_freq;
>>
>> You better to change from 'tgt_freq' to 'target_freq' for the readability.
>>
>>> +ÂÂÂÂÂÂÂ goto out;
>>> +ÂÂÂ }
>>
>> I think that 'goto out' using is not proper for supporting two case.
>> Instead, you better to split out as following according to the type
>> of parent device (devfreq device or cpufreq device).
>>
>> ÂÂÂÂswitch (p_data->parent_type) {
>> ÂÂÂÂcase DEVFREQ_PARENT_DEV:
>> ÂÂÂÂÂÂÂ ret = get_target_freq_with_devfreq()
>> ÂÂÂÂÂÂÂ break;
>> ÂÂÂÂcase CPUFREQ_PARENT_DEV:
>> ÂÂÂÂÂÂÂ ret = get_target_freq_with_cpufreq()
>> ÂÂÂÂÂÂÂ break;
>> ÂÂÂÂdefault:
>> ÂÂÂÂÂÂÂ dev_err(...)
>> ÂÂÂÂÂÂÂ ret = -EINVAL;
>> ÂÂÂÂÂÂÂ goto out;
>> ÂÂÂÂ}
>>
>> ÂÂÂÂif (ret < 0) {
>> ÂÂÂÂÂÂÂ /* exception handling for 'ret' value */
>> ÂÂÂÂ}
>>
>>> +
>>> ÂÂÂÂ /*
>>> ÂÂÂÂÂ * If the parent and passive devfreq device uses the OPP table,
>>> ÂÂÂÂÂ * get the next frequency by using the OPP table.
>>> @@ -149,6 +211,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>> ÂÂÂÂ return NOTIFY_DONE;
>>> Â}
>>>
>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned long event, void *ptr)
>>> +{
>>> +ÂÂÂ struct devfreq_passive_data *data =
>>> +ÂÂÂÂÂÂÂÂÂÂÂ container_of(nb, struct devfreq_passive_data, nb);
>>> +ÂÂÂ struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +ÂÂÂ struct cpufreq_freqs *freq = ptr;
>>> +ÂÂÂ struct devfreq_cpu_state *state;
>>
>> nitpick. how about using 'cpu_state' instead of 'state'?
>> in order to get the meaning from just variable name.
>>
>>> +ÂÂÂ int ret = 0;
>>> +
>>> +ÂÂÂ if (event != CPUFREQ_POSTCHANGE)
>>> +ÂÂÂÂÂÂÂ goto out;
>>
>> just 'return' is simple instead of 'goto out' because this case
>> don't need to treat the any restoring code. And also, you have
>> to check whether freq is NULL or not as following:
>>
>> ÂÂÂÂif (event != CPUFREQ_POSTCHANGE || !freq || data->state[freq->cpu])
>> ÂÂÂÂÂÂÂ return ret;
>> ÂÂÂÂstate = data->state[freq->cpu];
>>
>>> +
>>> +ÂÂÂ state = data->state[freq->cpu];
>>> +ÂÂÂ if (!state)
>>> +ÂÂÂÂÂÂÂ goto out;
>>> +
>>> +ÂÂÂ if (state->freq != freq->new) {
>>> +ÂÂÂÂÂÂÂ state->freq = freq->new;
>>
>> You have to update the frequency after update_devfreq() is completed
>> without error.
>>
>>> +ÂÂÂÂÂÂÂ mutex_lock(&devfreq->lock);
>>> +ÂÂÂÂÂÂÂ ret = update_devfreq(devfreq);
>>> +ÂÂÂÂÂÂÂ mutex_unlock(&devfreq->lock);
>>> +ÂÂÂÂÂÂÂ if (ret)
>>> +ÂÂÂÂÂÂÂÂÂÂÂ dev_err(&devfreq->dev, "Frequency update failed.\n");
>>
>> Almost devfreq error used the following format: "Couldn't ..." .
>> If there is no any specific reason to change the format for error log,
>> ÂÂÂÂ"Couldnt update the frequency.\n"
>>
>>> +ÂÂÂ }> +out:
>>> +ÂÂÂ return ret;
>>
>> Also, we can reduce the unneeded indentation as following:
>>
>> ÂÂÂÂif (state->freq == freq->new)
>> ÂÂÂÂÂÂÂ return ret;
>>
>> ÂÂÂÂmutex_lock(&devfreq->lock);
>> ÂÂÂÂret = update_devfreq(devfreq);
>> ÂÂÂÂmutex_unlock(&devfreq->lock);
>> ÂÂÂÂif (ret) {
>> ÂÂÂÂÂÂÂ dev_err(&devfreq->dev, "Couldnt update the frequency.\n");
>> ÂÂÂÂÂÂÂ return ret;
>> ÂÂÂÂ}
>> ÂÂÂÂstate->freq = freq->new;
>>
>> ÂÂÂÂreturn 0;
>>
>>> +}
>>> +
>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>> +{
>>> +ÂÂÂ unsigned int cpu;
>>> +ÂÂÂ struct devfreq_map **cpu_map;
>>> +ÂÂÂ struct devfreq_map *map, *per_cpu_map;
>>> +ÂÂÂ struct devfreq_passive_data *data = *p_data;
>>> +ÂÂÂ struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +ÂÂÂ int i, count = 0, opp_cnt = 0, ret = 0, iter_val = 0;
>>> +ÂÂÂ struct device_node *np, *opp_table_np, *cpu_np;
>>> +ÂÂÂ struct opp_table *opp_table, *cpu_opp_tbl;
>>> +ÂÂÂ struct device *dev = devfreq->dev.parent;
>>> +ÂÂÂ struct devfreq_cpu_state *state;
>>> +ÂÂÂ struct dev_pm_opp *opp, *cpu_opp;
>>> +ÂÂÂ struct cpufreq_policy *policy;
>>> +ÂÂÂ struct device *cpu_dev;
>>> +ÂÂÂ u64 cpu_khz, dev_hz;
>>> +
>>> +ÂÂÂ get_online_cpus();
>>> +ÂÂÂ data->nb.notifier_call = cpufreq_passive_notifier_call;
>>> +ÂÂÂ ret = cpufreq_register_notifier(&data->nb,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ CPUFREQ_TRANSITION_NOTIFIER);
>>> +ÂÂÂ if (ret)
>>> +ÂÂÂÂÂÂÂ return ret;
>>> +
>>> +ÂÂÂ /* Populate devfreq_cpu_state */
>>> +ÂÂÂ for_each_online_cpu(cpu) {
>>> +ÂÂÂÂÂÂÂ if (data->state[cpu])
>>> +ÂÂÂÂÂÂÂÂÂÂÂ continue;
>>> +
>>> +ÂÂÂÂÂÂÂ policy = cpufreq_cpu_get(cpu);
>>> +ÂÂÂÂÂÂÂ if (policy) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ state = kzalloc(sizeof(*state), GFP_KERNEL);
>>> +ÂÂÂÂÂÂÂÂÂÂÂ if (!state)
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return -ENOMEM;
>>> +
>>> +ÂÂÂÂÂÂÂÂÂÂÂ state->first_cpu = cpumask_first(policy->related_cpus);
>>> +ÂÂÂÂÂÂÂÂÂÂÂ state->freq = policy->cur;
>>> +ÂÂÂÂÂÂÂÂÂÂÂ state->min_freq = policy->cpuinfo.min_freq;
>>> +ÂÂÂÂÂÂÂÂÂÂÂ state->max_freq = policy->cpuinfo.max_freq;
>>> +ÂÂÂÂÂÂÂÂÂÂÂ data->state[cpu] = state;
>>> +ÂÂÂÂÂÂÂÂÂÂÂ cpufreq_cpu_put(policy);
>>> +ÂÂÂÂÂÂÂ } else {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ return -EPROBE_DEFER;
>>> +ÂÂÂÂÂÂÂ }
>>> +ÂÂÂ }
>>> +
>>> +ÂÂÂ opp_table_np = dev_pm_opp_of_get_opp_desc_node(dev);
>>> +ÂÂÂ if (!opp_table_np)
>>> +ÂÂÂÂÂÂÂ goto out;
>>> +
>>> +ÂÂÂ opp_cnt = dev_pm_opp_get_opp_count(dev);
>>> +ÂÂÂ if (opp_cnt <= 0)
>>> +ÂÂÂÂÂÂÂ goto put_opp_table;
>>> +
>>> +ÂÂÂ /* Allocate memory for devfreq_map*/
>>> +ÂÂÂ cpu_map = kcalloc(num_possible_cpus(), sizeof(*cpu_map), GFP_KERNEL);
>>> +ÂÂÂ if (!cpu_map)
>>> +ÂÂÂÂÂÂÂ return -ENOMEM;
>>> +
>>> +ÂÂÂ for_each_possible_cpu(cpu) {
>>> +ÂÂÂÂÂÂÂ per_cpu_map = kcalloc(opp_cnt, sizeof(*per_cpu_map),
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ GFP_KERNEL);
>>> +ÂÂÂÂÂÂÂ if (!per_cpu_map)
>>> +ÂÂÂÂÂÂÂÂÂÂÂ return -ENOMEM;
>>> +ÂÂÂÂÂÂÂ cpu_map[cpu] = per_cpu_map;
>>> +ÂÂÂ }
>>> +ÂÂÂ data->map = cpu_map;
>>> +
>>> +ÂÂÂ /* Populate devfreq_map */
>>> +ÂÂÂ opp_table = dev_pm_opp_get_opp_table(dev);
>>> +ÂÂÂ if (!opp_table)
>>> +ÂÂÂÂÂÂÂ return -ENOMEM;
>>> +
>>> +ÂÂÂ for_each_available_child_of_node(opp_table_np, np) {
>>> +ÂÂÂÂÂÂÂ opp = dev_pm_opp_find_opp_of_np(opp_table, np);
>>> +ÂÂÂÂÂÂÂ if (IS_ERR(opp))
>>> +ÂÂÂÂÂÂÂÂÂÂÂ continue;
>>> +
>>> +ÂÂÂÂÂÂÂ dev_hz = dev_pm_opp_get_freq(opp);
>>> +ÂÂÂÂÂÂÂ dev_pm_opp_put(opp);
>>> +
>>> +ÂÂÂÂÂÂÂ count = of_count_phandle_with_args(np, "required-opps", NULL);
>>> +ÂÂÂÂÂÂÂ for (i = 0; i < count; i++) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ for_each_possible_cpu(cpu) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ cpu_dev = get_cpu_device(cpu);
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (!cpu_dev) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dev_err(dev, "CPU get device failed.\n");
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ continue;
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ }
>>> +
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ cpu_np = of_parse_required_opp(np, i);
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (!cpu_np) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dev_err(dev, "Parsing required opp failed.\n");
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ continue;
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ }
>>> +
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ /* Get cpu opp-table */
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ cpu_opp_tbl = dev_pm_opp_get_opp_table(cpu_dev);
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (!cpu_opp_tbl) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dev_err(dev, "CPU opp table get failed.\n");
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ goto put_cpu_node;
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ }
>>> +
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ /* Match the cpu opp node from required-opp with
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * the cpu-opp table */
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ cpu_opp = dev_pm_opp_find_opp_of_np(cpu_opp_tbl,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ cpu_np);
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (!cpu_opp) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dev_dbg(dev, "CPU opp get failed.\n");
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ goto put_cpu_opp_table;
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ }
>>> +
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ cpu_khz = dev_pm_opp_get_freq(cpu_opp);
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (cpu_opp && cpu_khz) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ /* Update freq-map if not already set */
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ map = cpu_map[cpu];
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ map[iter_val].cpu_khz = cpu_khz / 1000;
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ map[iter_val].dev_hz = dev_hz;
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ }
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dev_pm_opp_put(cpu_opp);
>>> +put_cpu_opp_table:
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dev_pm_opp_put_opp_table(cpu_opp_tbl);
>>> +put_cpu_node:
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ of_node_put(cpu_np);
>>> +ÂÂÂÂÂÂÂÂÂÂÂ }
>>> +ÂÂÂÂÂÂÂ }
>>> +ÂÂÂÂÂÂÂ iter_val++;
>>> +ÂÂÂ }
>>> +ÂÂÂ dev_pm_opp_put_opp_table(opp_table);
>>> +put_opp_table:
>>> +ÂÂÂ of_node_put(opp_table_np);
>>> +out:
>>> +ÂÂÂ put_online_cpus();
>>> +
>>> +ÂÂÂ /* Update devfreq */
>>> +ÂÂÂ mutex_lock(&devfreq->lock);
>>> +ÂÂÂ ret = update_devfreq(devfreq);
>>> +ÂÂÂ mutex_unlock(&devfreq->lock);
>>> +ÂÂÂ if (ret)
>>> +ÂÂÂÂÂÂÂ dev_err(dev, "Frequency update failed.\n");
>>> +
>>> +ÂÂÂ return ret;
>>> +}
>>> +
>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>> +{
>>> +ÂÂÂ int cpu;
>>> +ÂÂÂ struct devfreq_passive_data *data = *p_data;
>>> +
>>> +ÂÂÂ cpufreq_unregister_notifier(&data->nb,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ CPUFREQ_TRANSITION_NOTIFIER);
>>> +
>>> +ÂÂÂ for_each_possible_cpu(cpu) {
>>> +ÂÂÂÂÂÂÂ kfree(data->state[cpu]);
>>> +ÂÂÂÂÂÂÂ kfree(data->map[cpu]);
>>> +ÂÂÂÂÂÂÂ data->state[cpu] = NULL;
>>> +ÂÂÂÂÂÂÂ data->map[cpu] = NULL;
>>> +ÂÂÂ }
>>> +
>>> +ÂÂÂ kfree(data->map);
>>> +ÂÂÂ data->map = NULL;
>>> +
>>> +ÂÂÂ return 0;
>>> +}
>>> +
>>> Âstatic int devfreq_passive_event_handler(struct devfreq *devfreq,
>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned int event, void *data)
>>> Â{
>>> @@ -159,7 +415,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>> ÂÂÂÂ struct notifier_block *nb = &p_data->nb;
>>> ÂÂÂÂ int ret = 0;
>>>
>>> -ÂÂÂ if (!parent)
>>> +ÂÂÂ if (!parent && !p_data->cpufreq_type)
>>> ÂÂÂÂÂÂÂÂ return -EPROBE_DEFER;
>>
>> It makes the fault for the existing devfreq devices with passive governor.
>> Please remove '!p_data->cpufreq_type' condition.
>>
>>>
>>> ÂÂÂÂ switch (event) {
>>> @@ -167,13 +423,21 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>> ÂÂÂÂÂÂÂÂ if (!p_data->this)
>>> ÂÂÂÂÂÂÂÂÂÂÂÂ p_data->this = devfreq;
>>>
>>> -ÂÂÂÂÂÂÂ nb->notifier_call = devfreq_passive_notifier_call;
>>> -ÂÂÂÂÂÂÂ ret = devm_devfreq_register_notifier(dev, parent, nb,
>>> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ DEVFREQ_TRANSITION_NOTIFIER);
>>> +ÂÂÂÂÂÂÂ if (p_data->cpufreq_type) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ ret = cpufreq_passive_register(&p_data);
>>> +ÂÂÂÂÂÂÂ } else {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ nb->notifier_call = devfreq_passive_notifier_call;
>>> +ÂÂÂÂÂÂÂÂÂÂÂ ret = devm_devfreq_register_notifier(dev, parent, nb,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ DEVFREQ_TRANSITION_NOTIFIER);
>>> +ÂÂÂÂÂÂÂ }
>>
>> I suggested the my opinion aboyt 'cpufreq_type' variable below.
>> You can separte it for more clready with using parent device type.
>>
>> ÂÂÂÂÂÂÂ if (p_data->parent_type == DEVFREQ_PARENT_DEV)
>> ÂÂÂÂÂÂÂÂÂÂÂ ...
>> ÂÂÂÂÂÂÂ else if (p_data->parent_type == CPUFREQ_PARENT_DEV)
>> ÂÂÂÂÂÂÂÂÂÂÂ ...
>> ÂÂÂÂÂÂÂ else
>> ÂÂÂÂÂÂÂÂÂÂÂ // error handling
>>
>>> ÂÂÂÂÂÂÂÂ break;
>>> ÂÂÂÂ case DEVFREQ_GOV_STOP:
>>> -ÂÂÂÂÂÂÂ devm_devfreq_unregister_notifier(dev, parent, nb,
>>> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ DEVFREQ_TRANSITION_NOTIFIER);
>>> +ÂÂÂÂÂÂÂ if (p_data->cpufreq_type) {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ cpufreq_passive_unregister(&p_data);
>>> +ÂÂÂÂÂÂÂ } else {
>>> +ÂÂÂÂÂÂÂÂÂÂÂ devm_devfreq_unregister_notifier(dev, parent, nb,
>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ DEVFREQ_TRANSITION_NOTIFIER);
>>> +ÂÂÂÂÂÂÂ }
>>
>> ditto.
>>
>>> ÂÂÂÂÂÂÂÂ break;
>>> ÂÂÂÂ default:
>>> ÂÂÂÂÂÂÂÂ break;
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index fbffa74bfc1b..e8235fbe49e6 100644
>>> --- a/include/linux/devfreq.h
>>> +++ b/include/linux/devfreq.h
>>> @@ -265,6 +265,38 @@ struct devfreq_simple_ondemand_data {
>>> Â#endif
>>>
>>> Â#if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>> +/**
>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>> + * @freq:ÂÂÂ holds the current frequency of the cpu.
>>> + * @min_freq:ÂÂÂ holds the min frequency of the cpu.
>>> + * @max_freq:ÂÂÂ holds the max frequency of the cpu.
>>> + * @first_cpu:ÂÂÂ holds the cpumask of the first cpu of a policy.
>>> + *
>>> + * This structure stores the required cpu_state of a cpu.
>>> + * This is auto-populated by the governor.
>>> + */
>>> +struct devfreq_cpu_state {
>>> +ÂÂÂ unsigned int freq;
>>> +ÂÂÂ unsigned int min_freq;
>>> +ÂÂÂ unsigned int max_freq;
>>> +ÂÂÂ unsigned int first_cpu;
>>> +};
>>> +
>>> +/**
>>> + * struct devfreq_map - holds mapping from cpu frequency
>>> + * to devfreq frequency
>>> + * @cpu_khz:ÂÂÂ holds the cpu frequency in Khz
>>> + * @dev_hz:ÂÂÂ holds the devfreq device frequency in Hz
>>> + *
>>> + * This structure stores the lookup table between cpu
>>> + * and the devfreq device. This is auto-populated by the
>>> + * governor.
>>> + */
>>> +struct devfreq_map {
>>
>> How about changing the structure name as following or others?
>> - devfreq_freq_map or devfreq_cpufreq_map or others.
>>
>> Because this structure name guessing the meaning of mapping
>> between devfreq frequency and cpufreq frequency.
>>
>>> +ÂÂÂ unsigned int cpu_khz;
>>> +ÂÂÂ unsigned int dev_hz;
>>> +};
>>> +
>>> Â/**
>>> Â * struct devfreq_passive_data - void *data fed to struct devfreq
>>> Â *ÂÂÂ and devfreq_add_device
>>> @@ -278,11 +310,13 @@ struct devfreq_simple_ondemand_data {
>>> Â *ÂÂÂÂÂÂÂÂÂÂÂ the next frequency, should use this callback.
>>> Â * @this:ÂÂÂ the devfreq instance of own device.
>>> Â * @nb:ÂÂÂÂÂÂÂ the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>> + * @state:ÂÂÂ holds the state min/max/current frequency of all online cpu's
>>
>> As I commented above, how about using 'cpu_state' instead of 'state'?
>> in order to get the meaning from just variable name.
>>
>> nitpick. Also, I think that you can skip 'holds' from the description
>> of 'state' variable.
>>
>>
>>> + * @map:ÂÂÂ holds the maps between cpu frequency and device frequency
>>
>> How about using 'cpufreq_map' instead of 'map' for the readability?
>> IMHO, Because this variable is only used when parent device is cpu.
>> I think that if you add to specify the meaningful prefix about cpu to
>> variable name,
>> it is easy to catch the meaning of variable.
>> - map -> cpufreq_map.
>>
>> nitpick. Also, I think that you can skip 'holds' from the description
>> of 'map' variable.
>>
>>> Â *
>>> Â * The devfreq_passive_data have to set the devfreq instance of parent
>>> Â * device with governors except for the passive governor. But, don't need to
>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>> - * them.
>>> + * initialize the 'this', 'nb', 'state' and 'map' field because the devfreq
>>
>> If you agree my opinion above,
>> - state -> cpu_state.
>> - map -> cpufreq_map
>>
>>> + * core will handle them.
>>> Â */
>>> Âstruct devfreq_passive_data {
>>> ÂÂÂÂ /* Should set the devfreq instance of parent device */
>>> @@ -291,9 +325,14 @@ struct devfreq_passive_data {
>>> ÂÂÂÂ /* Optional callback to decide the next frequency of passvice device */
>>> ÂÂÂÂ int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>
>>> +ÂÂÂ /* Should be set if the devfreq device wants to be scaled with cpu*/
>>> +ÂÂÂ u8 cpufreq_type;
>>
>> The devfreq devices with passive governor have always parent
>> either devfreq device or cpufreq device. So, you better to specify
>> the parent type as following: I think that it is more clear to check
>> the type of parent device.
>>
>> ÂÂÂÂenum devfreq_parent_dev_type {
>> ÂÂÂÂÂÂÂ DEVFREQ_PARENT_DEV,
>> ÂÂÂÂÂÂÂ CPUFREQ_PARENT_DEV,
>> ÂÂÂÂ};
>>
>> ÂÂÂÂenum devfreq_parent_dev_type parent_type;
>>
>>> +
>>> ÂÂÂÂ /* For passive governor's internal use. Don't need to set them */
>>> ÂÂÂÂ struct devfreq *this;
>>> ÂÂÂÂ struct notifier_block nb;
>>> +ÂÂÂ struct devfreq_cpu_state *state[NR_CPUS];
>>> +ÂÂÂ struct devfreq_map **map;
>>
>> ditto.
>>
>>> Â};
>>> Â#endif
>>>
>>>
>


--
Best Regards,
Chanwoo Choi
Samsung Electronics