Re: [PATCH 2/2] PM / devfreq: Add devfreq_transitions debugfs file

From: Chanwoo Choi
Date: Thu Jan 09 2020 - 03:07:12 EST


On 1/8/20 11:10 PM, Dmitry Osipenko wrote:
> 08.01.2020 14:22, Chanwoo Choi ÐÐÑÐÑ:
>> On 1/8/20 6:56 AM, Dmitry Osipenko wrote:
>>> 07.01.2020 12:05, Chanwoo Choi ÐÐÑÐÑ:
>>>> Add new devfreq_transitions debugfs file to track the frequency transitions
>>>> of all devfreq devices for the simple profiling as following:
>>>> - /sys/kernel/debug/devfreq/devfreq_transitions
>>>>
>>>> And the user can decide the storage size (CONFIG_NR_DEVFREQ_TRANSITIONS)
>>>> in Kconfig in order to save the transition history.
>>>>
>>>> [Detailed description of each field of 'devfreq_transitions' debugfs file]
>>>> - time_ms : Change time of frequency transition. (unit: millisecond)
>>>> - dev_name : Device name of h/w.
>>>> - dev : Device name made by devfreq core.
>>>> - parent_dev : If devfreq device uses the passive governor,
>>>> show parent devfreq device name.
>>>> - load_% : If devfreq device uses the simple_ondemand governor,
>>>> load is used by governor whene deciding the new frequency.
>>>> (unit: percentage)
>>>> - old_freq_hz : Frequency before changing. (unit: hz)
>>>> - new_freq_hz : Frequency after changed. (unit: hz)
>>>>
>>>> [For example on Exynos5422-based Odroid-XU3 board]
>>>> $ cat /sys/kernel/debug/devfreq/devfreq_transitions
>>>> time_ms dev_name dev parent_dev load_% old_freq_hz new_freq_hz
>>>> ---------- ------------------------------ ---------- ---------- ---------- ------------ ------------
>>>> 14600 soc:bus_noc devfreq2 devfreq1 0 100000000 67000000
>>>> 14600 soc:bus_fsys_apb devfreq3 devfreq1 0 200000000 100000000
>>>> 14600 soc:bus_fsys devfreq4 devfreq1 0 200000000 100000000
>>>> 14600 soc:bus_fsys2 devfreq5 devfreq1 0 150000000 75000000
>>>> 14602 soc:bus_mfc devfreq6 devfreq1 0 222000000 96000000
>>>> 14602 soc:bus_gen devfreq7 devfreq1 0 267000000 89000000
>>>> 14602 soc:bus_g2d devfreq9 devfreq1 0 300000000 84000000
>>>> 14602 soc:bus_g2d_acp devfreq10 devfreq1 0 267000000 67000000
>>>> 14602 soc:bus_jpeg devfreq11 devfreq1 0 300000000 75000000
>>>> 14602 soc:bus_jpeg_apb devfreq12 devfreq1 0 167000000 84000000
>>>> 14603 soc:bus_disp1_fimd devfreq13 devfreq1 0 200000000 120000000
>>>> 14603 soc:bus_disp1 devfreq14 devfreq1 0 300000000 120000000
>>>> 14606 soc:bus_gscl_scaler devfreq15 devfreq1 0 300000000 150000000
>>>> 14606 soc:bus_mscl devfreq16 devfreq1 0 333000000 84000000
>>>> 14608 soc:bus_wcore devfreq1 9 333000000 84000000
>>>> 14783 10c20000.memory-controller devfreq0 35 825000000 633000000
>>>> 15873 soc:bus_wcore devfreq1 41 84000000 400000000
>>>> 15873 soc:bus_noc devfreq2 devfreq1 0 67000000 100000000
>>>> [snip]
>>>>
>>>> Signed-off-by: Chanwoo Choi <cw00.choi@xxxxxxxxxxx>
>>>> ---
>>>> drivers/devfreq/Kconfig | 13 +++
>>>> drivers/devfreq/devfreq.c | 126 +++++++++++++++++++++++++++++
>>>> drivers/devfreq/governor.h | 3 +
>>>> drivers/devfreq/governor_passive.c | 2 +
>>>> include/linux/devfreq.h | 1 +
>>>> 5 files changed, 145 insertions(+)
>>>>
>>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>>> index 0b1df12e0f21..84936eec0ef9 100644
>>>> --- a/drivers/devfreq/Kconfig
>>>> +++ b/drivers/devfreq/Kconfig
>>>> @@ -74,6 +74,19 @@ config DEVFREQ_GOV_PASSIVE
>>>> through sysfs entries. The passive governor recommends that
>>>> devfreq device uses the OPP table to get the frequency/voltage.
>>>>
>>>> +comment "DEVFREQ Debugging"
>>>> +
>>>> +config NR_DEVFREQ_TRANSITIONS
>>>> + int "Maximum storage size to save DEVFREQ Transitions (10-1000)"
>>>> + depends on DEBUG_FS
>>>> + range 10 1000
>>>> + default "100"
>>>> + help
>>>> + Show the frequency transitions of all devfreq devices via
>>>> + '/sys/kernel/debug/devfreq/devfreq_transitions' for the simple
>>>> + profiling. It needs to decide the storage size to save transition
>>>> + history of all devfreq devices.
>>>> +
>>>> comment "DEVFREQ Drivers"
>>>>
>>>> config ARM_EXYNOS_BUS_DEVFREQ
>>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>>>> index c7f5e4e06420..7abaae06fa65 100644
>>>> --- a/drivers/devfreq/devfreq.c
>>>> +++ b/drivers/devfreq/devfreq.c
>>>> @@ -268,6 +268,57 @@ int devfreq_update_status(struct devfreq *devfreq, unsigned long freq)
>>>> }
>>>> EXPORT_SYMBOL(devfreq_update_status);
>>>>
>>>> +/**
>>>> + * devfreq_update_transitions() - Update frequency transitions for debugfs file
>>>> + * @devfreq: the devfreq instance
>>>> + * @old_freq: the previous frequency before changing the frequency
>>>> + * @new_freq: the new frequency after frequency is changed
>>>> + */
>>>> +struct devfreq_transitions {
>>>> + struct devfreq *devfreq;
>>>> + struct devfreq_freqs freqs;
>>>> + unsigned long load;
>>>> +} debugfs_transitions[CONFIG_NR_DEVFREQ_TRANSITIONS];
>>>> +
>>>> +static spinlock_t devfreq_debugfs_lock;
>>>> +static int debugfs_transitions_index;
>>>> +
>>>> +void devfreq_update_transitions(struct devfreq *devfreq,
>>>> + unsigned long old_freq, unsigned long new_freq,
>>>> + unsigned long busy_time, unsigned long total_time)
>>>> +{
>>>> + unsigned long load;
>>>> + int i;
>>>> +
>>>> + if (!devfreq_debugfs || !devfreq || (old_freq == new_freq))
>>>> + return;
>>>> +
>>>> + spin_lock_nested(&devfreq_debugfs_lock, SINGLE_DEPTH_NESTING);
>>>> +
>>>> + i = debugfs_transitions_index;
>>>> +
>>>> + /*
>>>> + * Calculate the load and if load is larger than 100,
>>>> + * initialize to 100 because the unit of load is percentage.
>>>> + */
>>>> + load = (total_time == 0 ? 0 : (100 * busy_time) / total_time);
>>>> + if (load > 100)
>>>> + load = 100;
>>>> +
>>>> + debugfs_transitions[i].devfreq = devfreq;
>>>> + debugfs_transitions[i].freqs.time = ktime_to_ms(ktime_get());
>>>> + debugfs_transitions[i].freqs.old = old_freq;
>>>> + debugfs_transitions[i].freqs.new = new_freq;
>>>> + debugfs_transitions[i].load = load;
>>>> +
>>>> + if (++i == CONFIG_NR_DEVFREQ_TRANSITIONS)
>>>> + i = 0;
>>>> + debugfs_transitions_index = i;
>>>> +
>>>> + spin_unlock(&devfreq_debugfs_lock);
>>>> +}
>>>> +EXPORT_SYMBOL(devfreq_update_transitions);
>>>> +
>>>> /**
>>>> * find_devfreq_governor() - Find devfreq governor from name
>>>> * @name: name of the governor
>>>> @@ -401,6 +452,10 @@ static int set_target(struct devfreq *devfreq,
>>>> return err;
>>>> }
>>>>
>>>> + devfreq_update_transitions(devfreq, cur_freq, new_freq,
>>>> + devfreq->last_status.busy_time,
>>>> + devfreq->last_status.total_time);
>>>> +
>>>> freqs.new = new_freq;
>>>> notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
>>>>
>>>> @@ -1787,6 +1842,72 @@ static int devfreq_summary_show(struct seq_file *s, void *data)
>>>> }
>>>> DEFINE_SHOW_ATTRIBUTE(devfreq_summary);
>>>>
>>>> +/**
>>>> + * devfreq_transitions_show() - Show the frequency transitions of the registered
>>>> + * devfreq devices via 'devfreq_transitions' debugfs file.
>>>> + */
>>>> +static int devfreq_transitions_show(struct seq_file *s, void *data)
>>>> +{
>>>> + struct devfreq *devfreq = NULL;
>>>> + struct devfreq *p_devfreq = NULL;
>>>> + struct devfreq_freqs *freqs = NULL;
>>>> + unsigned long load;
>>>> + int i = debugfs_transitions_index;
>>>> + int count;
>>>> +
>>>> + seq_printf(s, "%-10s %-30s %-10s %-10s %-6s %-12s %-12s\n",
>>>> + "time_ms",
>>>> + "dev_name",
>>>> + "dev",
>>>> + "parent_dev",
>>>> + "load_%",
>>>> + "old_freq_hz",
>>>> + "new_freq_hz");
>>>> + seq_printf(s, "%-10s %-30s %-10s %-10s %-6s %-12s %-12s\n",
>>>> + "----------",
>>>> + "------------------------------",
>>>> + "----------",
>>>> + "----------",
>>>> + "----------",
>>>> + "------------",
>>>> + "------------");
>>>
>>> Isn't this needed here?
>>>
>>> mutex_lock(&devfreq_list_lock);
>>
>> It doesn't touch the devfreq instance of devfreq_list.
>> So, it is not necessary locked of devfreq_list_lock.
>
> What stops devfreq device to be removed by another CPU thread while this
> function is in a process of execution?
>
> This condition is unlikely to happen in practice ever, but technically
> it should be possible to happen.
>
>>>> + spin_lock(&devfreq_debugfs_lock);
>>>> + for (count = 0; count < CONFIG_NR_DEVFREQ_TRANSITIONS; count++) {
>>>> + devfreq = debugfs_transitions[i].devfreq;
>>>> + freqs = &debugfs_transitions[i].freqs;
>>>> + load = debugfs_transitions[i].load;
>>>> +
>>>> + i = (CONFIG_NR_DEVFREQ_TRANSITIONS == ++i) ? 0 : i;
>>>> + if (!devfreq)
>>>> + continue;
>>>
>>> I suppose debugfs_transitions[i].devfreq should be set to NULL when
>>> devfreq device is removed, but I don't see it happening anywhere in this
>>> patch.
>>
>> When debugfs_transitions[] array is not fully filled out
>> by devfreq_update_transitions(), debugfs_transitions[i].devfreq is NULL.
>> In this case, if user execute 'cat /sys/kernel/debug/devfreq/devfreq_transitions',
>> devfreq_transitions_show() need to check the debugfs_transitions[i].devfreq
>> is NULL or not.
>>
>> After filled out the debugfs_transitions[] array,
>> actually, 'if(!devfreq)' is not necessary. Maybe, this style is inefficient
>> It need to rework. I'll think again.
>
> Imagine this situation:
>
> 1. there is a devfreq device, let's name it defreq123
>
> 2. the debugfs_transitions array is getting filled and now it has this
> entry:
>
> debugfs_transitions[0].devfreq = defreq123
>
> 3. user removes defreq123 driver module
>
> # rmmod defreq123
>
> 4. the defreq123 is released now
>
> 5. at what memory location debugfs_transitions[0].devfreq is pointing now?
>
>

You're right. It is my missing point.
Instead of storing the devfreq pointer into debugfs_transitions[].devfreq,
It is better to copy the necessary information to debugfs_transitions[]
for preventing the mentioned situation.

--
Best Regards,
Chanwoo Choi
Samsung Electronics