Re: [RESEND][PATCH] cpufreq: stats: Add 'load_table' sysfs file to show accumulated data of CPU

From: Viresh Kumar
Date: Fri Jun 07 2013 - 06:24:00 EST


Hi Chanwoo,

On 5 June 2013 13:41, Chanwoo Choi <cw00.choi@xxxxxxxxxxx> wrote:
> This patch adds a new sysfs file to show previously accumulated CPU load data

Please mention that we only accumulate the latest 20 values.

> at the following path. This sysfs file is used to judge the correct system state
> or determine suitable system resources in user-space.

Please write it as:

load_table will be used to judge how many cpus would be sufficient for
managing current load.
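
A minimal user-space sketch of how one row of load_table could be turned into a "how many cpus are sufficient" estimate. The heuristic itself is an assumption made for illustration (it is not part of the patch): sum the per-CPU loads and divide by a per-CPU budget, rounding up. The names `cpus_needed` and `per_cpu_budget` are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical heuristic, not taken from the patch: estimate how many
 * online CPUs would be sufficient for one load_table row.
 *
 * load[]          per-CPU load in percent (0..100), as printed per row
 * ncpus           number of entries in load[]
 * per_cpu_budget  load (percent) we are willing to place on one CPU
 */
static size_t cpus_needed(const unsigned int *load, size_t ncpus,
                          unsigned int per_cpu_budget)
{
    unsigned int total = 0;
    size_t i, need;

    if (ncpus == 0 || per_cpu_budget == 0)
        return ncpus;

    for (i = 0; i < ncpus; i++)
        total += load[i];

    /* Round up, then clamp to the range [1, ncpus]. */
    need = (total + per_cpu_budget - 1) / per_cpu_budget;
    if (need < 1)
        need = 1;
    if (need > ncpus)
        need = ncpus;
    return need;
}
```

With an assumed budget of 80% per CPU, a row with loads 32/6/0/26 would need one CPU and a row with loads 96/26/6/48 would need three. A real policy would probably average over several rows before hotplugging anything.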

> - /sys/devices/system/cpu/cpu0/cpufreq/stats/load_table
>
> This sysfs file includes the following data:
> - Measurement point of time
> - CPU frequency
> - Per-CPU load
>
> Signed-off-by: Chanwoo Choi <cw00.choi@xxxxxxxxxxx>
> Signed-off-by: Myungjoo Ham <myungjoo.ham@xxxxxxxxxxx>
> Signed-off-by: Kyungmin Park <kyungmin.park@xxxxxxxxxxx>
> ---
>
> Additionally, here is an example of using the 'load_table' sysfs entry.
>
> The Exynos4412 series is quad-core and all cores share the power line.
> I can't set a different voltage/frequency for each CPU. To reduce power
> consumption, I have to switch CPUs online/offline
> according to CPU load at runtime. As a result, I periodically need to
> monitor the current cpu state to determine a proper amount of system
> resources (the necessary number of online cpus) and to eliminate wasted power.
> So, I need the 'load_table' sysfs file to monitor the current cpu state.
>
> I add a table which shows the power consumption of a target based on the
> Exynos4412 SoC. This table shows the difference in power consumption
> according to the number of online cores, with the same number of running tasks.
>
> [Environment of power estimation]
> - The cpufreq governor used performance mode to estimate power consumption at each frequency step.
> - Use an infinite-loop test program which executes a while statement infinitely.
> - Always measure the power consumption at the same temperature for 1 minute.
> - Unit is mA.
> -----------------------------------------------------------------------------------------------------
> A number of Online core | Core 1    | Core 2         | Core 3              | Core 4
> A number of nr_running  |    0    1 |    0    1    2 |    0    1    2    3 |    0    1    2    3    4
> -----------------------------------------------------------------------------------------------------
> CPU Frequency           |
> 800 MHz                 |  293  330 |  295  338  379 |  300  339  386  433 |  303  341  390  412  482
> 1000 MHz                |  312  371 |  316  377  435 |  318  383  454  522 |  322  391  462  526  596
> 1200 MHz                |  323  404 |  328  418  504 |  336  423  521  615 |  343  433  499  639  748
> 1600 MHz                |  380  525 |  388  556  771 |  399  575  771 1011 |  412  597  822 1172 1684
> -----------------------------------------------------------------------------------------------------
>
> For example,
> cases A/B/C have the same conditions except for the number of online cores.
> - case A: Online core is 2, 1000MHz and nr_running is 1 : 377mA
> - case B: Online core is 3, 1000MHz and nr_running is 1 : 383mA
> - case C: Online core is 4, 1000MHz and nr_running is 1 : 391mA
>
> If the system has only one running task, the cpu hotplug policy, by monitoring
> cpu state through the 'load_table' sysfs file in user-space,
> will choose the 'case A' state to reduce power consumption.
>
> The result of reading the 'load_table' sysfs file is as follows:
> - cpufreq governor is ondemand_org governor.
>
> $ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/load_table
> Time Frequency cpu0 cpu1 cpu2 cpu3
> 1300500043122 1600000 32 6 0 26
> 1300600079080 800000 63 10 2 45
> 1300700065288 800000 51 9 1 42
> 1300800228747 800000 51 9 1 43
> 1300900182997 800000 78 11 3 47
> 1301000106163 800000 96 26 6 48
> 1301100056247 1600000 45 7 1 27
> 1301200071373 1000000 55 9 1 37
> 1301300096082 800000 54 10 0 45
> 1301400132832 800000 70 11 2 46
> 1301500082290 800000 61 11 1 43
> 1301600236415 800000 61 9 2 43
> 1301700071498 800000 59 10 2 43
> 1301800159290 800000 55 9 1 42
> 1301900076332 800000 66 10 2 43
> 1302000102165 800000 47 9 0 43
> 1302100086623 800000 75 11 2 50
> 1302200101082 800000 66 10 4 46
> 1302300108873 800000 53 10 1 44
> 1302400373373 600000 63 12 1 54

How are you getting different loads for each of your cpus? I believe you
are just recording these values for policy->cpu, and all cpus share the
same policy on your platform.

> Please let me know your opinion about this patch.
>
> Best regards and Thanks,
> Chanwoo Choi
>
> ---
> drivers/cpufreq/cpufreq.c | 4 +++
> drivers/cpufreq/cpufreq_governor.c | 21 ++++++++++--
> drivers/cpufreq/cpufreq_stats.c | 70 ++++++++++++++++++++++++++++++++++++++
> include/linux/cpufreq.h | 6 ++++
> 4 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2d53f47..cbaaff0 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -292,6 +292,10 @@ void __cpufreq_notify_transition(struct cpufreq_policy *policy,
> if (likely(policy) && likely(policy->cpu == freqs->cpu))
> policy->cur = freqs->new;
> break;
> + case CPUFREQ_LOADCHECK:
> + srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
> + CPUFREQ_LOADCHECK, freqs);
> + break;
> }
> }
> /**
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad..bc50624 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -23,12 +23,17 @@
> #include <linux/kernel_stat.h>
> #include <linux/mutex.h>
> #include <linux/slab.h>
> +#include <linux/sched.h>
> #include <linux/tick.h>
> #include <linux/types.h>
> #include <linux/workqueue.h>
>
> #include "cpufreq_governor.h"
>
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + struct cpufreq_freqs freqs;
> +#endif

Why do you need this to be global?
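
One way to avoid the file-scope variable, sketched in plain user-space C under stated assumptions: the sample lives on the caller's stack and is passed by pointer. `struct load_sample`, `collect_loads()` and `check_cpu_once()` are hypothetical names, `MAX_CPUS` stands in for NR_CPUS, and the ondemand frequency scaling of the load is left out. In the kernel, an array sized by NR_CPUS may be too large for the stack, so a per-policy allocation could be the better home for it.

```c
#include <stddef.h>

#define MAX_CPUS 4  /* hypothetical stand-in for NR_CPUS */

/* Hypothetical per-sample record; not the struct from the patch. */
struct load_sample {
    long long time;
    unsigned int freq;
    unsigned int load[MAX_CPUS];
};

/* Fill 'out' from per-CPU wall/idle times and return the maximum load. */
static unsigned int collect_loads(const unsigned long long *wall,
                                  const unsigned long long *idle,
                                  size_t ncpus, struct load_sample *out)
{
    unsigned int max_load = 0;
    size_t j;

    for (j = 0; j < ncpus && j < MAX_CPUS; j++) {
        unsigned int load = 0;

        /* Same formula as the patch; invalid samples count as 0. */
        if (wall[j] && wall[j] >= idle[j])
            load = (unsigned int)(100 * (wall[j] - idle[j]) / wall[j]);

        out->load[j] = load;
        if (load > max_load)
            max_load = load;
    }
    return max_load;
}

/* The caller owns the sample on its stack; nothing is file-scope. */
static unsigned int check_cpu_once(long long now, unsigned int cur_freq,
                                   const unsigned long long *wall,
                                   const unsigned long long *idle,
                                   size_t ncpus, struct load_sample *saved)
{
    struct load_sample sample = { .time = now, .freq = cur_freq };
    unsigned int max_load = collect_loads(wall, idle, ncpus, &sample);

    *saved = sample; /* stand-in for handing the sample to a notifier */
    return max_load;
}
```

Here `saved` only stands in for whatever consumes the sample; the point is that two callers running concurrently never share one buffer.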

> static struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
> {
> if (have_governor_per_policy())
> @@ -143,11 +148,17 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> idle_time += jiffies_to_usecs(cur_nice_jiffies);
> }
>
> - if (unlikely(!wall_time || wall_time < idle_time))
> + if (unlikely(!wall_time || wall_time < idle_time)) {
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + freqs.load[j] = 0;
> +#endif
> continue;
> + }
>
> load = 100 * (wall_time - idle_time) / wall_time;
> -
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + freqs.load[j] = load;
> +#endif
> if (dbs_data->cdata->governor == GOV_ONDEMAND) {
> int freq_avg = __cpufreq_driver_getavg(policy, j);
> if (freq_avg <= 0)
> @@ -160,6 +171,12 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> max_load = load;
> }
>
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + freqs.time = ktime_to_ns(ktime_get());
> + freqs.old = freqs.new = policy->cur;
> +
> + cpufreq_notify_transition(policy, &freqs, CPUFREQ_LOADCHECK);
> +#endif
> dbs_data->cdata->gov_check_cpu(cpu, max_load);
> }
> EXPORT_SYMBOL_GPL(dbs_check_cpu);
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index fb65dec..2379b1d 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -22,6 +22,8 @@
> #include <linux/notifier.h>
> #include <asm/cputime.h>
>
> +#define CPUFREQ_LOAD_TABLE_MAX 20
> +
> static spinlock_t cpufreq_stats_lock;
>
> struct cpufreq_stats {
> @@ -35,6 +37,10 @@ struct cpufreq_stats {
> unsigned int *freq_table;
> #ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> unsigned int *trans_table;
> +
> + struct cpufreq_freqs *load_table;
> + unsigned int load_last_index;
> + unsigned int load_max_index;
> #endif
> };
>
> @@ -131,6 +137,38 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
> return len;
> }
> cpufreq_freq_attr_ro(trans_table);
> +
> +static ssize_t show_load_table(struct cpufreq_policy *policy, char *buf)
> +{
> + struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu);
> + struct cpufreq_freqs *load_table = stat->load_table;
> + ssize_t len = 0;
> + int i;
> + int j;

Merge the above two lines.

> +
> + len += sprintf(buf + len, "%11s %10s", "Time", "Frequency");
> + for (j = 0; j < NR_CPUS; j++)
> + len += sprintf(buf + len, " %3s%d", "cpu", j);
> + len += sprintf(buf + len, "\n");
> +
> + i = stat->load_last_index;
> + do {
> + len += sprintf(buf + len, "%lld %9d",
> + load_table[i].time,
> + load_table[i].old);
> +
> + for (j = 0; j < NR_CPUS; j++)
> + len += sprintf(buf + len, " %4d",
> + load_table[i].load[j]);
> + len += sprintf(buf + len, "\n");
> +
> + if (++i == stat->load_max_index)
> + i = 0;
> + } while (i != stat->load_last_index);

Don't you need some locking to protect additions to this table while
we are reading from it?
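
A minimal user-space sketch of the kind of protection meant here, assuming a fixed-size ring of 20 entries. A C11 `atomic_flag` spinlock stands in for a kernel spinlock such as cpufreq_stats_lock; `struct load_ring` and the `ring_*` helpers are hypothetical names, not code from the patch. The writer and the reader both take the lock, so a reader never sees a half-written entry or an index that moves under it.

```c
#include <stdatomic.h>
#include <string.h>

#define LOAD_TABLE_MAX 20  /* mirrors CPUFREQ_LOAD_TABLE_MAX in the patch */
#define MAX_CPUS 4         /* hypothetical stand-in for NR_CPUS */

struct load_sample {
    long long time;
    unsigned int freq;
    unsigned int load[MAX_CPUS];
};

struct load_ring {
    atomic_flag lock;                       /* stand-in for a spinlock */
    struct load_sample table[LOAD_TABLE_MAX];
    unsigned int next;                      /* slot written next */
    unsigned int count;                     /* valid entries so far */
};

static void ring_lock(struct load_ring *r)
{
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ; /* spin until the holder clears the flag */
}

static void ring_unlock(struct load_ring *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

static void ring_init(struct load_ring *r)
{
    memset(r->table, 0, sizeof(r->table));
    r->next = 0;
    r->count = 0;
    atomic_flag_clear(&r->lock);
}

/* Writer side: overwrite the oldest entry once the ring is full. */
static void ring_add(struct load_ring *r, const struct load_sample *s)
{
    ring_lock(r);
    r->table[r->next] = *s;
    r->next = (r->next + 1) % LOAD_TABLE_MAX;
    if (r->count < LOAD_TABLE_MAX)
        r->count++;
    ring_unlock(r);
}

/* Reader side: copy entries oldest-first into out[]; returns the count. */
static unsigned int ring_snapshot(struct load_ring *r, struct load_sample *out)
{
    unsigned int i, n, start;

    ring_lock(r);
    n = r->count;
    start = (r->next + LOAD_TABLE_MAX - n) % LOAD_TABLE_MAX;
    for (i = 0; i < n; i++)
        out[i] = r->table[(start + i) % LOAD_TABLE_MAX];
    ring_unlock(r);
    return n;
}
```

Taking a snapshot under the lock and formatting it afterwards keeps the critical section short, which matters if the writer runs from the governor's sampling path.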

> + return len;
> +}
> +cpufreq_freq_attr_ro(load_table);
> #endif
>
> cpufreq_freq_attr_ro(total_trans);
> @@ -141,6 +179,7 @@ static struct attribute *default_attrs[] = {
> &time_in_state.attr,
> #ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> &trans_table.attr,
> + &load_table.attr,
> #endif
> NULL
> };
> @@ -167,6 +206,9 @@ static void cpufreq_stats_free_table(unsigned int cpu)
>
> if (stat) {
> pr_debug("%s: Free stat table\n", __func__);
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + kfree(stat->load_table);
> +#endif
> kfree(stat->time_in_state);
> kfree(stat);
> per_cpu(cpufreq_stats_table, cpu) = NULL;
> @@ -244,6 +286,16 @@ static int cpufreq_stats_create_table(struct cpufreq_policy *policy,
>
> #ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> stat->trans_table = stat->freq_table + count;
> +
> + stat->load_max_index = CPUFREQ_LOAD_TABLE_MAX;
> + stat->load_last_index = 0;
> +
> + alloc_size = sizeof(struct cpufreq_freqs) * stat->load_max_index;

We aren't using this variable multiple times, so get rid of it. Also, you need
to use sizeof(*stat->load_table).
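
A small user-space sketch of the sizeof(*ptr) idiom, with `calloc()` standing in for a zeroing kernel allocation. `struct sample`, `struct stats` and `stats_alloc_table()` are hypothetical names for illustration. The element size follows the pointer's type, so the allocation stays correct if the type of load_table ever changes.

```c
#include <stdlib.h>

/* Hypothetical element type; only its size matters here. */
struct sample {
    long long time;
    unsigned int freq;
};

struct stats {
    struct sample *load_table;
    unsigned int load_max_index;
};

/* Allocate n zeroed entries; returns 0 on success, -1 on failure. */
static int stats_alloc_table(struct stats *st, unsigned int n)
{
    /* sizeof(*st->load_table) tracks the pointee type automatically. */
    st->load_table = calloc(n, sizeof(*st->load_table));
    if (!st->load_table)
        return -1;

    st->load_max_index = n;
    return 0;
}
```

No temporary size variable is needed, and the count and element size are passed separately so the allocator can check the multiplication for overflow.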

> + stat->load_table = kzalloc(alloc_size, GFP_KERNEL);
> + if (!stat->load_table) {
> + ret = -ENOMEM;
> + goto error_out;
> + }
> #endif
> j = 0;
> for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
> @@ -312,13 +364,31 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb,
> struct cpufreq_stats *stat;
> int old_index, new_index;
>
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + if (val != CPUFREQ_POSTCHANGE && val != CPUFREQ_LOADCHECK)
> +#else
> if (val != CPUFREQ_POSTCHANGE)
> +#endif
> return 0;
>
> stat = per_cpu(cpufreq_stats_table, freq->cpu);
> if (!stat)
> return 0;
>
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + if (val == CPUFREQ_LOADCHECK) {
> + struct cpufreq_freqs *dest_freq;
> +
> + dest_freq = &stat->load_table[stat->load_last_index];
> + memcpy(dest_freq, freq, sizeof(struct cpufreq_freqs));

Again, sizeof()...

You probably don't need to copy the full structure.
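
A sketch of copying only what the table later prints (time, frequency, per-CPU load), assuming a separate, smaller entry type. `struct freqs_like` imitates the extended cpufreq_freqs from this patch, and `struct load_entry` and `load_entry_record()` are hypothetical names. Sizes are taken from the destination object rather than from a type name.

```c
#include <string.h>

#define MAX_CPUS 4  /* hypothetical stand-in for NR_CPUS */

/* Imitation of the extended notifier payload from the patch. */
struct freqs_like {
    unsigned int cpu;
    unsigned int old;
    unsigned int new_freq;      /* called 'new' in the kernel struct */
    unsigned char flags;
    long long time;
    unsigned int load[MAX_CPUS];
};

/* Hypothetical table entry holding only the fields that get shown. */
struct load_entry {
    long long time;
    unsigned int freq;
    unsigned int load[MAX_CPUS];
};

/* Copy just the fields the table needs instead of the whole payload. */
static void load_entry_record(struct load_entry *dst,
                              const struct freqs_like *src)
{
    dst->time = src->time;
    dst->freq = src->old;       /* show_load_table prints .old */
    memcpy(dst->load, src->load, sizeof(dst->load));
}
```

This keeps cpu, new and flags out of the stored table, where they are never read back.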

> +
> + if (++stat->load_last_index == stat->load_max_index)
> + stat->load_last_index = 0;
> +
> + return 0;
> + }
> +#endif
> +
> old_index = stat->last_index;
> new_index = freq_table_get_index(stat, freq->new);
>
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 037d36a..f780645 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -140,12 +140,18 @@ static inline bool policy_is_shared(struct cpufreq_policy *policy)
> #define CPUFREQ_POSTCHANGE (1)
> #define CPUFREQ_RESUMECHANGE (8)
> #define CPUFREQ_SUSPENDCHANGE (9)
> +#define CPUFREQ_LOADCHECK (10)
>
> struct cpufreq_freqs {
> unsigned int cpu; /* cpu nr */
> unsigned int old;
> unsigned int new;
> u8 flags; /* flags of cpufreq_driver, see below. */
> +
> +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
> + int64_t time;
> + unsigned int load[NR_CPUS];
> +#endif
> };

Otherwise it mostly looks good.

PS: I have cc'd you on a patch of mine which will get rid of most
of the CONFIG_*** checks you used in your code. But wait for it to be
applied before changing your code accordingly.