Re: [PATCH] sched: cpuacct: Track cpuusage for CPU frequencies

From: Mike Chan
Date: Mon Apr 12 2010 - 18:01:18 EST


On Mon, Apr 12, 2010 at 1:03 PM, Thomas Renninger <trenn@xxxxxxx> wrote:
> On Friday 09 April 2010 10:50:33 pm Mike Chan wrote:
>> 2010/4/9 Thomas Renninger <trenn@xxxxxxx>:
>> > On Wednesday 07 April 2010 03:21:59 Mike Chan wrote:
>> >> New file: cpuacct.cpufreq when CONFIG_CPU_FREQ_STAT is enabled.
>> >>
>> >> cpuacct.cpufreq reports the CPU time (nanoseconds) spent at each CPU
>> >> frequency
>> >>
>> >> Maximum number of frequencies supported is 32. As future architectures
>> >> are added that support more than 32 frequency levels, CPUFREQ_TABLE_MAX
>> >> in sched.c needs to be updated.
>> >
>> > Why is accounting of each frequency needed?
>>
>> The intent is to track time spent at each cpu frequency to measure
>> power consumption. Userspace can figure out the mapping between
>> frequency and power consumption. This is also a useful indication of
>> what kind of hw performance userspace apps need (does Chrome really
>> need 1ghz?).
>>
>> Paul Menage had suggested an integral earlier in my [RFC] patch. I
>> wasn't completely against the idea but it had a few shortcomings that
>> I couldn't think of decent solutions for. You would have to either
>> pre-define power consumption for the cpu frequencies per-arch or board
>> file. Or have a way to calculate.
> Sounds as if this is for specific CPUs/boards only then.
> X86 boosting and PCC driver are hard, possibly impossible to track (in respect
> to real power consumption).
>

Good point. The CPUs that would benefit are the ones with a fixed set
of frequencies, which is very common in the ARM world. On x86 and PPC,
these statistics are not enabled unless CONFIG_CPU_FREQ_STAT is set.
Perhaps what I should really be checking for is CONFIG_CPU_FREQ_TABLE.

>> > pcc-cpufreq driver can do every frequency in a range and supports
>> > hundreds of different frequencies, thus it does not depend on
>> > CPU_FREQ_TABLE. Would the average frequency be enough to track/account?
>> Humm, this is a tricky case we haven't yet run into for ARM. Average
>> frequency might not be too useful because power is not linear with
>> speed. We could possibly have buckets for speeds (hi/lo).
> Your whole concept sounds as if it requires limited amount of frequencies.
> Don't mind for the special case I mentioned.
>

True, this was mostly inspired by the cpufreq stats/time_in_state file,
which shows how much time the CPU has spent globally at each
frequency. However, applying this to cpuacct groups allows us to track
how userspace applications are consuming processing power.
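
A minimal userspace sketch of the per-group accounting idea, not the
patch code itself. The struct and function names are hypothetical;
only the 32-entry limit (CPUFREQ_TABLE_MAX) comes from the patch
description quoted above. Each group keeps one nanosecond counter per
frequency index, and time is charged to the index the CPU was running
at:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUFREQ_TABLE_MAX 32	/* limit quoted in the patch description */

/* Hypothetical per-group record: nanoseconds charged at each frequency index. */
struct group_freq_usage {
	uint64_t time_ns[CPUFREQ_TABLE_MAX];
};

/* Charge delta_ns of CPU time to the group at the given frequency index. */
static void group_charge(struct group_freq_usage *g, size_t freq_idx,
			 uint64_t delta_ns)
{
	assert(freq_idx < CPUFREQ_TABLE_MAX);
	g->time_ns[freq_idx] += delta_ns;
}

/* Total CPU time charged to the group across all frequencies. */
static uint64_t group_total_ns(const struct group_freq_usage *g)
{
	uint64_t total = 0;

	for (size_t i = 0; i < CPUFREQ_TABLE_MAX; i++)
		total += g->time_ns[i];
	return total;
}
```

This is the same shape as time_in_state, just kept per group instead
of globally.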

So here are some thoughts after everyone's feedback:

1) Keep tracking in kernel/sched.c: For CPUs that do not use fixed
frequencies (PPC, x86), this is disabled, as it's too difficult (at
least for now) to track. Perhaps later someone with more intelligence
than myself and more knowledge of these architectures can figure out a
way to track.

2) Add cpuacct notifiers: Introduce cpuacct notifiers that board or
mach-cpu files could register. The benefit here is that at the arch
level we know whether, and how many, fixed frequencies we can run at.
This will probably result in a little code duplication across several
architectures (at least omap, msm and tegra for Android).

3) Treat this as not useful for upstream, in which case it will go
into our android/common branch.

I'm slightly favoring #2. This also gives us the benefit of
(optionally) exporting the power integral value that Paul Menage
suggested earlier, if such power numbers are available from the
board file.
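
For illustration, the power integral could be computed from the
per-frequency times roughly as below. This is a sketch under
assumptions: the freq_power table, its field names and the milliwatt
units are hypothetical stand-ins for whatever a board file would
supply, and none of this is taken from the patch. The estimate is
simply the sum over frequencies of (time at frequency) * (power at
frequency):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical board-supplied entry: one fixed frequency and its power draw. */
struct freq_power {
	uint32_t freq_khz;
	uint32_t power_mw;	/* assumed average power at this frequency */
};

/*
 * Estimated energy in joules for n frequency levels, where time_ns[i] is
 * the time spent at table[i]. Converts ns to seconds and mW to watts.
 */
static double energy_joules(const struct freq_power *table,
			    const uint64_t *time_ns, size_t n)
{
	double joules = 0.0;

	assert(table != NULL && time_ns != NULL);
	for (size_t i = 0; i < n; i++)
		joules += ((double)time_ns[i] * 1e-9) *
			  ((double)table[i].power_mw * 1e-3);
	return joules;
}
```

Keeping the power numbers in a separate table means the time
accounting stays the same whether or not a board provides them.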

-- Mike

>      Thomas
>