Re: switching to top frequency too frequent with ondemand governor and no_hz
From: Markus Trippelsdorf
Date: Wed Jun 01 2011 - 14:00:58 EST
On 2011.06.01 at 13:34 -0400, David C Niemi wrote:
> On 06/01/2011 12:08 PM, Markus Trippelsdorf wrote:
> > There seems to be a major difference in the behavior of the ondemand
> > governor depending on whether CONFIG_NO_HZ is set or not in the kernel
> > .config.
> >
> > In the NO_HZ case the ondemand governor spends too much time at the
> > highest frequency and is also very trigger happy.
> >
> > I have compared the two cases on my system:
> > powernow-k8: Found 1 AMD Phenom(tm) II X4 955 Processor (4 cpu cores) (version 2.20.00)
> > powernow-k8: 0 : pstate 0 (3200 MHz)
> > powernow-k8: 1 : pstate 1 (2500 MHz)
> > powernow-k8: 2 : pstate 2 (2100 MHz)
> > powernow-k8: 3 : pstate 3 (800 MHz)
> >
> > When I run:
> > watch -n.1 'cat /proc/cpuinfo|grep MHz'
> > on an otherwise idle system, I can see that the frequency always stays
> > at 800 MHz in the "CONFIG_NO_HZ not set" case. But it will very
> > frequently switch to 3200 MHz in the CONFIG_NO_HZ=y case under the same
> > conditions.
> >
> > This also manifests itself in the cpufreq/stats/time_in_state
> > statistics (again on a mostly idle system):
> >
> > First taken with:
> > echo 200 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
> > (BTW wouldn't it make sense to use something like this as the default
> > value?)
> >
> > cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
> >
> > CONFIG_NO_HZ not set:
> > 3200000 5845
> > 2500000 0
> > 2100000 5
> > 800000 31552
> >
> > CONFIG_NO_HZ=y:
> > 3200000 17650
> > 2500000 0
> > 2100000 0
> > 800000 31129
> >
> >
> > And with the default sampling_down_factor=1
> >
> > CONFIG_NO_HZ not set:
> > 3200000 140
> > 2500000 2
> > 2100000 29
> > 800000 16614
> >
> > CONFIG_NO_HZ=y:
> > 3200000 538
> > 2500000 9
> > 2100000 77
> > 800000 16287
> >
> > Now my question is, is this expected? And what could be done to make the
> > NO_HZ behavior more like the "CONFIG_NO_HZ not set" behavior.
>
> A very interesting bit of information. What do you have set for
> up_threshold? You may have to set it higher for CONFIG_NO_HZ than
> without, based on your symptoms. Another thing to look at is your
> sampling_rate. I'm guessing it differs between CONFIG_NO_HZ being set
> or not.

I've played with all of those parameters, but unfortunately it didn't
make any difference.

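For comparing the tunables between a CONFIG_NO_HZ=y kernel and one
without it, a minimal sketch along these lines could dump them in one
go. It assumes that up_threshold and sampling_rate sit in the same sysfs
directory as sampling_down_factor; the helper name read_tunables is
hypothetical and only used for illustration.

```python
import os

# Directory taken from the sampling_down_factor path quoted in this thread.
# Assumption: up_threshold and sampling_rate are exposed in the same place.
ONDEMAND_DIR = "/sys/devices/system/cpu/cpufreq/ondemand"
TUNABLES = ("up_threshold", "sampling_rate", "sampling_down_factor")


def read_tunables(directory=ONDEMAND_DIR, names=TUNABLES):
    """Return {name: int or None} for each tunable file in `directory`.

    A value of None means the file was missing, unreadable, or did not
    contain a plain integer.
    """
    values = {}
    for name in names:
        path = os.path.join(directory, name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tunables().items():
        print(name, value)
```

Running it once on each kernel and diffing the output shows whether the
defaults differ between the two configurations.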
> And perhaps you need to set sampling_down_factor a bit lower. I
> consider 100 a reasonable default, but a default of "1" was put in
> initially to make the behavior of the patch that enabled the factor
> identical with not having the patch. If you are more concerned with
> saving power than maximizing throughput, you might consider a much
> lower value like 5 or 10.

Yes, I've tried different values, and 200 turned out to be the best for
my preferences (throughput over power saving). It makes a big difference
in the compile time of bigger projects, especially during the
configuration phase.

But I have found the root cause of the symptoms described above by
bisection. It turned out that 2.6.39 is also affected, so I've bisected
down to 2.6.38.
This is the result:

5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
Author: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Date: Mon Feb 7 17:14:25 2011 +0100

    [CPUFREQ] calculate delay after dbs_check_cpu

When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.
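
To put a number on the time_in_state tables quoted above, the share of
time spent at the top frequency can be computed from the two columns. A
minimal sketch, using only the four tables from this mail as input; the
function name top_frequency_share and the sample labels are made up for
illustration.

```python
def top_frequency_share(time_in_state_text):
    """Return the fraction of total time spent at the highest frequency.

    Each input line is "<frequency in kHz> <time counter>", as printed by
    cpufreq/stats/time_in_state.
    """
    rows = []
    for line in time_in_state_text.strip().splitlines():
        freq_khz, ticks = line.split()
        rows.append((int(freq_khz), int(ticks)))
    total = sum(ticks for _, ticks in rows)
    # max() compares the frequency first, so this picks the top P-state row.
    _, top_ticks = max(rows)
    return top_ticks / total if total else 0.0


# The four tables quoted earlier in this mail.
SAMPLES = {
    "sdf=200, NO_HZ not set": "3200000 5845\n2500000 0\n2100000 5\n800000 31552",
    "sdf=200, NO_HZ=y": "3200000 17650\n2500000 0\n2100000 0\n800000 31129",
    "sdf=1, NO_HZ not set": "3200000 140\n2500000 2\n2100000 29\n800000 16614",
    "sdf=1, NO_HZ=y": "3200000 538\n2500000 9\n2100000 77\n800000 16287",
}

for label, text in SAMPLES.items():
    print(f"{label}: {100 * top_frequency_share(text):.1f}% at top frequency")
```

With sampling_down_factor=200 that works out to roughly 15.6% without
NO_HZ versus 36.2% with it, and with the default of 1 to roughly 0.8%
versus 3.2%.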
--
Markus