Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)
From: David Brownell
Date: Sat Apr 19 2008 - 23:51:54 EST
On Saturday 19 April 2008, david@xxxxxxx wrote:
> On Sat, 19 Apr 2008, Thomas Gleixner wrote:
>
> > On Fri, 18 Apr 2008, David Brownell wrote:
> >> On Friday 18 April 2008, Woodruff, Richard wrote:
> >>> When capturing some traces with dynamic tick we were noticing the
> >>> interrupt latency seems to go up a good amount.
> >>
> >>> I was wondering what thoughts of optimizing this might be.
> >>
> >> Cutting down the math implied by jiffies updates might help.

And update_wall_time() costs, too.

> >> The 64 bit math for ktime structs isn't cheap; purely by eyeball,
> >> that was almost 1/3 the cost of that 24 usec (mostly __do_div64).
> >
> > Hmm, I have no real good idea to avoid the div64 in the case of a long
> > idle sleep. Any brilliant patches are welcome :)

That is, in tick_do_update_jiffies64()?

	delta = ktime_sub(delta, tick_period);
	last_jiffies_update = ktime_add(last_jiffies_update,
					tick_period);

	/* Slow path for long timeouts */
	if (unlikely(delta.tv64 >= tick_period.tv64)) {
		s64 incr = ktime_to_ns(tick_period);

		ticks = ktime_divns(delta, incr);

		last_jiffies_update = ktime_add_ns(last_jiffies_update,
						   incr * ticks);
	}
	do_timer(++ticks);

Some math not shown here is converting clocksource values
to ktimes ... cyc2ns() has a comment about needing some
optimization, I wonder if that's an issue here.

Maybe turning tick_period into an *actual* constant (it's
a function of HZ) would help a bit; "incr" too.

Re the "ticks = ktime_divns(...)": since "incr" is constant,
the first thing that comes to mind is a binary search over a
precomputed table of tick multiples.

For HZ=100 (common for ARM) a table of size 128 would more
than cover the normal range of NO_HZ tick rates ... down to
below 1 Hz.
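
As a rough illustration of the table idea above (a userspace sketch
under stated assumptions, not kernel code): with HZ=100 the tick is
10 ms, so 128 entries cover idle periods up to 1.28 s.  The names
TICK_NSEC, tick_table, tick_table_init() and ticks_from_delta() are
made up for this example, and deltas are assumed non-negative.
Anything past the end of the table falls back to an ordinary 64-bit
divide.

```c
#include <stdint.h>

#define HZ		100			/* assumed, as in the ARM case above */
#define NSEC_PER_SEC	1000000000LL
#define TICK_NSEC	(NSEC_PER_SEC / HZ)	/* the constant "incr" */
#define TABLE_SIZE	128

/* tick_table[i] holds i * TICK_NSEC (hypothetical helper table) */
static int64_t tick_table[TABLE_SIZE];

static void tick_table_init(void)
{
	for (int i = 0; i < TABLE_SIZE; i++)
		tick_table[i] = (int64_t)i * TICK_NSEC;
}

/*
 * Return delta_ns / TICK_NSEC, assuming delta_ns >= 0.
 * Inside the table range this uses only 64-bit compares.
 */
static int64_t ticks_from_delta(int64_t delta_ns)
{
	int lo = 0, hi = TABLE_SIZE - 1;

	/* Rare case: idle for longer than the table covers. */
	if (delta_ns >= (int64_t)TABLE_SIZE * TICK_NSEC)
		return delta_ns / TICK_NSEC;

	/* Find the largest index whose entry is <= delta_ns. */
	while (lo < hi) {
		int mid = lo + (hi - lo + 1) / 2;

		if (tick_table[mid] <= delta_ns)
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}
```

In the slow path this would stand in for the ktime_divns() call, at
a cost of one range check plus seven compares for a 128-entry table.
Whether that actually beats __do_div64 on a given ARM core would
need measuring.
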
> how long is 'long idle sleep'? and how common are such sleeps?

The above code says "unlikely()" but that presumes very busy
systems.  I would have assumed taking more than one tick was
the most common case, since most systems spend more time idle
than working.  I certainly observe it to be the common case,
and it's a power management optimization goal.

> is it
> possibly worth the cost of a test in the hotpath to see if you need to do
> the 64 bit math or can get away with 32 bit math (at least on some
> platforms)

Possibly opening a can of worms, I'll observe that when the
concern is just to update jiffies, converting to ktime values
seems all but needless.  Deltas at the level of a clocksource
can be mapped to jiffies as easily as deltas at the nsec level,
saving some work...

Those delta tables could use just 32 bit values in the most
common cases:  clocksource ticking at less than 4 GHz, and
the IRQs firing more often than once a second.
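
To make the 32-bit point concrete, here is a minimal userspace
sketch that maps raw clocksource cycle deltas straight to ticks,
with no conversion to nanoseconds.  Everything named here (struct
cs_tick_state, cs_tick_init(), cs_ticks_elapsed()) is hypothetical.
It assumes a free-running counter that wraps at 32 bits and that
fewer than 2^32 cycles pass between updates, which holds for a
clocksource under about 4 GHz that is serviced at least once a
second.  For brevity it uses a plain 32-bit divide where a table
lookup could be used instead.

```c
#include <stdint.h>

#define HZ	100		/* assumed tick rate */

/* Hypothetical per-clocksource state for jiffies accounting. */
struct cs_tick_state {
	uint32_t cycles_per_tick;	/* clocksource frequency / HZ */
	uint32_t last_cycles;		/* counter value at last update */
};

static void cs_tick_init(struct cs_tick_state *s, uint32_t freq_hz,
			 uint32_t now)
{
	/* Truncates if freq_hz is not a multiple of HZ (causes drift). */
	s->cycles_per_tick = freq_hz / HZ;
	s->last_cycles = now;
}

/*
 * Return the number of whole ticks since the last update and
 * advance last_cycles by exactly that many ticks.
 */
static uint32_t cs_ticks_elapsed(struct cs_tick_state *s, uint32_t now)
{
	/* Unsigned subtraction handles counter wrap-around. */
	uint32_t delta = now - s->last_cycles;
	uint32_t ticks;

	if (delta < s->cycles_per_tick)
		return 0;

	ticks = delta / s->cycles_per_tick;	/* 32-bit math only */
	s->last_cycles += ticks * s->cycles_per_tick;
	return ticks;
}
```

The return value is what would be handed to do_timer().  A real
version would have to deal with frequencies that are not a multiple
of HZ, and would still need to keep the wall clock in step.
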
- Dave