Re: Bug in hrtimer_get_next_event?

From: Thomas Gleixner
Date: Wed Mar 31 2010 - 06:43:31 EST


Gary,

please configure your mail client to do proper line breaks around 78
chars.

On Tue, 30 Mar 2010, Gary King wrote:

> I am implementing idle state controls (CPU_IDLE) for Tegra SoCs, and
> one of the idle states is not awakened by the hrtimer
> interrupt. There is a system-wide high-resolution timer which can be
> used as a wakeup source, but I need the high-resolution sleep time
> to configure the alarm.
>
> To fix this, I want to use hrtimer_get_next_event; however, the code
> that is in the tree only walks the hrtimer bases when hres mode is
> not active; when hres mode is active, hrtimer_get_next_event always
> returns KTIME_MAX. Is there any reason for the negative comparison,
> or is this a bug?
>
> After changing this locally, I encountered one other problem on
> dynamic-tick systems: get_next_timer_interrupt is called to
> determine whether or not it is safe to enter nohz mode; however,
> hrtimer_get_next_event (which is used by get_next_timer_interrupt)
> will always return <=1 jiffy, since the emulated tick scheduler
> event will be armed when tick_nohz_stop_sched_tick queries the sleep
> time. As a result, tick_nohz_stop_sched_tick will never enter nohz
> mode. I can think of a couple ways to address this (cancel the tick
> timer before querying the event and rearm if necessary from either
> the arch cpu_idle code or nohz_stop_sched_tick; ignore the tick
> timer in hrtimer_get_next_event); does anyone have a recommendation
> for a preferred approach?

get_next_timer_interrupt() and hrtimer_get_next_event() are working
perfectly fine.

In the !HIGHRES case we get the next pending timer from both the timer
wheel and the hrtimer queue. Note that there is no tick timer in the
hrtimer queue, because the tick is generated periodically from the
NOHZ code.
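
As a rough illustration of the !HIGHRES lookup, here is a minimal user
space sketch, not kernel code. The name model_next_event_lowres() and
the MODEL_NO_TIMER sentinel are hypothetical and exist only for this
example. All times are assumed to be absolute expiry values in
nanoseconds.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "nothing pending in this queue". */
#define MODEL_NO_TIMER UINT64_MAX

/*
 * Hypothetical model of the !HIGHRES case: the next event is the
 * earlier of the next timer wheel expiry and the next hrtimer expiry.
 * No tick timer is queued as an hrtimer in this mode, so neither
 * input is polluted by the tick itself.
 */
static uint64_t model_next_event_lowres(uint64_t next_wheel_ns,
                                        uint64_t next_hrtimer_ns)
{
    return next_wheel_ns < next_hrtimer_ns ? next_wheel_ns
                                           : next_hrtimer_ns;
}
```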

In the HIGHRES case we replace the periodic tick by a hrtimer. We
return KTIME_MAX in that case because the clock event is already
armed for the next hrtimer, so we know when it is due. The check
works this way:

    query next timer wheel timer
    if that timer is due in the next jiffy, keep the tick running

    if not, cancel the tick timer and rearm it to the next timer
    wheel interrupt. If there is any hrtimer pending _BEFORE_ the
    next timer wheel timer then the clock event is armed to that
    event anyway and not overridden by the modified tick timer.
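
The steps above can be modelled in user space roughly as follows.
This is a sketch under stated assumptions, not the kernel
implementation: struct model_decision and model_stop_tick() are
hypothetical names, times are absolute nanoseconds, and "now" is
assumed to sit on a tick boundary so that the next regular tick is at
now + one jiffy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of the tick-stop decision in the HIGHRES case. */
struct model_decision {
    bool     tick_stopped;   /* true if the tick timer was pushed out */
    uint64_t tick_timer_ns;  /* expiry set on the tick hrtimer */
    uint64_t clock_event_ns; /* expiry the clock event ends up armed for */
};

/*
 * next_wheel_ns:         expiry of the next timer wheel timer
 * next_other_hrtimer_ns: earliest hrtimer other than the tick timer,
 *                        or UINT64_MAX if there is none
 */
static struct model_decision model_stop_tick(uint64_t now_ns,
                                             uint64_t jiffy_ns,
                                             uint64_t next_wheel_ns,
                                             uint64_t next_other_hrtimer_ns)
{
    struct model_decision d;

    assert(jiffy_ns > 0 && next_wheel_ns >= now_ns);

    if (next_wheel_ns - now_ns <= jiffy_ns) {
        /* Wheel timer due within the next jiffy: keep the tick. */
        d.tick_stopped = false;
        d.tick_timer_ns = now_ns + jiffy_ns;
    } else {
        /* Cancel the tick timer and rearm it to the wheel expiry. */
        d.tick_stopped = true;
        d.tick_timer_ns = next_wheel_ns;
    }

    /*
     * The clock event follows the earliest queued hrtimer, so an
     * hrtimer pending before the modified tick timer still wins.
     */
    d.clock_event_ns = d.tick_timer_ns < next_other_hrtimer_ns
                           ? d.tick_timer_ns
                           : next_other_hrtimer_ns;
    return d;
}
```

With a 4 ms jiffy, a wheel timer 100 ms away and another hrtimer 30 ms
away, the model stops the tick, moves the tick timer to 100 ms and
leaves the clock event armed for 30 ms.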

Simply do not use hrtimer_get_next_event() and
get_next_timer_interrupt() for your purpose. They only work together
with the tick management layer and are not designed for general
purpose use.

If you want to know how far away the next timer event is, then use:

tick_nohz_get_sleep_length()

That tells you how long it is until the next timer interrupt happens
while the system is idle. It works in both the high res on and off
cases.
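
How the returned sleep length could be turned into an alarm value is
sketched below as plain user space C. Everything here is an
assumption for illustration: model_alarm_ticks() is a hypothetical
helper, the wakeup timer is assumed to be a 32-bit counter running at
1 MHz, and the sleep length is passed in as nanoseconds rather than
read from the kernel.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_USEC 1000ULL

/*
 * Hypothetical helper: convert an idle sleep length into ticks of an
 * assumed 1 MHz, 32-bit wakeup counter (1 tick == 1 us).
 * Returns 0 when the deep idle state should not be entered.
 */
static uint32_t model_alarm_ticks(int64_t sleep_ns, uint32_t exit_latency_us)
{
    uint64_t sleep_us;

    if (sleep_ns <= 0)
        return 0;   /* next timer event is already due */

    sleep_us = (uint64_t)sleep_ns / MODEL_NSEC_PER_USEC;
    if (sleep_us <= exit_latency_us)
        return 0;   /* sleep too short to cover the exit latency */

    /* Wake early enough to absorb the exit latency. */
    sleep_us -= exit_latency_us;

    /* Clamp to the width of the assumed 32-bit counter. */
    return sleep_us > UINT32_MAX ? UINT32_MAX : (uint32_t)sleep_us;
}
```

In a real cpuidle driver the sleep length would come from
tick_nohz_get_sleep_length() in the idle path, and the clamped value
would be written to whatever alarm register the SoC provides.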

Thanks,

tglx
