Re: too many timer retries happen when doing local timer switch with broadcast timer

From: Thomas Gleixner
Date: Wed Feb 20 2013 - 08:33:13 EST


On Wed, 20 Feb 2013, Jason Liu wrote:
> void arch_idle(void)
> {
>         ....
>         clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>
>         enter_the_wait_mode();
>
>         clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> }
>
> When the broadcast timer interrupt arrives, it just wakes up the ARM core;
> the core has no chance to handle it since local irqs are disabled (in fact
> they are disabled in cpu_idle() of arch/arm/kernel/process.c).
>
> So the broadcast timer interrupt wakes up the CPU, which then runs:
>
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
>   -> tick_broadcast_oneshot_control(...);
>       -> tick_program_event(dev->next_event, 1);
>           -> tick_dev_program_event(dev, expires, force);
>
>               for (i = 0;;) {
>                       int ret = clockevents_program_event(dev, expires, now);
>                       if (!ret || !force)
>                               return ret;
>
>                       dev->retries++;
>                       ....
>                       now = ktime_get();
>                       expires = ktime_add_ns(now, dev->min_delta_ns);
>               }
>
> clockevents_program_event(dev, expires, now):
>
>       delta = ktime_to_ns(ktime_sub(expires, now));
>
>       if (delta <= 0)
>               return -ETIME;
>
> When the bc timer interrupt arrives, it means the last local timer has
> expired too, so clockevents_program_event() returns -ETIME, which causes
> dev->retries++ when the expired timer is force-retried.
>
> Even worse: after re-programming the expired timer, the CPU may enter idle
> again quickly, before the re-programmed timer expires, which makes the
> system ping-pong like this forever,

That's nonsense.

The timer IPI brings the core out of the deep idle state.

So after returning from enter_the_wait_mode() and after calling
clockevents_notify(), arch_idle() returns to cpu_idle().

In cpu_idle() interrupts are reenabled, so the timer IPI handler is
invoked. That calls the event_handler of the per cpu local clockevent
device (the one which stops in C3). That ends up in the generic timer
code which expires timers and reprograms the local clock event device
with the next pending timer.

So you cannot go idle again before the timers which expired at this event
are handled and their callbacks invoked.
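
For reference, the ordering in the idle loop is roughly the following (a
simplified sketch, not the literal arch/arm/kernel/process.c code):

        while (1) {
                local_irq_disable();
                arch_idle();        /* BROADCAST_ENTER, wait, BROADCAST_EXIT */
                local_irq_enable();
                /*
                 * A pending timer IPI fires here. Its handler invokes the
                 * local clockevent's event_handler, which expires timers
                 * and reprograms the device before we can go idle again.
                 */
        }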


Now the reprogramming itself is a different issue.

It's necessary as we have no information whether the wakeup was caused
by the timer IPI or by something else.
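
For context, the BROADCAST_EXIT path which does that reprogramming looks
roughly like this (excerpt from tick_broadcast_oneshot_control(), matching
the context lines of the patch below):

        if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
                cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
                clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
                /* Unconditional reprogram: the wakeup reason is unknown */
                if (dev->next_event.tv64 != KTIME_MAX)
                        tick_program_event(dev->next_event, 1);
        }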

We might try to be clever and store the pending IPI in
tick_handle_oneshot_broadcast() and avoid the reprogramming in that
case. Completely untested patch below.

Thanks,

tglx

Index: linux-2.6/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-broadcast.c
+++ linux-2.6/kernel/time/tick-broadcast.c
@@ -29,6 +29,7 @@
 static struct tick_device tick_broadcast_device;
 /* FIXME: Use cpumask_var_t. */
 static DECLARE_BITMAP(tick_broadcast_mask, NR_CPUS);
+static DECLARE_BITMAP(tick_broadcast_pending, NR_CPUS);
 static DECLARE_BITMAP(tmpmask, NR_CPUS);
 static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
 static int tick_broadcast_force;
@@ -417,9 +418,10 @@ again:
         /* Find all expired events */
         for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
                 td = &per_cpu(tick_cpu_device, cpu);
-                if (td->evtdev->next_event.tv64 <= now.tv64)
+                if (td->evtdev->next_event.tv64 <= now.tv64) {
                         cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-                else if (td->evtdev->next_event.tv64 < next_event.tv64)
+                        set_bit(cpu, tick_broadcast_pending);
+                } else if (td->evtdev->next_event.tv64 < next_event.tv64)
                         next_event.tv64 = td->evtdev->next_event.tv64;
         }
 
@@ -493,7 +495,8 @@ void tick_broadcast_oneshot_control(unsi
                         cpumask_clear_cpu(cpu,
                                           tick_get_broadcast_oneshot_mask());
                         clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
-                        if (dev->next_event.tv64 != KTIME_MAX)
+                        if (dev->next_event.tv64 != KTIME_MAX &&
+                            !test_and_clear_bit(cpu, tick_broadcast_pending))
                                 tick_program_event(dev->next_event, 1);
                 }
         }





