Re: kernel/timer: avoid spurious ksoftirqd wakeups

From: Frederic Weisbecker
Date: Tue Apr 07 2015 - 16:17:36 EST

On Mon, Apr 06, 2015 at 08:51:26PM -0300, Marcelo Tosatti wrote:
> On Tue, Apr 07, 2015 at 01:34:15AM +0200, Frederic Weisbecker wrote:
> > Yeah, it would be nice to make sure that the cause of these softirqs isn't
> > mistakenly ignored.
> > And also I want to be sure we really understand what we
> > are doing, which is not the case right now as we don't know what is causing
> > this expired timer.
> What is the interrupt that is the cause for tick_nohz_stop_sched_tick,
> you mean?
> <...>-45815 [015] d...2.. 25722056692012 (+961446): kvm_exit: reason EXTERNAL_INTERRUPT rip 0x7f5e448479d0 info 0 800000ef
> <...>-45815 [015] d..h1.. 25722056692844 (+832): apic_timer_fn <-__run_hrtimer
> <...>-45815 [015] d...1.. 25722056695442 (+2598): raise_softirq_irqoff <-tick_nohz_stop_sched_tick
> Emulation of guest APIC timer by hrtimer (apic_timer_fn).

Nope, I meant what is the root cause of the softirq.
But let's continue with that below:

> > Sure, but why is it waking up exactly?
> Because there is a bug (the patch is trying to fix the bug by
> raising the timer softirq only when the timer softirq handler has
> work to do).
> The only timers pending in the timer list are deferred ones
> from vmstat_update:
> ksoftirqd/15-265 [015] ....111 25722056709372 (+7098): softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/15-265 [015] .....11 25722056709964 (+592): run_timer_softirq <-do_current_softirqs
> ksoftirqd/15-265 [015] ....111 25722056714034 (+4070): timer_expire_entry: timer=ffff88082f6f14a0 function=delayed_work_timer_fn now=4480299175
> ksoftirqd/15-265 [015] ....112 25722056715738 (+1704): workqueue_queue_work: work struct=ffff88082f6f1480 function=vmstat_update workqueue=ffff88041f408000 req_cpu=5120 cpu=15
> ksoftirqd/15-265 [015] ....112 25722056716304 (+566): workqueue_activate_work: work struct ffff88082f6f1480
> ksoftirqd/15-265 [015] ....111 25722056719052 (+2748): timer_expire_exit: timer=ffff88082f6f14a0
> ksoftirqd/15-265 [015] ....111 25722056719384 (+332): softirq_exit: vec=1 [action=TIMER]
> Which should only be processed once there are actual add_timer timers
> being fired (there are no such add_timer timers on this CPU).
> Does that make sense?
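The rule Marcelo describes (raise the timer softirq only when the handler has real work, and let expired deferrable timers such as the vmstat_update one wait until a non-deferrable timer fires) can be modelled in a few lines. This is a hypothetical userspace sketch, not the patch itself: `struct model_timer` and `timer_softirq_needed()` are invented names, and the real timer wheel is not a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending struct timer_list entry. */
struct model_timer {
    unsigned long expires;   /* expiry, in jiffies */
    bool deferrable;         /* armed as a deferrable timer */
};

/*
 * Model of the rule described above: the TIMER softirq is worth raising
 * only if some non-deferrable timer has expired. Expired deferrable timers
 * are left to run whenever the softirq next runs for another reason.
 * The signed difference handles jiffies wraparound like time_after_eq().
 */
static bool timer_softirq_needed(const struct model_timer *timers, size_t n,
                                 unsigned long now)
{
    assert(timers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (timers[i].deferrable)
            continue;                       /* never forces a softirq */
        if ((long)(now - timers[i].expires) >= 0)
            return true;                    /* a real timer has expired */
    }
    return false;
}
```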

So the source of these softirqs is those deferred timers? But deferrable timers
are only deferrable in idle-nohz mode, not full-nohz. They are actually deferred
in practice in full-nohz, but that's a bug :o) (which I need to fix).

Still, I don't think these deferred timers are the source of the softirqs, since
your patch fixes the issue of non-timers triggering softirqs.
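The deferrable-timer semantics stated above can be sketched as a next-wakeup calculation in which deferrable timers are ignored only while the CPU is idle. Everything here (`struct model_timer`, `next_wakeup()`, `NO_TIMER`) is a hypothetical model of the intended behaviour, not kernel code. In this model, the full-nohz bug mentioned above would correspond to skipping deferrable timers on a busy full-nohz CPU as well.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending struct timer_list entry. */
struct model_timer {
    unsigned long expires;   /* expiry, in jiffies */
    bool deferrable;         /* armed as a deferrable timer */
};

#define NO_TIMER ULONG_MAX   /* hypothetical "nothing pending" marker */

/*
 * Earliest expiry that must wake the CPU. Deferrable timers are skipped
 * only when the CPU is idle (idle-nohz); a busy full-nohz CPU still has
 * to honour them. Plain comparison: jiffies wraparound is not modelled.
 */
static unsigned long next_wakeup(const struct model_timer *timers, size_t n,
                                 bool cpu_idle)
{
    unsigned long next = NO_TIMER;

    assert(timers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cpu_idle && timers[i].deferrable)
            continue;
        if (timers[i].expires < next)
            next = timers[i].expires;
    }
    return next;
}
```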

So here is the issue: something that is not a "struct timer_list" is causing the
expiry time of the next tick to be now or in the past. See tick_nohz_stop_sched_tick():
the softirq is raised when delta_jiffies < 1 or when the timer fails to be reprogrammed
because it has already expired.

What can cause this expiry time to be now or in the past? To answer that, we need to
check everything that is used to evaluate the next tick:

1) struct timer_list timers
2) low-res hrtimers
3) scheduler_tick_max_deferment
4) timekeeping_max_deferment
5) (rcu|arch|irq_work)_needs_tick()
6) maybe something else I'm missing

Your patch restricts the softirq to being raised only in case 1), and it works
for you. This means the spurious softirqs you saw were caused by 2, 3, 4, 5 or 6.
I want to know which one and why, because I need to understand exactly which event
will no longer trigger a softirq after this patch. We need to know that to
ensure your patch has no side effects.
