[BUG] "perf top" results in "NOHZ: local_softirq_pending 100"

From: Heiko Carstens
Date: Tue Dec 07 2010 - 07:44:59 EST


While playing around with perf I realized that "perf top" immediately results
in a "NOHZ: local_softirq_pending 100" message on the console.

0x100 means that a HRTIMER_SOFTIRQ is pending when the cpu tries to disable
the tick.
In perf_event.c we have a call to __hrtimer_start_range_ns() in
perf_swevent_start_hrtimer() where its wakeup parameter is zero.
__hrtimer_start_range_ns() in turn will call hrtimer_enqueue_reprogram()
which will call __raise_softirq_irqoff(HRTIMER_SOFTIRQ) (since wakeup is
zero).
That means that just the HRTIMER_SOFTIRQ bit gets set in the softirq
pending field, but wakeup_softirqd() doesn't get called.
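To make the 0x100 value above concrete, here is a small user-space sketch
that decodes a softirq pending mask. The enum mirrors the softirq numbering
in include/linux/interrupt.h for kernels of this era, reproduced here as an
assumption for illustration; softirq_mask() and first_pending() are
hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Softirq numbering as assumed for kernels around 2.6.37
 * (illustrative copy, check include/linux/interrupt.h for your tree).
 */
enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	BLOCK_IOPOLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,
	NR_SOFTIRQS
};

static const char *const softirq_names[NR_SOFTIRQS] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"BLOCK_IOPOLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

/* Hypothetical helper: bit in the pending field for softirq number nr. */
unsigned int softirq_mask(unsigned int nr)
{
	assert(nr < NR_SOFTIRQS);
	return 1u << nr;
}

/* Hypothetical helper: lowest pending softirq number, or -1 if none. */
int first_pending(unsigned int pending)
{
	for (unsigned int nr = 0; nr < NR_SOFTIRQS; nr++)
		if (pending & softirq_mask(nr))
			return (int)nr;
	return -1;
}

/* Print the name of every softirq whose bit is set in pending. */
void decode_pending(unsigned int pending)
{
	for (unsigned int nr = 0; nr < NR_SOFTIRQS; nr++)
		if (pending & softirq_mask(nr))
			printf("%s_SOFTIRQ\n", softirq_names[nr]);
}
```

Under these assumptions HRTIMER_SOFTIRQ is number 8, so its bit is
1 << 8 = 0x100, which matches the value in the console message.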

As far as I could see this function gets called from process context with
a spinlock held and hence we don't have any guarantee that this pending
softirq gets executed before the idle task gets scheduled and tries to
disable the tick.

The easiest fix would be to set wakeup to one (see patch below), but I guess
there is a reason why it's zero. Anybody?
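The sequence described above can be sketched as a tiny user-space model.
This is not kernel code: struct cpu_model and all model_*() functions are
hypothetical names, and the model assumes that a nonzero wakeup leads to
ksoftirqd being woken and that a woken ksoftirqd always runs before the
idle task. It only illustrates why a pending bit without a wakeup can
still be set when the idle task tries to stop the tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define HRTIMER_SOFTIRQ 8	/* assumed softirq number, gives bit 0x100 */

/* Hypothetical per-cpu state for the model. */
struct cpu_model {
	unsigned int softirq_pending;
	bool ksoftirqd_woken;
};

/* Models __raise_softirq_irqoff(): only sets the pending bit. */
void model_raise_nowake(struct cpu_model *cpu, unsigned int nr)
{
	cpu->softirq_pending |= 1u << nr;
}

/* Models raising with a wakeup: sets the bit and wakes ksoftirqd. */
void model_raise_wake(struct cpu_model *cpu, unsigned int nr)
{
	model_raise_nowake(cpu, nr);
	cpu->ksoftirqd_woken = true;
}

/* Models the wakeup decision made on the enqueue/reprogram path. */
void model_enqueue_reprogram(struct cpu_model *cpu, int wakeup)
{
	if (wakeup)
		model_raise_wake(cpu, HRTIMER_SOFTIRQ);
	else
		model_raise_nowake(cpu, HRTIMER_SOFTIRQ);
}

/* Models the scheduler: a woken ksoftirqd handles all pending softirqs. */
void model_schedule(struct cpu_model *cpu)
{
	if (cpu->ksoftirqd_woken) {
		cpu->softirq_pending = 0;
		cpu->ksoftirqd_woken = false;
	}
}

/* Models the idle task stopping the tick; returns true if it would warn. */
bool model_idle_stop_tick(const struct cpu_model *cpu)
{
	if (cpu->softirq_pending) {
		printf("NOHZ: local_softirq_pending %02x\n",
		       cpu->softirq_pending);
		return true;
	}
	return false;
}
```

Calling model_enqueue_reprogram() with wakeup set to 0, then
model_schedule() and model_idle_stop_tick(), leaves the bit set and the
model warns. With wakeup set to 1 the bit is cleared before the idle
check.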

---
kernel/perf_event.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index eac7e33..958b3e0 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4942,7 +4942,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
 		}
 		__hrtimer_start_range_ns(&hwc->hrtimer,
 				ns_to_ktime(period), 0,
-				HRTIMER_MODE_REL_PINNED, 0);
+				HRTIMER_MODE_REL_PINNED, 1);
 	}
 }
