Re: too many timer retries happen when do local timer swtich withbroadcast timer
From: Lorenzo Pieralisi
Date: Fri Feb 22 2013 - 09:48:41 EST
On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote:
> On Fri, 22 Feb 2013, Santosh Shilimkar wrote:
> > On Friday 22 February 2013 04:01 PM, Lorenzo Pieralisi wrote:
> > > On Fri, Feb 22, 2013 at 10:24:00AM +0000, Thomas Gleixner wrote:
> > > > On Fri, 22 Feb 2013, Santosh Shilimkar wrote:
> > > > > BTW, Lorenzo off-list mentioned to me about warning in boot-up
> > > > > which I missed while testing your patch. It will take bit more
> > > > > time for me to look into it and hence thought of reporting it.
> > > > >
> > > > > [ 2.186126] ------------[ cut here ]------------
> > > > > [ 2.190979] WARNING: at kernel/time/tick-broadcast.c:501
> > > > > tick_broadcast_oneshot_control+0x1c0/0x21c()
> > > >
> > > > Which one is that? tick_broadcast_pending or tick_force_broadcast_mask ?
> > >
> > > It is the tick_force_broadcast_mask and I think that's because on all
> > > systems we are testing, the broadcast timer IRQ is a thundering herd,
> > > all CPUs get out of idle at once and try to get out of broadcast mode
> > > at more or less the same time.
> > >
> > So the issue comes ups only when the idle state used where CPU wakeup
> > more or less at same time as Lorenzo mentioned. I have two platforms
> > where I could test the patch and see the issue only with one platform.
> > Yesterday I didn't notice the warning because it wasn't seen on that
> > platform :-) OMAP4 idle entry and exit in deep state is staggered
> > between CPUs and hence the warning isn't seen. On OMAP5 though,
> > there is an additional C-state where idle entry/exit for CPU
> > isn't staggered and I see the issue in that case.
> > Actually the broad-cast code doesn't expect such a behavior
> > from CPUs since only the broad-cast affine CPU should wake
> > up and rest of the CPU should be woken up by the broad-cast
> > IPIs.
> That's what I feared. We might have the same issue on x86, depending
> on the cpu model.
> But thinking more about it. It's actually not a real problem, just
> pointless burning of cpu cycles.
> So on the CPU which gets woken along with the target CPU of the
> broadcast the following happens:
> <-- spurious wakeup
> set forced bit
> enable interrupts
> <-- Nothing happens
> disable interrupts
> <-- Here we observe the forced bit is set
> Now after that the target CPU of the broadcast runs the broadcast
> handler and finds the other CPU in both the broadcast and the forced
> mask, sends the IPI and stuff gets back to normal.
> So it's not actually harmful, just more evidence for the theory, that
> hardware designers have access to very special drug supplies.
> Now we could make use of that and avoid going deep idle just to come
> back right away via the IPI. Unfortunately the notification thingy has
> no return value, but we can fix that.
> To confirm that theory, could you please try the hack below and add
> some instrumentation (trace_printk)?
Applied, and it looks like that's exactly why the warning triggers, at least
on the platform I am testing on which is a dual-cluster ARM testchip.
There is a still time window though where the CPU (the IPI target) can get
back to idle (tick_broadcast_pending still not set) before the CPU target of
the broadcast has a chance to run tick_handle_oneshot_broadcast (and set
tick_broadcast_pending), or am I missing something ?
It is a corner case, granted. Best thing would be to check pending IRQs in the
idle driver back-end (or have always-on local timers :-)).
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/