Re: [patch] timer: Fix timers_update_migration(), and call it in tmigr_init()

From: Mike Galbraith
Date: Sun Apr 30 2017 - 01:08:19 EST


On Sun, 2017-04-30 at 06:20 +0200, Mike Galbraith wrote:
> On Sat, 2017-04-29 at 20:43 -0700, Paul E. McKenney wrote:
> > On Sun, Apr 30, 2017 at 03:21:58AM +0200, Mike Galbraith wrote:
> > > On Sat, 2017-04-29 at 14:45 -0700, Paul E. McKenney wrote:
> > > > On Sat, Apr 29, 2017 at 08:20:33PM +0200, Mike Galbraith wrote:
> > > > > On Sat, 2017-04-29 at 11:06 -0700, Paul E. McKenney wrote:
> > > > >
> > > > > > If someone will either repost a fresh series or point me at
> > > > > > exactly
> > > > > > the set of patches to use, I will run it through rcutorture
> > > > > > again.
> > > > >
> > > > > Patchlet is against x86-tip/master.today.
> > > >
> > > > So today's (as in Saturday April 29) x86-tip/master with the
> > > > following
> > > > patch applied?
> > >
> > > Yeah.
> >
> > OK, will fire it up once the current set of overnight tests
> > complete.
>
> I certainly don't want to discourage you from beating hell outta tip,
> just want to make sure you know that I'm seeing zero RCU woes, only
> late timer expiry (sharpening rocks/sticks to focus trace).

Ah, seems a cpu shutting down the tick can race with add_timer_on(),
leaving the timer stranded until some other event kicks the cpu awake.

<idle>-0 [025] d..4 92.087954: tmigr_group_set_cpu_active: group=ffff88017d03d000 lvl=1 numa=-1 active=1 migrator=25 num_childs=8 parent= (null) nextevt=125916000000 evtcpu=2
<idle>-0 [025] d..4 92.087956: tmigr_group_removeevt: group=ffff88017d03d000 lvl=1 numa=-1 active=1 migrator=25 num_childs=8 parent= (null) nextevt=125916000000 evtcpu=2
<idle>-0 [025] d..3 92.087956: tmigr_group_set_cpu_active: group=ffff880179cd1000 lvl=0 numa=3 active=1 migrator=25 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [025] d..3 92.087957: tmigr_group_removeevt: group=ffff880179cd1000 lvl=0 numa=3 active=1 migrator=25 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [025] d.h3 92.087959: hrtimer_cancel: hrtimer=ffff880277455d20
<idle>-0 [025] d.h2 92.087959: hrtimer_expire_entry: hrtimer=ffff880277455d20 function=tick_sched_timer now=91100168287
<idle>-0 [025] d.h2 92.087964: hrtimer_expire_exit: hrtimer=ffff880277455d20
<idle>-0 [025] d.s3 92.087968: timer_cancel: timer=ffffffff820dd640
<idle>-0 [025] ..s2 92.087969: timer_expire_entry: timer=ffffffff820dd640 function=clocksource_watchdog now=4294915072
<idle>-0 [025] d.s4 92.087971: timer_start: timer=ffffffff820dd640 function=clocksource_watchdog expires=4294915197 [timeout=125] cpu=26 idx=218 flags=
<idle>-0 [025] d.s3 92.087974: wake_up_idle_cpu: NO KICK 26 - !set_nr_and_not_polling(rq->idle)
<idle>-0 [025] ..s2 92.087974: timer_expire_exit: timer=ffffffff820dd640
<idle>-0 [025] .Ns2 92.087982: tmigr_handle_remote: group=ffff880179cd1000 lvl=0 numa=3 active=1 migrator=25 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27 cpu=25
<idle>-0 [025] .Ns2 92.087982: tmigr_handle_remote: group=ffff88017d03d000 lvl=1 numa=-1 active=1 migrator=25 num_childs=8 parent= (null) nextevt=125916000000 evtcpu=2 cpu=25
<idle>-0 [025] dN.3 92.087995: hrtimer_start: hrtimer=ffff880277455d20 function=tick_sched_timer expires=91104000000 softexpires=91104000000
<idle>-0 [026] dN.3 92.088009: tmigr_group_set_cpu_active: group=ffff880179cd1000 lvl=0 numa=3 active=2 migrator=25 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [026] dN.3 92.088011: tmigr_group_removeevt: group=ffff880179cd1000 lvl=0 numa=3 active=2 migrator=25 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [025] d..4 92.088016: tmigr_group_addevt: group=ffff880179cd1000 lvl=0 numa=3 active=1 migrator=25 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [026] dN.3 92.088016: hrtimer_start: hrtimer=ffff880277495d20 function=tick_sched_timer expires=91104000000 softexpires=91104000000
<idle>-0 [025] d..5 92.088017: tmigr_group_set_cpu_inactive: group=ffff88017d03d000 lvl=1 numa=-1 active=1 migrator=26 num_childs=8 parent= (null) nextevt=125916000000 evtcpu=2
<idle>-0 [025] d..4 92.088017: tmigr_group_set_cpu_inactive: group=ffff880179cd1000 lvl=0 numa=3 active=1 migrator=26 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [025] d..2 92.088018: tick_stop: success=1 dependency=NONE
<idle>-0 [025] d..3 92.088018: hrtimer_cancel: hrtimer=ffff880277455d20
<idle>-0 [025] d..3 92.088020: hrtimer_start: hrtimer=ffff880277455d20 function=tick_sched_timer expires=125916000000 softexpires=125916000000
<idle>-0 [026] d..4 92.088022: tmigr_group_addevt: group=ffff880179cd1000 lvl=0 numa=3 active=0 migrator=26 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [026] d..5 92.088023: tmigr_group_addevt: group=ffff88017d03d000 lvl=1 numa=-1 active=0 migrator=26 num_childs=8 parent= (null) nextevt=125916000000 evtcpu=2
<idle>-0 [026] d..5 92.088024: tmigr_group_set_cpu_inactive: group=ffff88017d03d000 lvl=1 numa=-1 active=0 migrator=-1 num_childs=8 parent= (null) nextevt=125916000000 evtcpu=2
<idle>-0 [026] d..4 92.088025: tmigr_group_set_cpu_inactive: group=ffff880179cd1000 lvl=0 numa=3 active=0 migrator=-1 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [026] d..2 92.088025: tick_stop: success=1 dependency=NONE
<idle>-0 [026] d..3 92.088026: hrtimer_cancel: hrtimer=ffff880277495d20
<idle>-0 [026] d..3 92.088027: hrtimer_start: hrtimer=ffff880277495d20 function=tick_sched_timer expires=93148000000 softexpires=93148000000
<idle>-0 [026] d..4 94.135877: tmigr_group_set_cpu_active: group=ffff88017d03d000 lvl=1 numa=-1 active=2 migrator=1 num_childs=8 parent= (null) nextevt=316380000000 evtcpu=56
<idle>-0 [026] d..4 94.135879: tmigr_group_removeevt: group=ffff88017d03d000 lvl=1 numa=-1 active=2 migrator=1 num_childs=8 parent= (null) nextevt=316380000000 evtcpu=56
<idle>-0 [026] d..3 94.135879: tmigr_group_set_cpu_active: group=ffff880179cd1000 lvl=0 numa=3 active=1 migrator=26 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [026] d..3 94.135880: tmigr_group_removeevt: group=ffff880179cd1000 lvl=0 numa=3 active=1 migrator=26 num_childs=8 parent=ffff88017d03d000 nextevt=316380000000 evtcpu=27
<idle>-0 [026] d.h3 94.135881: hrtimer_cancel: hrtimer=ffff880277495d20
<idle>-0 [026] d.h2 94.135882: hrtimer_expire_entry: hrtimer=ffff880277495d20 function=tick_sched_timer now=93148247903
<idle>-0 [026] d.h2 94.136137: hrtimer_expire_exit: hrtimer=ffff880277495d20
<idle>-0 [026] d.s3 94.136141: timer_cancel: timer=ffffffff820dd640
<idle>-0 [026] ..s2 94.136141: timer_expire_entry: timer=ffffffff820dd640 function=clocksource_watchdog now=4294915584
<idle>-0 [026] d.s4 94.136144: timer_start: timer=ffffffff820dd640 function=clocksource_watchdog expires=4294915709 [timeout=125] cpu=27 idx=219 flags=
<idle>-0 [026] d.s3 94.136146: wake_up_idle_cpu: KICK 27
<idle>-0 [026] ..s1 94.136148: clocksource_watchdog: LATE by 387 ticks