Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3

From: Ingo Molnar
Date: Wed May 24 2017 - 03:16:37 EST



* Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:

> On Tue, May 23, 2017 at 09:25:08AM +0200, Ingo Molnar wrote:
> >
> > * Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> >
> > > v2 had issues on -tip tree and triggered a warning. It seems to have
> > > disappeared. Perhaps it was due to another timer issue. Anyway this
> > > version brings more debugging informations, with a layout that is more
> > > bisection-friendly and it also handles ticks that fire outside IRQ
> > > context and thus carry NULL irq regs. This happen when
> > > hrtimer_interrupt() is called on hotplug cpu down for example.
> > >
> > > We'll see if the issue arises again.
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > nohz/fixes
> > >
> > > HEAD: cd15f46b284f04dbedd065a9d99a4e0badae379a
> > >
> > > Thanks,
> > > Frederic
> > > ---
> > >
> > > Frederic Weisbecker (2):
> > > nohz: Add hrtimer sanity check
> > > nohz: Fix collision between tick and other hrtimers, again
> > >
> > >
> > > kernel/time/tick-sched.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> > > kernel/time/tick-sched.h | 2 ++
> > > 2 files changed, 45 insertions(+), 5 deletions(-)
> >
> > So I think the 3 commits queued up right now:
> >
> > 99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
> > 411fe24e6b7c: nohz: Fix collision between tick and other hrtimers, again
> > ce6cf9a15d62: nohz: Add hrtimer sanity check
> >
> > are OK and I'd not rebase them unless there's some breakage.
> >
> > One thing I noticed: your second series does appear to have:
> >
> > 99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
> >
> > is that intentional? That is pretty much the only commit I'd love to rebase with a
> > proper description added.
>
> Yes in my latest series I melted "nohz: Reset next_tick cache even when the timer has no regs"
> into "nohz: Fix collision between tick and other hrtimers, again" because it's a fixup and
> keeping that patch separate may break bisection.
>
> So ideally, it would be nice if you could fixup 411fe24e6b7c with 99fa871820cf. That's roughly
> all I did in my latest series.

So the interdiff between your two patches and the 3 commits already queued up is:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e3043873fcdc..30253ed0380b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -150,12 +150,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
touch_softlockup_watchdog_sched();
if (is_idle_task(current))
ts->idle_jiffies++;
- /*
- * In case the current tick fired too early past its expected
- * expiration, make sure we don't bypass the next clock reprogramming
- * to the same deadline.
- */
- ts->next_tick = 0;
}
#endif
update_process_times(user_mode(regs));
@@ -1103,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
tick_sched_handle(ts, regs);

/* No need to reprogram if we are running tickless */
- if (unlikely(ts->tick_stopped))
+ if (unlikely(ts->tick_stopped)) {
+ /*
+ * In case the current tick fired too early past its expected
+ * expiration, make sure we don't bypass the next clock reprogramming
+ * to the same deadline.
+ */
+ ts->next_tick = 0;
return;
+ }

hrtimer_forward(&ts->sched_timer, now, tick_period);
tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
@@ -1202,12 +1203,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
*/
if (regs)
tick_sched_handle(ts, regs);
- else
- ts->next_tick = 0;

/* No need to reprogram if we are in idle or full dynticks mode */
- if (unlikely(ts->tick_stopped))
+ if (unlikely(ts->tick_stopped)) {
+ /*
+ * In case the current tick fired too early past its expected
+ * expiration, make sure we don't bypass the next clock reprogramming
+ * to the same deadline.
+ */
+ ts->next_tick = 0;
return HRTIMER_NORESTART;
+ }

hrtimer_forward(timer, now, tick_period);


... so the two are not the same - I'd rather not rebase it, I'd like to keep what
is working, we had problems with these changes before ...

If you'd like the changes in this interdiff to be applied as well, please add a
changelog to it and post it as a fourth patch.

Thanks,

Ingo