Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep

From: Paul E. McKenney
Date: Wed Dec 18 2013 - 16:57:30 EST


On Wed, Dec 18, 2013 at 01:53:18PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 18, 2013 at 1:49 PM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, Dec 18, 2013 at 01:29:53PM -0800, Andy Lutomirski wrote:
> >> On 12/18/2013 09:43 AM, Frederic Weisbecker wrote:
> >> > On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
> >> >> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
> >> >>> So this is what this series brings, more details following:
> >> >>>
> >> >>> * Some code, naming and whitespace cleanups
> >> >>>
> >> >>> * Allow all CPUs outside the nohz_full range to handle the timekeeping
> >> >>> duty, not just CPU 0. Balancing the timekeeping duty should improve
> >> >>> powersavings.
> >> >>
> >> >> If the system has just one nohz_full CPU running, it will need another
> >> >> CPU to do the timekeeper job. Then the system needs roughly 2 CPUs awake.
> >> >> From a power-saving POV, that is not good compared to normal nohz idle.
> >> >
> >> > Sure, but everything has a tradeoff :)
> >> >
> >> > We could theoretically run with the timekeeper purely idle if the other
> >> > CPU in full dynticks mode runs in userspace for a long while and seldom
> >> > makes syscalls or takes faults. Timekeeping could be updated on kernel/user
> >> > boundaries in this case without much impact on performance.
> >> >
> >> > But then there is one strict condition for that: it can't read the timeofday
> >> > through the vdso but only through a syscall.
> >>
> >> Where's your ambition? :)
> >>
> >> If the vdso timing functions could see that it's been too long since a
> >> real timekeeping update, they could fall back to a syscall. Otherwise,
> >> they could use rdtsc or whatever is in use.
> >
> > One objection to that approach in the past has been that it injects
> > avoidable latency into the worker CPUs. I suppose that you could argue
> > that the cache misses due to a timekeeping-CPU update are not free, but
> > then again, the syscall is likely to incur a few cache misses as
> > well.
> >
> > I bet that the timekeeping-CPU approach wins, but it would be cool to
> > see you prove me wrong.
>
> There's already some (very vague) discussion about having a scheduled
> time at which the clock frequency and/or offset will change, and this
> wouldn't be a huge departure from that. The goal there is to avoid
> waiting for timekeeping if vclock_gettime runs concurrently with an
> update, but the same approach could apply here (albeit with one extra
> branch).
>
> Anyway, syscalls aren't *that* expensive.

Like I said, it would be cool to see you prove me wrong, but that will
need to be with patches and performance results rather than rhetoric.

> Alternatively, couldn't workloads like this just turn off NTP?

Some probably could, but others need accurate time.

Thanx, Paul
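
For concreteness, the vDSO fallback idea discussed in this thread (stay on
the fast path while the last timekeeping update is recent, fall back to a
syscall once it is too old) can be sketched roughly as below. This is a
userspace illustration only, not kernel or vDSO code: struct tk_snapshot,
read_time_ns(), stub_syscall_time_ns() and the max_age_cycles limit are all
hypothetical names made up for the example, and the seqcount protection
that a real vDSO read needs is left out.

```c
#include <stdint.h>

/* Hypothetical snapshot that a timekeeping CPU would publish. */
struct tk_snapshot {
	uint64_t base_ns;        /* wall-clock time at the last update */
	uint64_t base_cycles;    /* cycle counter value at the last update */
	uint64_t max_age_cycles; /* beyond this age the snapshot is stale */
	uint32_t mult;           /* cycles -> ns: (delta * mult) >> shift */
	uint32_t shift;
};

enum tk_path { TK_FAST, TK_SLOW };

/* Hypothetical stand-in for the real syscall; returns a fixed value. */
static uint64_t stub_syscall_time_ns(void)
{
	return 42;
}

/*
 * Read the time given the current cycle counter value.  If the snapshot
 * is older than max_age_cycles, take the slow path (the "syscall");
 * otherwise extrapolate from the snapshot.  The staleness check is the
 * one extra branch on the fast path.
 */
static enum tk_path read_time_ns(const struct tk_snapshot *s,
				 uint64_t now_cycles,
				 uint64_t (*slow_path)(void),
				 uint64_t *out_ns)
{
	uint64_t delta = now_cycles - s->base_cycles;

	if (delta > s->max_age_cycles) {
		*out_ns = slow_path();
		return TK_SLOW;
	}

	/* delta is bounded by max_age_cycles, which is assumed small
	 * enough that delta * mult does not overflow 64 bits. */
	*out_ns = s->base_ns + ((delta * s->mult) >> s->shift);
	return TK_FAST;
}
```

A caller would pass the current counter value and a slow-path callback, for
example read_time_ns(&snap, now, stub_syscall_time_ns, &ns). Whether the
extra branch and the occasional slow path beat a dedicated timekeeping CPU
is exactly the open question above, and this sketch says nothing about it.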
