Re: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

From: Frederic Weisbecker
Date: Mon Jul 06 2015 - 12:14:24 EST


On Fri, Jun 12, 2015 at 09:36:50AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 11, 2015 at 07:36:07PM +0200, Frederic Weisbecker wrote:
> > +static void tick_nohz_full_update_dependencies(void)
> > +{
> > +	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> > +
> > +	if (!posix_cpu_timers_can_stop_tick(current))
> > +		ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
> > +
> > +	if (!perf_event_can_stop_tick())
> > +		ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
> > +
> > +	if (!sched_can_stop_tick())
> > +		ts->tick_needed |= TICK_NEEDED_SCHED;
> >
> >  #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> >  	/*
> > +	 * sched_clock_tick() needs us?
> > +	 *
> >  	 * TODO: kick full dynticks CPUs when
> >  	 * sched_clock_stable is set.
> >  	 */
> >  	if (!sched_clock_stable()) {
> > +		ts->tick_needed |= TICK_NEEDED_CLOCK_UNSTABLE;
> >  		/*
> >  		 * Don't allow the user to think they can get
> >  		 * full NO_HZ with this machine.
> >  		 */
> >  		WARN_ONCE(tick_nohz_full_running,
> >  			  "NO_HZ FULL will not work with unstable sched clock");
> >  	}
> >  #endif
> >  }
>
> Colour me confused; why does this function exist at all? Should not
> these bits be managed by those respective subsystems?

So we have two choices here:

1) Something changes in a subsystem that needs the tick, and that subsystem
   sends an IPI to the CPU concerned so that it updates its own tick
   dependency state.

   pros: The dependency bits are always modified and read locally.
   cons: We also need to check the subsystems on task switch, because the
         next task may have different dependencies than prev. So that's
         context switch overhead.
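
To make 1) concrete, here is a rough userspace sketch, not kernel code. The
flag names are borrowed from the quoted patch; struct task, struct cpu_state,
update_dependencies() and can_stop_tick() are hypothetical, and "more than one
runnable task" is only an assumed stand-in for what the scheduler would check.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Dependency bits, named after the flags in the quoted patch. */
enum {
	TICK_NEEDED_POSIX_CPU_TIMER = 1u << 0,
	TICK_NEEDED_PERF_EVENT      = 1u << 1,
	TICK_NEEDED_SCHED           = 1u << 2,
};

/* Hypothetical stand-in for a task: only what the sketch needs. */
struct task {
	bool has_posix_cpu_timer;
};

/* Hypothetical per-CPU state, only ever touched by its own CPU. */
struct cpu_state {
	unsigned int tick_needed;
	unsigned int nr_running;
	bool perf_needs_tick;
};

static struct cpu_state cpus[NR_CPUS];

/* Rebuild the mask from scratch by polling every subsystem, locally. */
static void update_dependencies(int cpu, const struct task *next)
{
	struct cpu_state *cs;

	assert(cpu >= 0 && cpu < NR_CPUS);
	cs = &cpus[cpu];
	cs->tick_needed = 0;

	if (next->has_posix_cpu_timer)
		cs->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;

	if (cs->perf_needs_tick)
		cs->tick_needed |= TICK_NEEDED_PERF_EVENT;

	/* Assumption: more than one runnable task needs the tick. */
	if (cs->nr_running > 1)
		cs->tick_needed |= TICK_NEEDED_SCHED;
}

/* Would run on every context switch and from the IPI handler. */
static bool can_stop_tick(int cpu, const struct task *next)
{
	update_dependencies(cpu, next);
	return cpus[cpu].tick_needed == 0;
}
```

Nothing here needs atomics because the mask never leaves its CPU, but every
task switch pays for the full re-evaluation.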

2) Whenever a subsystem changes its dependency on the tick (needs it or
   doesn't need it anymore), that subsystem remotely changes the dependency
   bits, then sends an IPI in case we switched from "tick needed" to "tick
   not needed".

   pros: Less context switch overhead.
   cons: Only works directly for subsystems whose dependency is per CPU
         (scheduler). Others, whose dependency is exclusively per task or
         system wide, need more complicated treatment: posix cpu timers
         would then need to switch to a separate global flag.
         perf depends on both a global state and a per cpu state.
         The flags are read remotely. This involves some ordering but no
         full barrier, since we have the IPI.
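
For comparison, 2) could look roughly like the userspace sketch below, using
C11 atomics in place of the kernel's atomic ops. All names (tick_dep_set(),
tick_dep_clear(), kick_cpu(), the kicks[] counter) are hypothetical, the bit
value is arbitrary, and the IPI is only simulated by a counter. As a
simplification, the sketch kicks the target CPU on a transition in either
direction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4
#define TICK_NEEDED_SCHED (1u << 2)	/* arbitrary bit value for the sketch */

/* Per-CPU masks, written remotely by whichever subsystem changes state. */
static atomic_uint tick_needed[NR_CPUS];

/* Counts simulated IPIs; plain ints are fine for a single-threaded demo. */
static unsigned int kicks[NR_CPUS];

/* Hypothetical stand-in for sending an IPI to the target CPU. */
static void kick_cpu(int cpu)
{
	kicks[cpu]++;
}

/* Set a dependency bit on a possibly remote CPU. */
static void tick_dep_set(int cpu, unsigned int bit)
{
	unsigned int prev;

	assert(cpu >= 0 && cpu < NR_CPUS);
	prev = atomic_fetch_or(&tick_needed[cpu], bit);
	/* Mask went from empty to non-empty: overall state changed. */
	if (prev == 0)
		kick_cpu(cpu);
}

/* Clear a dependency bit on a possibly remote CPU. */
static void tick_dep_clear(int cpu, unsigned int bit)
{
	unsigned int prev;

	assert(cpu >= 0 && cpu < NR_CPUS);
	prev = atomic_fetch_and(&tick_needed[cpu], ~bit);
	/* Mask went from non-empty to empty: overall state changed. */
	if (prev != 0 && (prev & ~bit) == 0)
		kick_cpu(cpu);
}

/* The context switch path only has to read the mask. */
static bool can_stop_tick(int cpu)
{
	return atomic_load(&tick_needed[cpu]) == 0;
}
```

The fetch-and-modify return value is what tells the writer whether the overall
state flipped, so redundant sets and clears cost no IPI.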

This patchset takes the simple 1) way, which can definitely be improved.

Perhaps we should do 2) with one global mask and one per cpu mask, with all
flags atomically and remotely set and cleared by the relevant subsystems.
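
A minimal userspace sketch of that two-mask idea, again with C11 atomics and
hypothetical names (global_mask, cpu_mask[], the dep_*() helpers). Putting the
posix cpu timer bit in the global mask and the scheduler bit in the per cpu
mask is only an illustration of the split described above, and the IPI side is
left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Flag names follow the quoted patch; bit values are arbitrary. */
enum {
	TICK_NEEDED_POSIX_CPU_TIMER = 1u << 0,
	TICK_NEEDED_SCHED           = 1u << 2,
};

static atomic_uint global_mask;		/* system wide dependencies */
static atomic_uint cpu_mask[NR_CPUS];	/* per cpu dependencies */

static void dep_set_global(unsigned int bit)
{
	atomic_fetch_or(&global_mask, bit);
}

static void dep_clear_global(unsigned int bit)
{
	atomic_fetch_and(&global_mask, ~bit);
}

static void dep_set_cpu(int cpu, unsigned int bit)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_fetch_or(&cpu_mask[cpu], bit);
}

static void dep_clear_cpu(int cpu, unsigned int bit)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_fetch_and(&cpu_mask[cpu], ~bit);
}

/* The tick can stop only if neither mask has any bit set. */
static bool can_stop_tick(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (atomic_load(&global_mask) != 0)
		return false;
	return atomic_load(&cpu_mask[cpu]) == 0;
}
```

A global bit keeps the tick on every CPU, while a per cpu bit affects only its
own CPU, so the check is two loads and no subsystem polling.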