Re: [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload

From: Peter Zijlstra
Date: Mon Jan 29 2018 - 12:20:33 EST


On Mon, Jan 29, 2018 at 05:48:33PM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 29, 2018 at 04:38:39PM +0100, Peter Zijlstra wrote:
> > I would very much like a few words on why sched_class::task_tick() is
> > safe to call remote -- from a quick look I think it actually is, but it
> > would be good to have some words here.
>
> Let's rather say I can't prove that it is safe, given the amount of code
> behind the various flavours of scheduler features.
>
> But as far as I have checked, several times over, nothing seems to be accessed
> locally in ::scheduler_tick(). Everything appears to be fetched from the target
> runqueue struct while it is locked.
>
> If we ever find local references such as "current" or "__this_cpu_*" in the path,
> we'll have to fix them.

Sure, but at least state that you audited the code and for what issues. That
tells me you know wth you're doing and gives more trust than blindly
changing random code ;-)

> > > +static void sched_tick_start(int cpu)
> > > +{
> > > +	struct tick_work *twork;
> > > +
> > > +	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
> > > +		return;
> >
> > This all looks very static :-(, you can't reconfigure this nohz_full
> > crud after boot?
>
> Unfortunately yes. In fact making the nohz interface dynamically available
> through cpuset is the next big step.

OK, fair enough.

> > > +	WARN_ON_ONCE(!tick_work_cpu);
> > > +
> > > +	twork = per_cpu_ptr(tick_work_cpu, cpu);
> > > +	twork->cpu = cpu;
> > > +	INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
> > > +	queue_delayed_work(system_unbound_wq, &twork->work, HZ);
> > > +}
> >
> > Similarly, I think we want a few words about how unbound workqueues are
> > expected to behave vs NUMA.
> >
> > AFAICT unbound workqueues by default prefer to run on a cpu in the same
> > node, but if no cpu is available, it doesn't go looking for the nearest
> > node that does have a cpu, it just punts to whatever random cpu.
>
> Yes, and in fact you just made me look into wq_select_unbound_cpu() and
> it looks worse than that. If the current CPU is not in the wq_unbound_cpumask,
> a random one is picked from that global cpumask without first trying a nearby
> CPU in the current node.
>
> Looks like room for improvement on the workqueue side. I'll see what I can do.

Great, thanks!
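For reference, the improvement Frederic is suggesting can be sketched in user space (Python; the topology, function name, and fallback are illustrative assumptions, not the actual workqueue code): prefer a CPU from the requesting CPU's NUMA node before falling back to an arbitrary CPU from the global unbound cpumask.

```python
# Hedged sketch of a node-aware unbound-CPU selection, as suggested in
# the thread. The two-node topology below is hypothetical demo data.
import random

node_of_cpu = {0: 0, 1: 0, 2: 1, 3: 1}
cpus_of_node = {0: {0, 1}, 1: {2, 3}}

def select_unbound_cpu(requesting_cpu, wq_unbound_cpumask):
    # First try CPUs in the requester's node that are also in the mask.
    node_local = cpus_of_node[node_of_cpu[requesting_cpu]] & wq_unbound_cpumask
    if node_local:
        return min(node_local)  # deterministic pick for the demo
    # Fallback mirrors the behaviour criticised above: any CPU from the
    # global mask, with no regard for distance.
    return random.choice(sorted(wq_unbound_cpumask))

# CPU 2 is node 1; only CPU 3 of that node is in the mask, so 3 wins.
print(select_unbound_cpu(2, {0, 3}))
```

The point of the sketch is only the ordering of the two lookups; the real fix would live in wq_select_unbound_cpu() and use the kernel's cpumask and topology helpers.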