Re: [RFC PATCH 12/12] housekeeping: Reimplement isolcpus on housekeeping

From: Frederic Weisbecker
Date: Mon Aug 28 2017 - 13:33:30 EST


On Mon, Aug 28, 2017 at 06:24:16PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 28, 2017 at 05:27:15PM +0200, Frederic Weisbecker wrote:
> > On Mon, Aug 28, 2017 at 03:31:16PM +0200, Peter Zijlstra wrote:
>
> > > I'm fairly sure that was very intentional. If you want to isolate stuff
> > > you don't want load-balancing.
> >
> > Yes I guess that was intentional. In fact having NULL domains is convenient
> > as it also isolates from many things: tasks, workqueues, timers.
>
> Huh, what? That's entirely unrelated to the NULL domain.
>
> The reason people like isolcpus= is that it ensures _nothing_ runs on
> those CPUs before you explicitly place something there.
>
> _That_ is what ensures there are no timers etc.. placed on those CPUs.

Sure, that's what I meant.

>
> Once you run something on that CPU, it stays there.
>
> It is also what I dislike about isolcpus: it's a boot-time feature, if
> you want to reconfigure your system you need a reboot.

Indeed.

>
> > Although for example I guess (IIUC) that if you create an unbound
> > timer on a NULL domain, it will be stuck on it for ever as we can't
> > walk any hierarchy from the current CPU domain.
>
> Not sure what you're on about. Timers have their own hierarchy.

Check out get_nohz_timer_target() which relies on scheduler hierarchies to
look up a CPU to enqueue an unpinned timer on.
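
To illustrate the concern, here is a rough user-space model of that kind of
lookup. It is only a sketch, not the kernel code: pick_timer_target(), the
"domains" list and the is_idle callback are names made up for this example,
and the housekeeping checks are left out entirely. An empty "domains" list
stands in for a NULL domain.

```python
from typing import Callable, List, Set


def pick_timer_target(
    this_cpu: int,
    domains: List[Set[int]],
    is_idle: Callable[[int], bool],
) -> int:
    """Toy model: choose a CPU on which to queue an unpinned timer.

    `domains` holds the CPU spans of this_cpu's scheduler domains,
    innermost first. An empty list models a NULL domain.
    """
    # A busy CPU simply keeps its own timers.
    if not is_idle(this_cpu):
        return this_cpu

    # Otherwise look for a busy CPU, nearest domain first.
    for span in domains:
        for cpu in sorted(span):
            if cpu != this_cpu and not is_idle(cpu):
                return cpu

    # Nothing to walk (NULL domain) or every CPU is idle: stay put.
    return this_cpu
```

With domains=[] the loop body never runs, so in this model the timer always
stays on the CPU that armed it, whatever the other CPUs are doing.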

>
> > I'm not sure how much that can apply to unbound workqueues
> > as well.
>
> Well, unbound workqueues will not immediately end up on those CPUs,
> since they'll have an affinity exclusive of those CPUs per construction.

Ah that's right.

> But IIRC there's an affinity setting for workqueues where you could
> force it on if you wanted to.

Yep: /sys/devices/virtual/workqueue/cpumask
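
For anyone scripting around that file, a small sketch of how the mask could
be decoded and encoded. It assumes the usual kernel cpumask text format (hex
digits, comma-separated in 32-bit groups, lowest CPUs in the rightmost
group); parse_cpumask() and format_cpumask() are hypothetical helper names,
not an existing API.

```python
from typing import Iterable, List

WQ_CPUMASK = "/sys/devices/virtual/workqueue/cpumask"


def parse_cpumask(text: str) -> List[int]:
    """Turn a hex mask such as "00000000,0000000f" into a list of CPU numbers."""
    mask = int(text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def format_cpumask(cpus: Iterable[int]) -> str:
    """Build a hex mask string, comma-separated in 32-bit groups."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    digits = format(mask, "x")
    # Pad to a whole number of 8-digit (32-bit) groups.
    digits = digits.zfill(-(-len(digits) // 8) * 8)
    return ",".join(digits[i:i + 8] for i in range(0, len(digits), 8))
```

Reading the current setting would then be parse_cpumask(open(WQ_CPUMASK).read()),
and the string from format_cpumask() is what one would write back (as root)
to change it.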

>
> > But the thing with NULL domains is: things can not migrate in and neither
> > can they migrate out, which is not exactly what CPU isolation wants.
>
> No, it's exactly what they want. You get what you put in and nothing
> more. If you want something else, use cpusets.

That's still a subtle behaviour that involves knowledge of some scheduler
core details. I wish we hadn't exposed such a low level scheduler control
as a general purpose kernel parameter.

Anyway, at least that confirms one worry we had: kernel parameters are kernel
ABI that we can't break.

>
> > > Now, I completely hate the isolcpus feature and wish it a speedy death,
> > > but replacing it with something sensible is difficult because cgroups
> > > :-(
> >
> > Ah, that would break cgroup somehow?
>
> Well, ideally something like this would start the system with all the
> 'crap' threads in !root cgroup. But that means cgroupfs needs to be
> populated with at least two directories on boot. And current cgroup
> cruft doesn't expect that.

Ah I see.

Thanks!