Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs

From: Srikar Dronamraju
Date: Wed Oct 24 2018 - 05:47:01 EST

* Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> [2018-10-24 09:56:36]:

> On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> It would certainly be a bit odd because the
> application is asking for some protection but no guarantees are given
> and the application is not made aware via an error code that there is a
> problem. Asking the application to parse dmesg hoping to find the right
> error message is going to be fragile.

It's actually a good question: what should we do when the mask given
contains a mix of isolcpus and housekeeping (aka non-isolcpus) CPUs?

Right now, as you pointed out, there is no easy way for the application to
know which CPUs are the non-isolcpus so that it can set its affinity to
them. Both the cpuset's effective_cpus and cpus_allowed will contain the
isolcpus too.

> Would it be more appropriate to fail sched_setaffinity when there is a
> mix of isolated and housekeeping CPUs? In that case, an info message in
> dmesg may be appropriate as it'll likely be a once-off configuration
> error that's obvious due to an application failure.

I was looking at the option of returning an error in that case. However,
our own perf bench numa can't handle it, and we can be sure there will be
other tools coded the same way that would start failing, so I am not sure
that's a good option. Previously they would at least *seem* to work.

That's why I have taken the option of correcting it within the kernel.

> Alternatively,
> should NUMA balancing ignore isolated CPUs? The latter seems unusual as
> the application has specified a mask that allows those CPUs and it's not
> clear why NUMA balancing should ignore them. If anything, an application
> that wants to avoid all interference should also be using memory policies
> to bind to nodes so it behaves predictably with respect to access latencies
> (presumably if an application cannot tolerate kernel threads interfering
> then it also cannot tolerate remote access latencies) or disabling NUMA
> balancing entirely to avoid incurring minor faults.

NUMA balancing is doing the right thing here: if the application has
specified the mask, then we should allow the application to run on those CPUs.

The problem happens when the unbound application starts to run on an
isolcpu: regular load balancing will no longer work for it. If another task
is bound to that isolcpu, and other CPUs in the node are free, this task
will still contend with the task bound to the isolcpu (unless the NUMA
affinity of the unbound thread changes).

Also, isolcpus are supposed to run only those tasks that are bound to them,
so we are kind of breaking that rule.

Thanks and Regards
Srikar Dronamraju