Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()

From: Joel Savitz
Date: Fri May 24 2019 - 11:37:14 EST


On Tue, May 21, 2019 at 10:35 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
> > $ grep Cpus /proc/$$/status
> > Cpus_allowed: ff
> > Cpus_allowed_list: 0-7
>
> (a)
>
> > $ taskset -p 4 $$
> > pid 19202's current affinity mask: f

> I'm confused where this value comes from, I must be missing something.
>
> Joel, is the task in question put into a cpuset with 0xf CPUs only (at
> point (a))? Or are the CPUs 4-7 offlined as well?

Good point.

It is a bit ambiguous, but I performed no action on the task's cpuset
nor did I offline any cpus at point (a).

After a bit of research, I am fairly certain that the observed
discrepancy is due to differing mechanisms used to acquire the cpuset
mask value.

The first mechanism, via `grep Cpus /proc/$$/status`, has its value
populated by the expression (task->cpus_allowed) in
fs/proc/array.c:task_cpus_allowed(), whereas the taskset utility
(https://github.com/karelzak/util-linux/blob/master/schedutils/taskset.c)
uses sched_getaffinity(2) to determine the "current affinity mask"
value from the expression (task->cpus_allowed & cpu_active_mask) in
kernel/sched/core.c:sched_getaffinity().
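
For anyone who wants to compare the two values side by side, here is a
minimal sketch (Python, Linux only). The helper names
parse_cpus_allowed() and cpus_to_mask() are made up for illustration;
only os.sched_getaffinity() and the /proc/self/status format are real
interfaces. It reads the mask from each mechanism and prints any bits
that appear in the /proc value but not in the syscall result.

```python
import os


def parse_cpus_allowed(status_text):
    """Return the Cpus_allowed hex mask from /proc/<pid>/status text as an int.

    Hypothetical helper, not part of any library.
    """
    for line in status_text.splitlines():
        # "Cpus_allowed_list:" does not match because of the trailing colon.
        if line.startswith("Cpus_allowed:"):
            # Wide masks are printed as comma-separated 32-bit hex groups.
            return int(line.split(":", 1)[1].strip().replace(",", ""), 16)
    raise ValueError("no Cpus_allowed line found")


def cpus_to_mask(cpus):
    """Turn an iterable of cpu numbers into a bitmask (hypothetical helper)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # Mechanism 1: the value exported through /proc/<pid>/status.
    with open("/proc/self/status") as f:
        proc_mask = parse_cpus_allowed(f.read())

    # Mechanism 2: the value returned by sched_getaffinity(2).
    syscall_mask = cpus_to_mask(os.sched_getaffinity(0))

    print(f"/proc/self/status Cpus_allowed: {proc_mask:x}")
    print(f"sched_getaffinity(2) mask:      {syscall_mask:x}")
    print(f"bits only in the /proc value:   {proc_mask & ~syscall_mask:x}")
```

If the last line prints a nonzero mask, the two mechanisms disagree for
the current task in the way described above.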

I do not know if there is an explicit reason for this discrepancy or
whether the two mechanisms were simply built independently, perhaps
for different purposes.

I think the /proc/$$/status value is intended simply to reflect the
user-specified policy stating which cpus the task is allowed to run on,
without consideration for hardware state, whereas the taskset value
represents the cpus that the task can actually run on once the
currently active cpus are taken into account, given the restriction
policy specified by the user via the cpuset mechanism.

By the way, I posted a v2 of this patch that correctly handles cgroup
v2 behavior.