Re: [PATCH 1/2] cgroup/cpuset: Keep current cpus list if cpus affinity was explicitly set

From: Valentin Schneider
Date: Fri Jul 29 2022 - 10:15:55 EST


On 28/07/22 11:39, Tejun Heo wrote:
> Hello, Waiman.
>
> On Thu, Jul 28, 2022 at 05:04:19PM -0400, Waiman Long wrote:
>> > So, the patch you proposed is making the code remember one special aspect of
>> > user requested configuration - whether it configured it or not, and trying
>> > to preserve that particular state as cpuset state changes. It addresses the
>> > immediate problem but it is a very partial approach. Let's say a task wanna
>> > be affined to one logical thread of each core and set its mask to 0x5555.
>> > Now, let's say cpuset got enabled and enforced 0xff and affined the task to
>> > 0xff. After a while, the cgroup got more cpus allocated and its cpuset now
>> > has 0xfff. Ideally, what should happen is the task now having the effective
>> > mask of 0x555. In practice, tho, it either would get 0xf55 or 0x55 depending
>> > on which way we decide to misbehave.
>>
>> OK, I see what you want to accomplish. To fully address this issue, we will
>> need to have a new cpumask variable in the the task structure which will be
>> allocated if sched_setaffinity() is ever called. I can rework my patch to
>> use this approach.
>
> Yeah, we'd need to track what user requested separately from the currently
> effective cpumask. Let's make sure that the scheduler folks are on board
> before committing to the idea tho. Peter, Ingo, what do you guys think?
>

FWIW on a runtime overhead side of things I think it'll be OK as that
should be just an extra mask copy in sched_setaffinity() and a subset
check / cpumask_and() in set_cpus_allowed_ptr(). The policy side is a bit
less clear (when, if ever, do we clear the user-defined mask? Will it keep
haunting us even after moving a task to a disjoint cpuset partition?).

There's also if/how that new mask should be exposed, because attaching a
task to a cpuset will now yield a not-necessarily-obvious affinity -
e.g. in the thread affinity example above, if the initial affinity setting
was done ages ago by some system tool, IMO the user needs a way to be able
to expect/understand the result of 0x555 rather than 0xfff.

While I'm saying this, I don't think anything exposes p->user_cpus_ptr, but
then again that one is for "special" hardware...

> Thanks.
>
> --
> tejun