Re: [PATCH v2] sched: Store restrict_cpus_allowed_ptr() call state

From: Waiman Long
Date: Thu Jan 26 2023 - 15:50:08 EST


On 1/26/23 11:11, Will Deacon wrote:
On Tue, Jan 24, 2023 at 03:24:36PM -0500, Waiman Long wrote:
On 1/24/23 14:48, Will Deacon wrote:
On Fri, Jan 20, 2023 at 09:17:49PM -0500, Waiman Long wrote:
The user_cpus_ptr field was originally added by commit b90ca8badbd1
("sched: Introduce task_struct::user_cpus_ptr to track requested
affinity"). It was used only by arm64 arch due to possible asymmetric
CPU setup.

Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), task_struct::user_cpus_ptr is repurposed to store the user
requested CPU affinity specified in sched_setaffinity().

This results in a performance regression in an arm64 system when booted
with "allow_mismatched_32bit_el0" on the command-line. The arch code will
(amongst other things) call force_compatible_cpus_allowed_ptr() and
relax_compatible_cpus_allowed_ptr() when exec()'ing a 32-bit or a 64-bit
task respectively. Now a call to relax_compatible_cpus_allowed_ptr()
will always result in a __sched_setaffinity() call whether there is a
previous force_compatible_cpus_allowed_ptr() call or not.
I'd argue it's more than just a performance regression -- the affinity
masks are set incorrectly, which is a user visible thing
(i.e. sched_getaffinity() gives unexpected values).
Can you elaborate a bit more on what you mean by getting unexpected
sched_getaffinity() results? You mean the result is wrong after a
relax_compatible_cpus_allowed_ptr() call, right?
Yes, as in the original report. If, on a 4-CPU system, I do the following
with v6.1 and "allow_mismatched_32bit_el0" on the kernel cmdline:

# for c in `seq 1 3`; do echo 0 > /sys/devices/system/cpu/cpu$c/online; done
# yes > /dev/null &
[1] 334
# taskset -p 334
pid 334's current affinity mask: 1
# for c in `seq 1 3`; do echo 1 > /sys/devices/system/cpu/cpu$c/online; done
# taskset -p 334
pid 334's current affinity mask: f

but with v6.2-rc5 that last taskset invocation gives:

pid 334's current affinity mask: 1

so, yes, the performance definitely regresses, but that's because the
affinity mask is wrong!

I see what you mean now. Hotplug no longer works correctly because user_cpus_ptr has been repurposed to store the mask set by sched_setaffinity() rather than the cpus_mask that was in effect before force_compatible_cpus_allowed_ptr().

One possible solution is to modify the hotplug-related code to check cpus_allowed_restricted and, if it is set, check task_cpu_possible_mask() to see whether the CPU can be added back to the task's cpus_mask. I will take a further look at that later.
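
To make the idea concrete, here is a rough user-space model of that check. It is not kernel code and not a proposed patch: struct task_model, model_cpu_online() and the 64-bit bitmask representation are made up for illustration, and possible_mask merely stands in for whatever task_cpu_possible_mask() would return. How the user-requested mask in user_cpus_ptr should factor in is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-task state involved; bit N represents CPU N. */
struct task_model {
	uint64_t cpus_mask;           /* current effective affinity */
	uint64_t possible_mask;       /* stand-in for task_cpu_possible_mask() */
	bool cpus_allowed_restricted; /* a restriction was applied earlier */
};

/*
 * Model of the hotplug-side check: when `cpu` comes online, add it back to
 * the task's mask only if the task was previously restricted and the CPU is
 * one the task can run on. Returns true if the mask was changed.
 */
static bool model_cpu_online(struct task_model *t, unsigned int cpu)
{
	uint64_t bit;

	assert(t != NULL);
	if (cpu >= 64)
		return false;	/* outside the range this model can represent */

	bit = UINT64_C(1) << cpu;

	if (!t->cpus_allowed_restricted)
		return false;	/* unrestricted tasks are left alone here */
	if (!(t->possible_mask & bit))
		return false;	/* the task cannot run on this CPU */
	if (t->cpus_mask & bit)
		return false;	/* already present, nothing to do */

	t->cpus_mask |= bit;
	return true;
}
```

With cpus_mask = 0x1, possible_mask = 0x5 and the restricted state set, bringing CPU 2 online would extend the mask to 0x5, while bringing CPU 1 online would leave it untouched.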

Cheers,
Longman