Re: [BUG] v6.3-rc2 regresses sched_getaffinity() for arm64

From: Linus Torvalds
Date: Tue Mar 14 2023 - 20:52:21 EST


On Tue, Mar 14, 2023 at 3:41 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> Apologies if this is the wrong channel for reporting this - I couldn't find a
> suitable mail on the list for this patch to reply to. Happy to direct it
> somewhere else if appropriate.

No, this is good.

> nproc is calling sched_getaffinity() with a 1024 entry cpu_set mask, then adds
> up all the set bits to find the number of CPUs. I wrote a test program and can
> see that the first 8 bits are always correctly set and most of the other bits
> are always correctly 0. But bits ~64-224 are randomly set/clear from call to call.

Ahh.

Yes, I see what's happening. The code does

unsigned int retlen = min(len, cpumask_size());

and our cpu mask allocation size is set to 4 words - but since your
'nr_cpu_ids' is just 8, only the first word has actually been filled
with valid data.

That "cpumask_size()" thing is meant to be how big the allocation size
is, but clearly there is at least one user that has then taken it to
mean how much data it contains.

Interestingly, that same code already actually checks the length
against the right thing (nr_cpu_ids) elsewhere, but does it kind of
stupidly - first testing that the result fits in the bytes, then
checks that the thing is long-word aligned.

The immediate fix for your issue is likely the attached patch, but I'm
not particularly happy with it. I'd need to at the very least also fix
the same issue in the compat code, but there might be other cases of
this too, where people use the "allocation size" as the "valid bits
size".

Let me think about it some more, but in the meantime you can test if
this patch does indeed fix things for you.

Linus
kernel/sched/core.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index af017e038b48..fdbe7f3b55f0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8413,18 +8413,17 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
return -EINVAL;
if (len & (sizeof(unsigned long)-1))
return -EINVAL;
+ len = BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long);

if (!alloc_cpumask_var(&mask, GFP_KERNEL))
return -ENOMEM;

ret = sched_getaffinity(pid, mask);
if (ret == 0) {
- unsigned int retlen = min(len, cpumask_size());
-
- if (copy_to_user(user_mask_ptr, mask, retlen))
+ if (copy_to_user(user_mask_ptr, mask, len))
ret = -EFAULT;
else
- ret = retlen;
+ ret = len;
}
free_cpumask_var(mask);