sched_setaffinity usability

From: Ulrich Drepper
Date: Thu Mar 18 2004 - 03:07:21 EST


The sched_setaffinity syscall currently has a usability problem. The
size of cpumask_t is not visible outside the kernel and might change
from kernel to kernel. So, if the user passes a large CPU bitset to
the kernel, there is no way to know whether all the bits provided in
the bitmap are used. The kernel simply copies as many leading bytes as
are needed to fill the cpumask_t object and ignores the rest.
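
To make the failure mode concrete, here is a small user-space simulation,
not kernel code. It assumes a hypothetical 8-byte (64-CPU) cpumask_t and,
for simplicity, treats the bitmap as an array of bytes rather than the
unsigned long words the real interface uses. The names
`sim_kernel_copy_mask` and `sim_cpu_isset` are illustrative, and rejecting
a too-short buffer is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size of the kernel's cpumask_t: 8 bytes (64 CPUs).  The real
   size depends on the kernel configuration and is not visible to user space. */
#define SIM_KERNEL_MASK_BYTES 8

/* Simulates the kernel side of setaffinity: copy only the leading
   SIM_KERNEL_MASK_BYTES bytes of the user's bitmap and ignore the rest.
   In this sketch a buffer shorter than the kernel mask is rejected (-1). */
static int sim_kernel_copy_mask(unsigned char kernel_mask[SIM_KERNEL_MASK_BYTES],
                                const unsigned char *user_mask, size_t user_len)
{
    if (user_len < SIM_KERNEL_MASK_BYTES)
        return -1;
    memcpy(kernel_mask, user_mask, SIM_KERNEL_MASK_BYTES);
    return 0;   /* bytes past SIM_KERNEL_MASK_BYTES are never examined */
}

/* Returns 1 if CPU `cpu` is set in a byte-addressed bitmap of `len` bytes. */
static int sim_cpu_isset(const unsigned char *mask, size_t len, unsigned cpu)
{
    if (cpu / 8 >= len)
        return 0;                        /* outside the bitmap: not set */
    return (mask[cpu / 8] >> (cpu % 8)) & 1;
}
```

With CPUs 3 and 100 set in a 16-byte user bitmap, the simulated copy
succeeds and CPU 3 survives, while CPU 100 (byte 12) is dropped without
any error being reported.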

A simple check for a bitset that is too large is not good: programs
which are meant to be portable (to different kernels) and future-safe
should use large bitmap sizes. Instead, the user should be notified
about the size problem only if a nonzero bit is ignored.

Doing this check in the kernel isn't good either. It would require
copying the entire bitmap into the kernel address space. So do it at
user level.

But how? The user-level code does not know the size of the type the
kernel uses. In the getaffinity call this is handled nicely: the
syscall returns the size of the type.
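
On Linux the raw getaffinity syscall behaves as described: it returns the
number of bytes it copied, which is the kernel's mask size when the
caller's buffer is larger. A minimal sketch, assuming a Linux system and
glibc's syscall(2); it deliberately bypasses the glibc sched_getaffinity
wrapper, which returns 0 on success instead. The 1024-byte buffer and the
name `kernel_cpumask_bytes` are arbitrary choices for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Buffer deliberately larger than any plausible kernel cpumask
   (1024 bytes = 8192 CPUs); our choice, not a kernel constant. */
#define QUERY_BYTES 1024

/* Returns the number of bytes the kernel copied into the buffer, i.e. the
   kernel's mask size, or -1 with errno set on failure. */
static long kernel_cpumask_bytes(void)
{
    unsigned long buf[QUERY_BYTES / sizeof(unsigned long)];

    /* Raw syscall: pid 0 means the calling thread. */
    return syscall(SYS_sched_getaffinity, (pid_t)0, sizeof buf, buf);
}
```

The positive result tells the C library how many leading bytes of a
bitmap the kernel actually uses, which is exactly the information the
setaffinity side lacks.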

I think we should do the same for setaffinity. Something like this:

--- kernel/sched.c 2004-03-16 20:57:25.000000000 -0800
+++ kernel/sched.c-new 2004-03-17 23:52:25.000000000 -0800
@@ -2328,6 +2328,8 @@ asmlinkage long sys_sched_setaffinity(pi
 		goto out_unlock;
 
 	retval = set_cpus_allowed(p, new_mask);
+	if (retval == 0)
+		retval = sizeof(new_mask);
 
 out_unlock:
 	put_task_struct(p);


The user-level code could then check whether the words of the bitset
beyond the returned size contain any set bits.
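
A minimal sketch of that check, assuming the syscall has returned the
kernel's mask size in bytes as proposed. It scans byte-wise rather than
word-wise, which gives the same answer; `affinity_bits_ignored` is a
hypothetical helper name, not an existing library function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if any bit beyond the first
   kernel_bytes bytes of the caller's bitmap is set, i.e. a bit the kernel
   silently ignored.  kernel_bytes is the size the proposed syscall returns. */
static bool affinity_bits_ignored(const void *user_mask, size_t user_bytes,
                                  size_t kernel_bytes)
{
    const unsigned char *p = user_mask;

    /* If kernel_bytes >= user_bytes the loop never runs: nothing was ignored. */
    for (size_t i = kernel_bytes; i < user_bytes; ++i)
        if (p[i] != 0)
            return true;
    return false;
}
```

Trailing zero bytes are accepted, so a program can pass a generously
sized bitmap and only gets an error when it actually asked for a CPU the
kernel cannot represent.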

The interface change is limited to the kernel. We can arrange for
the sched_setaffinity/pthread_setaffinity calls to still return zero
on success (in fact, that's the desirable solution). Additionally, to
handle old kernels, we could hardcode a size to use when the syscall
returns zero.
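
A minimal user-space sketch of how a library wrapper might consume the
proposed return value. The raw syscall is injected through a hypothetical
function-pointer type, `raw_setaffinity_fn`, so the sketch runs without a
patched kernel; `fake_new_kernel`, `fake_old_kernel`, and the 8-byte
fallback size are illustrative assumptions, not glibc code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical size assumed for old kernels whose syscall returns 0:
   an 8-byte (64-CPU) cpumask_t.  Illustrative value only. */
#define FALLBACK_KERNEL_MASK_BYTES 8

/* Hypothetical raw-syscall signature under the proposed ABI: returns the
   kernel's cpumask_t size on success (0 on old kernels), -1 with errno set
   on failure. */
typedef long (*raw_setaffinity_fn)(pid_t pid, size_t len, const void *mask);

/* Library wrapper: returns 0 on success as the existing API does, or -1
   with errno = EINVAL if a set bit lies beyond the bytes the kernel used. */
static int setaffinity_checked(raw_setaffinity_fn raw, pid_t pid,
                               size_t len, const void *mask)
{
    long ret = raw(pid, len, mask);
    if (ret < 0)
        return -1;                      /* errno already set by raw() */

    /* Positive: size reported by a patched kernel.  Zero: old kernel. */
    size_t kernel_bytes = ret > 0 ? (size_t)ret : FALLBACK_KERNEL_MASK_BYTES;

    const unsigned char *p = mask;
    for (size_t i = kernel_bytes; i < len; ++i) {
        if (p[i] != 0) {
            errno = EINVAL;             /* a nonzero bit was ignored */
            return -1;
        }
    }
    return 0;
}

/* Simulated patched kernel with a 16-byte cpumask_t. */
static long fake_new_kernel(pid_t pid, size_t len, const void *mask)
{
    (void)pid;
    (void)mask;
    if (len < 16) {
        errno = EINVAL;
        return -1;
    }
    return 16;
}

/* Simulated old kernel: returns 0 on success. */
static long fake_old_kernel(pid_t pid, size_t len, const void *mask)
{
    (void)pid;
    (void)len;
    (void)mask;
    return 0;
}
```

In this sketch the check runs after the syscall, so the mask has already
been applied when EINVAL is reported; the error only tells the caller that
part of its request was dropped. A bit in byte 10 passes against the
simulated 16-byte kernel but fails against the old kernel, where the
8-byte fallback applies.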


Is this acceptable?

--
Ulrich Drepper, Red Hat, Inc., 444 Castro St, Mountain View, CA