Re: NUMA API for Linux

From: Andi Kleen
Date: Thu Apr 15 2004 - 05:43:43 EST


On Wed, 14 Apr 2004 17:38:37 -0700
Matthew Dobson <colpatch@xxxxxxxxxx> wrote:



> 1) Redefine the value of some of the MPOL_* flags

I don't want to merge the flags and the mode argument. It's ugly.

> 2) Rename check_* to mpol_check_*

I really don't understand why you insist on renaming all my functions.
I like the current naming, thank you.

> 3) Remove get_nodes(). This should be done in the same manner as
> sys_sched_setaffinity(). We shouldn't care about unused high bits.

I disagree on that. This would break programs that are first tested
on a small machine and then later run on a big machine (a common case).
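
One reading of the concern, as a userspace sketch rather than the actual
get_nodes() code: if bits above the supported node count are silently
ignored, a program that sets a bogus high bit passes on a small machine
and only changes behaviour on a big one. Rejecting such masks makes the
bug visible early. The helper name and the SKETCH_ macro below are
hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption for this sketch: masks are arrays of unsigned long. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: return 1 if any bit with index >= supported
 * is set in a mask that is nbits bits long, 0 otherwise.
 */
static int mask_has_bits_above(const unsigned long *mask,
                               unsigned long nbits,
                               unsigned long supported)
{
    unsigned long i;

    for (i = supported; i < nbits; i++) {
        if (mask[i / SKETCH_BITS_PER_LONG] &
            (1UL << (i % SKETCH_BITS_PER_LONG)))
            return 1;
    }
    return 0;
}
```

A caller that wants the strict behaviour would fail with an error when
this returns 1, instead of masking the extra bits off.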

> 4) Create mpol_check_flags() to, well, check the flags. As the number
> of flags and modes grows, it will be easier to do this check in its own
> function.
> 5) In the syscalls (sys_mbind() & sys_set_mempolicy()), change 'len' to
> a size_t, add __user to the declaration of 'nmask', change 'maxnode' to

unsigned long is the standard for system calls. Check some others.

> 'nmask_len', and condense 'flags' and 'mode' into 'flags'. The
> motivation here is to make this syscall similar to
> sys_sched_setaffinity(). These calls are basically the memory
> equivalent of set/getaffinity, and should look & behave that way. Also,
> dropping an argument leaves an opening for a pid argument, which I
> believe would be good. We should allow processes (with appropriate
> permissions, of course) to mbind other processes.

Messing with another process's VM is a recipe for disaster. There used
to be tons of exploitable races in /proc/pid/mem, and I don't want to
repeat that. Adding a pid to set_mempolicy would be a bit easier, but it
would require adding a lock to the task struct. Currently it is nice and
lockless because it relies on the fact that only the current process can
change its own policy. I prefer to keep it lockless, because that keeps
the memory allocation fast paths fast.

> 6) Change how end is calculated as follows:
> end = PAGE_ALIGN(start+len);
> start &= PAGE_MASK;
> Basically, this allows users to pass in a non-page aligned 'start', and
> makes sure we mbind all pages from the page containing 'start' to the
> page containing 'start'+'len'.

mprotect() returns EINVAL when 'start' is not page aligned. I think it's
better to follow mprotect here.
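
To make the two behaviours concrete, here is a small userspace sketch.
It is not the kernel code: the 4096-byte page size, the SKETCH_ macros
and both function names are assumptions for illustration, and the
wraparound check returning -EINVAL is a choice of the sketch.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096UL                 /* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PAGE_ALIGN(x) \
    (((x) + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK)

/* mprotect-style: reject an unaligned start, round the length up. */
static int range_strict(unsigned long start, unsigned long len,
                        unsigned long *end)
{
    if (start & ~SKETCH_PAGE_MASK)
        return -EINVAL;
    len = SKETCH_PAGE_ALIGN(len);
    if (start + len < start)        /* address range wrapped around */
        return -EINVAL;
    *end = start + len;
    return 0;
}

/* Proposal 6 from the quoted mail: round start down and end up. */
static void range_rounding(unsigned long *start, unsigned long len,
                           unsigned long *end)
{
    *end = SKETCH_PAGE_ALIGN(*start + len);
    *start &= SKETCH_PAGE_MASK;
}
```

With start = 4097 and len = 100, the strict variant fails with -EINVAL,
while the rounding variant silently covers the page from 4096 to 8192.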

-Andi