Re: [rfc][patch] Memory Binding API v0.3 2.5.41

From: Matthew Dobson (colpatch@us.ibm.com)
Date: Mon Oct 14 2002 - 19:14:42 EST


Eric W. Biederman wrote:
> Matthew Dobson <colpatch@us.ibm.com> writes:
>>Greetings & Salutations,
>> Here's a wonderful patch that I know you're all dying for... Memory
>>Binding! It works just like CPU Affinity (binding) except that it binds a
>>processes memory allocations (just buddy allocator for now) to specific memory
>>blocks.
> Due we want this per numa area or simply per zone? My suspicion is that
> internally at least we want this per zone.
I think that per memory block is better. We already have a method for
allocating from specific zones (GFP_* flags). Also, using per zone
binding would involve setting up some way of enumerating the zones,
which would not be immediately obvious to the users of the API. The
memory block already has a straight-forward definition and an easy way
for users to get the appropriate number for the appropriate block
(in-kernel topology). I'm not fanatically opposed to per zone binding,
though, and if there is a general agreement that it would be better that
way, I don't think it would be unreasonably difficult to change it.

> The API doesn't make much sense at the moment.
Hmm.. That is unfortunate, I'd aimed to make it as simple as possible.

> 1) You are operating on tasks and not mm's, or preferably vmas.
Correct. There are plans (somewhere inside my cranium) to allow binding
at that granularity. For now, per task seemed an appropriate level.

> 2) sys_mem_setbinding does not move the mm to the new binding.
Also correct. A task may wish to allocate several large data structures
from one memory area, rebind, do more allocations, rebind, ad nauseum.
There are plans to have a flag that, if set, would force relocation of
all currently allocated memory.

> 3) You specify a pid and then change current task instead of
> the specified one.
Yep... That was definitely a typo... fixed.

> 4) An ordered zone list is probably the more natural mapping.
See my comments above about per zone/memblk. And you reemphasize my
point, how do we order the zone lists in such a way that a user of the
API can easily know/find out what zone #5 is?

> 5) mprotect is the more natural model rather than set_cpu_affinity.
Well, I think that may be true for the API you are imagining (per zone,
per mm/vma, etc), not the one that I've written.

> 6) The code belongs in mm/* not kernel/*
Possibly... I just stuck it in with the vast majority of other syscalls
in kernel/sys.c. As those changes are just code additions, they can
easily be moved if it is deemed appropriate.

Cheers!

-Matt

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 15 2002 - 22:00:53 EST