Re: [PATCH] Cpuset: alloc_pages_node overrides cpuset constraints
From: Paul Jackson
Date: Mon Mar 20 2006 - 03:28:29 EST
> You're kidding. This adds more code to every page allocation on every
> machine, cpusets or not.
>
> I stuck this on top:
>
> ...
>
> But it's a bit ugly.
The code path we care about for this __GFP_NOCPUSET flag is for memory
migration. When it says it wants memory allocated on a certain node,
it really wants it there.
While I consider it a bug in the cpuset implementation that any kernel
alloc_pages_node(), kmalloc_node() or vmalloc_node() call has the
requested node ignored if it falls outside the current task's cpuset,
that's not a priority bug in my view. It's been that way for a year
(since cpusets went in), and no one has noticed.
Christoph - I suspect that the following patch, instead of the one
we sent, would meet with greater approval from Andrew.
The patch below just adds the __GFP_NOCPUSET flag on the memory
migration code path, where we need it. That code path is not
performance critical.
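
To make the order of the checks concrete, here is a minimal user-space
sketch of the decision the patched __cpuset_zone_allowed() makes in the
hunk below. It is a model, not the kernel code: the model_* names, the
64-bit stand-in for current->mems_allowed, and the -1 "fall through"
return value are all assumptions made for illustration. Only the two
flag values and the order of the three tests are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the gfp.h hunk; the MODEL_ prefix is ours. */
#define MODEL_GFP_HARDWALL 0x20000u
#define MODEL_GFP_NOCPUSET 0x40000u

/* Hypothetical stand-in for current->mems_allowed: one bit per node. */
typedef uint64_t model_nodemask_t;

static bool model_node_isset(int node, model_nodemask_t mask)
{
	return ((mask >> node) & 1u) != 0;
}

/*
 * Mirrors the order of the tests in the quoted hunk:
 *   1  the node may be used,
 *   0  the node is refused (hardwall request),
 *  -1  the real function continues with checks not shown in the hunk.
 */
static int model_zone_allowed(int node, model_nodemask_t mems_allowed,
			      unsigned int gfp_mask)
{
	/* This toy mask only models nodes 0..63. */
	assert(node >= 0 && node < 64);

	if (model_node_isset(node, mems_allowed))
		return 1;
	if (gfp_mask & MODEL_GFP_NOCPUSET)	/* new test added by the patch */
		return 1;
	if (gfp_mask & MODEL_GFP_HARDWALL)	/* hardwall request stops here */
		return 0;
	return -1;
}
```

In this model, a request for a node outside the mask succeeds whenever
MODEL_GFP_NOCPUSET is set, even if MODEL_GFP_HARDWALL is also set,
because the new test comes before the hardwall test.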
We can deal with the long-standing bug in cpusets, where it overrides
all alloc_pages_node() calls, some other day.
What think you of this, Christoph? Should we send it to Andrew?
include/linux/gfp.h | 1 +
kernel/cpuset.c | 2 ++
mm/migrate.c | 5 +++--
3 files changed, 6 insertions(+), 2 deletions(-)
--- 2.6.16-rc6-mm2.orig/include/linux/gfp.h 2006-03-18 13:07:51.000000000 -0800
+++ 2.6.16-rc6-mm2/include/linux/gfp.h 2006-03-19 23:55:19.000000000 -0800
@@ -47,6 +47,7 @@ struct vm_area_struct;
#define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
#define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
+#define __GFP_NOCPUSET ((__force gfp_t)0x40000u)/* Ignore cpuset constraints */
#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
--- 2.6.16-rc6-mm2.orig/kernel/cpuset.c 2006-03-18 13:15:04.000000000 -0800
+++ 2.6.16-rc6-mm2/kernel/cpuset.c 2006-03-19 23:52:42.000000000 -0800
@@ -2209,6 +2209,8 @@ int __cpuset_zone_allowed(struct zone *z
node = z->zone_pgdat->node_id;
if (node_isset(node, current->mems_allowed))
return 1;
+ if (gfp_mask & __GFP_NOCPUSET)
+ return 1;
if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */
return 0;
--- 2.6.16-rc6-mm2.orig/mm/migrate.c 2006-03-18 13:12:53.000000000 -0800
+++ 2.6.16-rc6-mm2/mm/migrate.c 2006-03-20 00:24:36.000000000 -0800
@@ -614,12 +614,13 @@ redo:
* a certain old page is moved to so we cannot
* specify the correct address.
*/
- page = alloc_page_vma(GFP_HIGHUSER, vma,
+ page = alloc_page_vma(GFP_HIGHUSER|__GFP_NOCPUSET, vma,
offset + vma->vm_start);
offset += PAGE_SIZE;
}
else
- page = alloc_pages_node(dest, GFP_HIGHUSER, 0);
+ page = alloc_pages_node(dest,
+ GFP_HIGHUSER|__GFP_NOCPUSET, 0);
if (!page) {
err = -ENOMEM;
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.925.600.0401