Re: [patch] fix hugepage unusable issue on non-NUMA machine

From: Yinghai Lu
Date: Mon Jun 29 2009 - 13:02:18 EST


alex.shi wrote:
> Commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74 introduced a nodes_clear()
> call for NUMA machines, but it seems to have overlooked non-NUMA machines.
> If find_zone_movable_pfns_for_nodes()/early_calculate_totalpages() never
> gets a chance to run, nodes_clear() blocks HUGEPAGE use in my specjbb2005
> testing.
>
>
> So maybe we need to skip nodes_clear() in some cases. With the following
> patch, specjbb2005 recovered.
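
To make the reported failure mode concrete, here is a rough user-space
sketch, not kernel code. It assumes a consumer that only considers nodes
marked in a "has memory" mask; every toy_* name is made up for
illustration and does not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for nodemask_t: one bit per node (not the kernel type). */
typedef struct { uint64_t bits; } toy_nodemask_t;

enum { TOY_MAX_NODES = 64 };

static void toy_node_set(toy_nodemask_t *m, int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	m->bits |= (uint64_t)1 << nid;
}

static void toy_nodes_clear(toy_nodemask_t *m)
{
	m->bits = 0;
}

/*
 * Hypothetical consumer: returns the first node marked as having memory,
 * or -1 if the mask is empty.
 */
static int toy_first_memory_node(const toy_nodemask_t *m)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++)
		if (m->bits & ((uint64_t)1 << nid))
			return nid;
	return -1;
}

/*
 * Models the boot sequence: the mask starts with node 0 set, is cleared,
 * and is repopulated only if the later setup step actually runs.
 */
static int toy_boot(int repopulate)
{
	toy_nodemask_t has_memory = { 0 };

	toy_node_set(&has_memory, 0);		/* default setting: node 0 */
	toy_nodes_clear(&has_memory);		/* the unconditional clear */
	if (repopulate)
		toy_node_set(&has_memory, 0);	/* setup step ran */
	return toy_first_memory_node(&has_memory);
}
```

With these definitions, toy_boot(1) finds node 0, while toy_boot(0)
returns -1 because nothing set the bit again after the clear.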

Please check whether the following patch fixes your problem.

[PATCH] x86: only clear node_states for 64bit

Nathan reported that
| commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
| Author: Yinghai Lu <yinghai@xxxxxxxxxx>
| Date: Tue Jun 16 15:33:00 2009 -0700
|
| page-allocator: clear N_HIGH_MEMORY map before we set it again
|
| SRAT tables may contain nodes of very small size. The arch code may
| decide to not activate such a node. However, currently the early boot
| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be
| active although these nodes have no present pages.
|
| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too

broke the cpuset.mems cgroup attribute on an i386 kvm guest.

Fix it by clearing node_states[N_NORMAL_MEMORY] only on 64-bit, and by
saving and restoring node_states[N_HIGH_MEMORY] in
find_zone_movable_pfns_for_nodes().

Reported-by: Nathan Lynch <ntl@xxxxxxxxx>
Tested-by: Nathan Lynch <ntl@xxxxxxxxx>
Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>

---
arch/x86/mm/init_64.c | 2 ++
mm/page_alloc.c | 13 +++++++------
2 files changed, 9 insertions(+), 6 deletions(-)
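
The save/restore described in the changelog follows a common pattern:
copy the global mask, let a helper scribble on it, and restore it on
every exit path through a single label. Below is a minimal user-space
sketch of that pattern, assuming a plain bitmask in place of nodemask_t.
All toy_* names and the arithmetic are hypothetical and only illustrate
the control flow, not the real ZONE_MOVABLE calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for nodemask_t (not the kernel type). */
typedef struct { uint64_t bits; } toy_nodemask_t;

/* Global state standing in for node_states[N_HIGH_MEMORY]; node 0 set. */
static toy_nodemask_t toy_high_memory = { 0x1 };

/* Stand-in for a helper that overwrites the global mask as a side effect. */
static unsigned long toy_calculate_totalpages(void)
{
	toy_high_memory.bits = 0x3;	/* pretend nodes 0 and 1 have pages */
	return 1024;
}

/*
 * Save the mask, let the helper borrow it, and restore it before
 * returning. Routing the early exit through "out" keeps the restore
 * in one place.
 */
static unsigned long toy_find_movable(unsigned long required_kernelcore)
{
	toy_nodemask_t saved = toy_high_memory;
	unsigned long totalpages = toy_calculate_totalpages();
	unsigned long movable = 0;

	assert(totalpages > 0);	/* the toy helper always reports pages */

	/* Nothing requested: skip the calculation but still restore. */
	if (!required_kernelcore)
		goto out;

	if (required_kernelcore < totalpages)
		movable = totalpages - required_kernelcore;

out:
	toy_high_memory = saved;
	return movable;
}
```

Whichever path is taken, toy_high_memory holds its original value after
the call. A bare "return" in the early-exit branch would skip the
restore and leave the helper's scribbled value behind.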

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -598,6 +598,8 @@ void __init paging_init(void)

sparse_memory_present_with_active_regions(MAX_NUMNODES);
sparse_init();
+ /* clear the default setting with node 0 */
+ nodes_clear(node_states[N_NORMAL_MEMORY]);
free_area_init_nodes(max_zone_pfns);
}

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4037,6 +4037,8 @@ static void __init find_zone_movable_pfn
int i, nid;
unsigned long usable_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
+ /* save the state before borrowing the nodemask */
+ nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
unsigned long totalpages = early_calculate_totalpages();
int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);

@@ -4064,7 +4066,7 @@ static void __init find_zone_movable_pfn

/* If kernelcore was not specified, there is no ZONE_MOVABLE */
if (!required_kernelcore)
- return;
+ goto out;

/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
find_usable_zone_for_movable();
@@ -4163,6 +4165,10 @@ restart:
for (nid = 0; nid < MAX_NUMNODES; nid++)
zone_movable_pfn[nid] =
roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
+
+out:
+ /* restore the node_state */
+ node_states[N_HIGH_MEMORY] = saved_node_state;
}

/* Any regular memory on that node ? */
@@ -4247,11 +4253,6 @@ void __init free_area_init_nodes(unsigne
early_node_map[i].start_pfn,
early_node_map[i].end_pfn);

- /*
- * find_zone_movable_pfns_for_nodes/early_calculate_totalpages init
- * that node_mask, clear it at first
- */
- nodes_clear(node_states[N_HIGH_MEMORY]);
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();