Re: [PATCH 2/2] x86,mm,64bit: Round up memory boundary for init_memory_mapping_high()

From: Tejun Heo
Date: Fri Feb 25 2011 - 06:16:17 EST


On Thu, Feb 24, 2011 at 10:20:35PM -0800, Yinghai Lu wrote:
> tj pointed out:
> When a node boundary is not 1G aligned (e.g. off by 128M),
> init_memory_mapping_high() can end up creating a 128M mapping on one node
> and an 896M mapping on the next node using 2M pages instead of a single
> 1G page. That could increase TLB pressure.
>
> So when GB pages are in use, try to align the boundary to 1G before
> calling init_memory_mapping_ext(), to make sure only one 1G entry is used
> for the 1G range that crosses the node boundary.
> init_memory_mapping_ext() also needs to take tbl_end, to make sure the
> page table is placed on the previous node instead of the next one.
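
To make the quoted boundary problem concrete: with a node split at
4G+128M, the 1G page covering [4G, 5G) cannot be used as-is, so both
sides of the split fall back to 2M mappings. A minimal, userspace-style
sketch of the 1G rounding being proposed (the constant and helper names
here are mine, not the patch's):

	#include <stdint.h>

	#define GBPAGE_SIZE	(1ULL << 30)	/* one 1G PUD mapping */

	/*
	 * Round a node's mapping range out to 1G so that a boundary such
	 * as node0 = [0, 4G+128M) / node1 = [4G+128M, 8G) is covered by
	 * one 1G entry instead of 2M pages on both sides of the split.
	 */
	static inline uint64_t gb_round_down(uint64_t addr)
	{
		return addr & ~(GBPAGE_SIZE - 1);
	}

	static inline uint64_t gb_round_up(uint64_t addr)
	{
		return (addr + GBPAGE_SIZE - 1) & ~(GBPAGE_SIZE - 1);
	}

With gb_round_down(4G+128M) == 4G and gb_round_up(4G+128M) == 5G, the
mapping on the first node can extend through the full 1G entry.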

I don't know, Yinghai. The whole code seems overly complicated to me.
Just ignore the e820 map when building the linear mapping; it doesn't
matter. Why not just do something like the following? Also, can you
please add some comments explaining how the NUMA-affine allocation of
page tables actually works? Or better, can you please make it explicit?
It currently depends on memory blocks being registered in ascending
address order, right? The memblock code is already NUMA aware, so I
think it would be far better to make the node-affine part explicit.

Thanks.

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46e684f..4fd0b59 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -966,6 +966,11 @@ void __init setup_arch(char **cmdline_p)
memblock.current_limit = get_max_mapped();

/*
+ * Add a whole lot of comments explaining what's going on and WHY,
+ * because as it currently stands, it's frigging cryptic.
+ */
+
+ /*
* NOTE: On x86-32, only from this point on, fixmaps are ready for use.
*/

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 7757d22..50ec03c 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -536,8 +536,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
if (!numa_meminfo_cover_memory(mi))
return -EINVAL;

- init_memory_mapping_high();
-
/* Finally register nodes. */
for_each_node_mask(nid, node_possible_map) {
u64 start = (u64)max_pfn << PAGE_SHIFT;
@@ -550,8 +548,12 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
end = max(mi->blk[i].end, end);
}

- if (start < end)
+ if (start < end) {
+ init_memory_mapping(
+ ALIGN_DOWN_TO_MAX_MAP_SIZE_AND_CONVERT_TO_PFN(start),
+ ALIGN_UP_SIMILARLY_BUT_DONT_GO_OVER_MAX_PFN(end));
setup_node_bootmem(nid, start, end);
+ }
}

return 0;
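
In case it helps, the two placeholder macros above could expand to
something like this rough sketch (assumptions on my part: 1G mapping
granularity via gbpages, the kernel's ALIGN()/min() helpers, and an
init_memory_mapping() variant that takes PFNs, as the macro names
suggest):

	#define MAX_MAP_SIZE	(1UL << 30)	/* assume 1G (gbpage) mappings */

	/* Round down to the mapping granularity, then convert to a PFN. */
	#define ALIGN_DOWN_TO_MAX_MAP_SIZE_AND_CONVERT_TO_PFN(addr) \
		(((addr) & ~(MAX_MAP_SIZE - 1)) >> PAGE_SHIFT)

	/* Round up the same way, but clamp to max_pfn so the mapping
	 * never extends past the end of RAM. */
	#define ALIGN_UP_SIMILARLY_BUT_DONT_GO_OVER_MAX_PFN(addr) \
		min((u64)(ALIGN((addr), MAX_MAP_SIZE) >> PAGE_SHIFT), \
		    (u64)max_pfn)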


--
tejun