Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes

From: Andrew Morton
Date: Sun May 21 2006 - 15:08:27 EST


Mel Gorman <mel@xxxxxxxxx> wrote:
>

> > Anyway, I just don't get how this code can work. We have an e820 map with
> > up to 128 entries (this machine has ten) and we're trying to scrunch that
> > all into the four-entry early_node_map[].
> >
>
> Missing E820MAX was a mistake. On x86_64, CONFIG_MAX_ACTIVE_REGIONS should
> have been used. I didn't expect x86_64 to have so many memory holes.

x86 uses 128 e820 slots too.

>
> > On my little x86 PC:
> >
> > BIOS-provided physical RAM map:
> > BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
> > BIOS-e820: 000000000009bc00 - 000000000009c000 (reserved)
> > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> > BIOS-e820: 0000000000100000 - 000000000ffc0000 (usable)
> > BIOS-e820: 000000000ffc0000 - 000000000fff8000 (ACPI data)
> > BIOS-e820: 000000000fff8000 - 0000000010000000 (ACPI NVS)
> > BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
> > BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> > BIOS-e820: 00000000ffb80000 - 00000000ffc00000 (reserved)
> > BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
> > 0MB HIGHMEM available.
> > 255MB LOWMEM available.
> > found SMP MP-table at 000ff780
> > Range (nid 0) 0 -> 65472, max 4
> > On node 0 totalpages: 65472
> > DMA zone: 4096 pages, LIFO batch:0
> > Normal zone: 61376 pages, LIFO batch:15
> >
> > So here, the architecture code only called add_active_range() the once, for
> > the entire memory map.
>
> Because in this case, the architecture reported that there was just one
> range of available pages with no holes.

So.. we're registering a single blob of pfns which includes the "reserved"
memory as well as the "ACPI data" and the "ACPI NVS" (with an apparent
off-by-one here).

How come the machine still works? I guess the architecture went and marked
those pfns reserved.

> > If so, perhaps the bug is that the x86_64 code isn't doing that. And that
> > x86 isn't doing it for some people either.
> >
>
> I'm hoping in this case that having MAX_ACTIVE_REGIONS match E820MAX will
> fix the issue on your machine.

I expect it will.

One does wonder whether it's worth all this fuss though. It's only a
24-byte structure and it's all thrown away in free_initmem(). One _could_
just go and do

#define MAX_ACTIVE_REGIONS 10000

and be happy.

> I'm still confused why Christian's failed
> to boot with the patch backed out though.

He didn't get any "Too many memory regions" messages, so it's something
different.

Maybe he hit my off-by-one on his "ACPI data"?

hm, I didn't mention this in the earlier email. On my x86 I have

BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
BIOS-e820: 000000000009bc00 - 000000000009c000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000ffc0000 (usable)
BIOS-e820: 000000000ffc0000 - 000000000fff8000 (ACPI data)
BIOS-e820: 000000000fff8000 - 0000000010000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffb80000 - 00000000ffc00000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)

I added some debug and saw that add_active_range() was getting a
start_pfn=0 and an end_pfn which corresponds with 0x0fffc000. So my "ACPI
NVS" is getting chopped off.

If Christian is seeing a similar thing then his "ACPI data" will be getting
only part-registered.

I'd suggest that the next rev be liberal in its printking. This is the
debug patch I used:

mm/page_alloc.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~a mm/page_alloc.c
--- devel/mm/page_alloc.c~a 2006-05-20 13:19:58.000000000 -0700
+++ devel-akpm/mm/page_alloc.c 2006-05-20 13:20:42.000000000 -0700
@@ -2463,22 +2463,36 @@ void __init add_active_range(unsigned in
unsigned long end_pfn)
{
unsigned int i;
- printk(KERN_DEBUG "Range (%d) %lu -> %lu\n", nid, start_pfn, end_pfn);
+
+ printk("Range (nid %d) %lu -> %lu, max %d\n",
+ nid, start_pfn, end_pfn, MAX_ACTIVE_REGIONS - 1);

/* Merge with existing active regions if possible */
for (i = 0; early_node_map[i].end_pfn; i++) {
- if (early_node_map[i].nid != nid)
+ printk("i=%d early_node_map[i].nid=%d "
+ "early_node_map[i].start_pfn=%lu "
+ "early_node_map[i].end_pfn=%lu",
+ i, early_node_map[i].nid,
+ early_node_map[i].start_pfn,
+ early_node_map[i].end_pfn);
+
+ if (early_node_map[i].nid != nid) {
+ printk(" continue 1\n");
continue;
+ }

/* Skip if an existing region covers this new one */
if (start_pfn >= early_node_map[i].start_pfn &&
- end_pfn <= early_node_map[i].end_pfn)
+ end_pfn <= early_node_map[i].end_pfn) {
+ printk(" return 1\n");
return;
+ }

/* Merge forward if suitable */
if (start_pfn <= early_node_map[i].end_pfn &&
end_pfn > early_node_map[i].end_pfn) {
early_node_map[i].end_pfn = end_pfn;
+ printk(" return 2\n");
return;
}

@@ -2486,13 +2500,16 @@ void __init add_active_range(unsigned in
if (start_pfn < early_node_map[i].end_pfn &&
end_pfn >= early_node_map[i].start_pfn) {
early_node_map[i].start_pfn = start_pfn;
+ printk(" return 3\n");
return;
}
+ printk("\n");
}

/* Leave last entry NULL, we use range.end_pfn to terminate the walk */
if (i >= MAX_ACTIVE_REGIONS - 1) {
- printk(KERN_ERR "Too many memory regions, truncating\n");
+ printk(KERN_ERR "More than %d memory regions, truncating\n",
+ MAX_ACTIVE_REGIONS - 1);
return;
}

_
