On (30/08/06 13:57), Keith Mannthey didst pronounce:
> On 8/21/06, Mel Gorman <mel@xxxxxxxxx> wrote:
> >
ok, great. How much physical memory is installed on the machine? I want toUsable entries in the e820 contian memory. I have about 20-24gb
determine if the "usable" entries in the e820 map contain physical memory
or not.
When the SRAT is bad, the information is discarded and discovered by an
alternative method later in the boot process.
In this case, numa_initmem_init() is called after acpi_numa_init(). It
calls acpi_scan_nodes() which returns -1 because the SRAT is bad. Once
that happens, either k8_scan_nodes() will be called and the regions
discovered there or if that is not possible, it'll fall through and
e820_register_active_regions will be called without any node awareness.
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff
> >linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c
> >linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
> >--- linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c
> >2006-08-21 09:23:50.000000000 +0100
> >+++ linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
> >2006-08-21 10:15:58.000000000 +0100
> >@@ -84,6 +84,7 @@ static __init void bad_srat(void)
> > apicid_to_node[i] = NUMA_NO_NODE;
> > for (i = 0; i < MAX_NUMNODES; i++)
> > nodes_add[i].start = nodes[i].end = 0;
> >+ remove_all_active_ranges();
> > }
>
> We go back to setup_arch with no active areas?
>
Yes, and it'll be discovered using an alternative method later. There is
no point returning to setup_arch with known bad information about active
areas.
> > static __init inline int srat_disabled(void)
> >@@ -166,7 +167,7 @@ static int hotadd_enough_memory(struct b
> >
> > if (mem < 0)
> > return 0;
> >- allowed = (end_pfn - e820_hole_size(0, end_pfn)) * PAGE_SIZE;
> >+ allowed = (end_pfn - absent_pages_in_range(0, end_pfn)) *
> >PAGE_SIZE;
> > allowed = (allowed / 100) * hotadd_percent;
> > if (allocated + mem > allowed) {
> > unsigned long range;
> >@@ -238,7 +239,7 @@ static int reserve_hotadd(int node, unsi
> > }
> >
> > /* This check might be a bit too strict, but I'm keeping it for
> > now. */
> >- if (e820_hole_size(s_pfn, e_pfn) != e_pfn - s_pfn) {
> >+ if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
> > printk(KERN_ERR "SRAT: Hotplug area has existing
> > memory\n");
> > return -1;
> > }
> We really do want to to compare against the e820 map at it contains
> the memory that is really present (this info was blown away before
> acpi_numa)
The information used by absent_pages_in_range() should match what was
available to e820_hole_size().
> Anyway I fixed up to have the current chunk added
> (e820_register_active_regions) after calling this code so it logicaly
> makes sense but it still trip over the check.
> I am not sure what you
> are printing out in you debug code but dosen't look like pfns or
> phys_addresses but maybe it can tell us why the check fails.
>
My debug code for add_active_range() printing out pfns but I spotted one
case where absent_pages_in_range(I) does not do what one would expect.
Lets say the ranges with physical memory was 0->1000 and 2000-3000 (in
pfns). absent_pages_in_range(0, 3000) would return 1000 as you'd expect but
absent_pages_in_range(5000-6000) would return 0! I have a patch that might
fix this at the end of the mail but I'm not sure it's the problem you are
hitting. In the bootlog, I see;
SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 0 PXM 0 0-1070000000
SRAT: Hotplug area has existing memory
The last part (0-1070000000) is checked as a hotplug area but it's clear
that memory exists in that range. As reserve_hotadd() requires that the
whole range be a hole, I'm having trouble seeing how it ever successfully
reserved unless the ranges going into reserve_hotadd() are something other
than the pfn range for 0-1070000000). The patch later will print out the
range used by reserve_hotadd() so we can see.
> >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
> >
> > printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
> > nd->start, nd->end);
> >+ e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
> >+ nd->end >> PAGE_SHIFT);
>
> A node chunk in this section of code may be a hot-pluggable zone. With
> MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
>
The ranges should not get registered as active memory by
e820_register_active_regions() unless they are marked E820_RAM. My
understanding is that the regions for hotadd would be marked "reserved"
in the e820 map. Is that wrong?
> > if (ma->flags.hot_pluggable && !reserve_hotadd(node, start, end) <
> > 0) {
> > /* Ignore hotadd region. Undo damage */
>
> I have but the e820_register_active_regions as a else to this
> statment the absent pages check fails.
>
The patch below omits this change because I think
e820_register_active_regions() will still have got called by the time
you encounter a hotplug area.
> Also nodes_cover_memory and alot of these check were based against
> comparing the srat data against the e820. Now all this code is
> comparing SRAT against SRAT....
>
I don't see why. The SRAT table passes a range to
e820_register_active_regions() so should be comparing SRAT to e820
> I am willing to help here but we should compare the SRAT against to
> e820. Table v. Table.
>
> What to you think should be done?
>
Can you read through this patch and see does it address the problem in any
way? If it doesn't, can you send a complete bootlog so I can see what is
being sent to reserve_hotadd()? Thanks