Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
From: Keith Mannthey
Date: Thu Aug 31 2006 - 23:06:14 EST
On 8/31/06, Mel Gorman <mel@xxxxxxxxx> wrote:
On Thu, 31 Aug 2006, Keith Mannthey wrote:
> On 8/31/06, Mel Gorman <mel@xxxxxxxxx> wrote:
>> On (30/08/06 13:57), Keith Mannthey didst pronounce:
>> > On 8/21/06, Mel Gorman <mel@xxxxxxxxx> wrote:
>> > >
Can you confirm that happens by applying the patch I sent to you and
checking the output? When the reserve fails, it should print out what
range it actually checked. I want to be sure it's not checking the
addresses 0->0x1070000000
See below
>> > >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
>> > >
>> > > printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
>> > > nd->start, nd->end);
>> > >+ e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
>> > >+ nd->end >> PAGE_SHIFT);
>> >
>> > A node chunk in this section of code may be a hot-pluggable zone. With
>> > MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
>> >
>>
>> The ranges should not get registered as active memory by
>> e820_register_active_regions() unless they are marked E820_RAM. My
>> understanding is that the regions for hotadd would be marked "reserved"
>> in the e820 map. Is that wrong?
>
> This is wrong. In a multi-node system, the last node's add area will not
> be marked reserved by the e820. The e820 only defines memory <
> end_pfn; the last node's add area is > end_pfn.
>
ok, that should still be fine. As long as the ranges are not marked
"usable", add_active_range() will not be called and the holes should be
counted correctly with the patch I sent you.
> With RESERVE-based add-memory you want the add areas reported by the
> SRAT to be set up during boot like all the other pages.
>
So, do you actually expect a lot of unused mem_map to be allocated, with
struct pages that are inactive until memory is hot-added in an
x86_64-specific manner? The arch-independent code currently will not do
that; it sets up memmap only where memory really exists. If that is not
what you expect, it will hit issues at hotadd time, which is not the
current issue but one that can be fixed.
Yes. RESERVE-based add is a big waste of mem_map space. The add areas
are marked as RESERVED during boot and then later onlined during add.
It might be ok. I will play with it tomorrow. I might just need to
call add_active_range in the right spot :)
>> > > if (ma->flags.hot_pluggable && !reserve_hotadd(node, start, end) < 0) {
>> > > /* Ignore hotadd region. Undo damage */
>> >
>> > I have put the e820_register_active_regions as an else to this
>> > statement; the absent pages check fails.
>> >
>>
>> The patch below omits this change because I think
>> e820_register_active_regions() will still have got called by the time
>> you encounter a hotplug area.
>
> called, but then removed in setup_arch.
By "removed", I assume you mean the active regions removed by the call
to remove_all_active_regions() in setup_arch(). Before reserve_hotadd() is
called, e820_register_active_regions() will have reregistered the active
regions with the NUMA node id.
I see that e820_register_active_regions is acting as a filter against the e820.
>> > Also nodes_cover_memory and a lot of these checks were based on
>> > comparing the SRAT data against the e820. Now all this code is
>> > comparing SRAT against SRAT....
>> >
>>
>> I don't see why. The SRAT table passes a range to
>> e820_register_active_regions() so should be comparing SRAT to e820
>
> let me go off and look at e820_register_active_regions() some more.
Things get clear :)
Should be ok.
> Sure thing. It is just the hot-add area; I am guessing it is an
> off-by-one error of some sort.
>
See below. I do my e820_register_active_regions as an else to the if
(hotplug.....!reserve) and the printk is easy to sort out.
I see your pfns are in base 10. Looks like it considers the last
address to be a present page (off-by-one thing).
Thanks,
Keith
Output below
disabling early console
Linux version 2.6.18-rc4-mm3-smp (root@elm3a153) (gcc version 4.1.0
(SUSE Linux)) #6 SMP Thu Aug 31 22:06:00 EDT 2006
Command line: root=/dev/sda3
ip=9.47.66.153:9.47.66.169:9.47.66.1:255.255.255.0 resume=/dev/sda2
showopts earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
debug numa=hotadd=100
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
end_pfn_map = 18219008
DMI 2.3 present.
ACPI: RSDP (v000 IBM ) @ 0x00000000000fdcf0
ACPI: RSDT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98800
ACPI: FADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98780
ACPI: MADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98600
ACPI: SRAT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff983c0
ACPI: HPET (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98380
ACPI: SSDT (v001 IBM VIGSSDT0 0x00001000 INTL 0x20030122) @
0x000000007ff90780
ACPI: SSDT (v001 IBM VIGSSDT1 0x00001000 INTL 0x20030122) @
0x000000007ff88bc0
ACPI: DSDT (v001 IBM EXA01ZEU 0x00001000 INTL 0x20030122) @
0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 0 PXM 0 0-1070000000
reserve_hotadd called with node 0 sart 470000000 end 1070000000
SRAT: Hotplug area has existing memory
Entering add_active_range(0, 0, 152) 3 entries of 3200 used
Entering add_active_range(0, 256, 524165) 3 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 3 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-1160000000
Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-3200000000
reserve_hotadd called with node 1 sart 1160000000 end 3200000000
SRAT: Hotplug area has existing memory
Entering add_active_range(1, 17235968, 18219008) 4 entries of 3200 used
NUMA: Using 28 for the hash shift.
Bootmem setup node 0 0000000000000000-0000001070000000
Bootmem setup node 1 0000001070000000-0000001160000000
Zone PFN ranges:
DMA 0 -> 4096
DMA32 4096 -> 1048576
Normal 1048576 -> 18219008
early_node_map[4] active PFN ranges
0: 0 -> 152
0: 256 -> 524165
0: 1048576 -> 4653056
1: 17235968 -> 18219008
On node 0 totalpages: 4128541
0 pages used for SPARSE memmap
1149 pages DMA reserved
DMA zone: 2843 pages, LIFO batch:0
0 pages used for SPARSE memmap
DMA32 zone: 520069 pages, LIFO batch:31
0 pages used for SPARSE memmap
Normal zone: 3604480 pages, LIFO batch:31
On node 1 totalpages: 983040
0 pages used for SPARSE memmap
0 pages used for SPARSE memmap
0 pages used for SPARSE memmap
Normal zone: 983040 pages, LIFO batch:31