Re: [PATCH Bug fix] acpi, movablemem_map: node0 should always be unhotpluggable when using SRAT.

From: Tang Chen
Date: Wed Jan 30 2013 - 05:32:55 EST


Hi David,

On 01/30/2013 05:45 PM, David Rientjes wrote:
On Wed, 30 Jan 2013, Tang Chen wrote:

The failure I'm trying to fix is that if all the memory is hotpluggable and the
user specifies movablemem_map, my code will set all the memory as ZONE_MOVABLE,
so the kernel will fail to allocate any memory and will fail to boot.


I'm curious, do you have a dmesg of the failure?

Historically I've seen this panic as late as build_sched_domains()
because of a bad mapping between pxms and apicids that assumes node 0 is
online and results in node_distance() being inaccurate. I'm not sure if
you're even getting that far in boot?

I'm sorry, I cannot provide any dmesg. I am using a remote machine, and if
it fails to boot very early, nothing is redirected to me.

So I think it didn't get that far.


Are you saying your memory is not on node0, and your physical address
0x0 is not on node0? And your /sys fs doesn't have a node0 interface, it is
node1 or something else?


Exactly, there is a node 0 but it includes no online memory (and that
should be the case as if it was solely hotpluggable memory) at the time of
boot. The sysfs interfaces only get added if the memory is onlined later.

OK, you mean you have only node1 at first and no node0 interface, right?
If so, then this patch is wrong. :)

But you mean physical address 0x0 is on your node1, right? Otherwise, how could
the kernel be loaded?

Could you provide the dmesg of your box like this:

[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x7ffffffff]
[ 0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff] Hot Pluggable
[ 0.000000] SRAT: Node 2 PXM 3 [mem 0x1800000000-0x1fffffffff] Hot Pluggable
[ 0.000000] SRAT: Node 3 PXM 4 [mem 0x2000000000-0x27ffffffff]
[ 0.000000] SRAT: Node 4 PXM 5 [mem 0x2800000000-0x2fffffffff]
[ 0.000000] SRAT: Node 5 PXM 6 [mem 0x3000000000-0x37ffffffff]
[ 0.000000] SRAT: Node 6 PXM 7 [mem 0x3800000000-0x3fffffffff]
[ 0.000000] SRAT: Node 7 PXM 1 [mem 0x800000000-0xfffffffff]


If so, I think I'd better find another way to fix this problem, because node0
may not be the first node on the system.
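
The two questions above (which node holds physical address 0x0, and whether
any memory range is not marked Hot Pluggable) can be checked from SRAT dmesg
lines with a small script. This is only a minimal sketch, assuming the lines
follow the format shown above; parse_srat, node_of, and all_hotpluggable are
hypothetical helper names, not kernel interfaces.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as:
#   SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff] Hot Pluggable
SRAT_RE = re.compile(
    r"SRAT: Node (\d+) PXM (\d+) "
    r"\[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]( Hot Pluggable)?"
)


class Range(NamedTuple):
    node: int
    pxm: int
    start: int
    end: int       # inclusive end address, as printed in dmesg
    hotplug: bool


def parse_srat(dmesg: str) -> List[Range]:
    """Collect every SRAT memory range found in a dmesg dump."""
    ranges = []
    for m in SRAT_RE.finditer(dmesg):
        node, pxm, start, end, hot = m.groups()
        ranges.append(
            Range(int(node), int(pxm), int(start, 16), int(end, 16),
                  hot is not None)
        )
    return ranges


def node_of(ranges: List[Range], addr: int) -> Optional[int]:
    """Return the node whose range contains addr, or None."""
    for r in ranges:
        if r.start <= addr <= r.end:
            return r.node
    return None


def all_hotpluggable(ranges: List[Range]) -> bool:
    """True if every range is hot-pluggable (the failing case described above)."""
    return bool(ranges) and all(r.hotplug for r in ranges)


# Sample taken from the first four SRAT lines quoted in this message.
SAMPLE = """\
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x7ffffffff]
[ 0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff] Hot Pluggable
[ 0.000000] SRAT: Node 2 PXM 3 [mem 0x1800000000-0x1fffffffff] Hot Pluggable
"""

ranges = parse_srat(SAMPLE)
print(node_of(ranges, 0x0))                           # 0
print(all_hotpluggable(ranges))                       # False
print(sorted({r.node for r in ranges if r.hotplug}))  # [1, 2]
```

If node_of(ranges, 0x0) returned something other than 0, that would show that
node0 is not the node holding the lowest physical memory on that box.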


I haven't tried it over the past year or so, but this used to work in the
past. I think if we had some more information we'd be able to see if we
really need to treat node 0 in a special way.

I'll try to do more investigation and find a better way to fix it. :)

Thanks. :)