[tip:x86/apic] x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation

From: tip-bot for Yinghai Lu
Date: Thu Dec 23 2010 - 18:22:29 EST


Commit-ID: d3bd058826aa8b79590cca6c8e6d1557bf576ada
Gitweb: http://git.kernel.org/tip/d3bd058826aa8b79590cca6c8e6d1557bf576ada
Author: Yinghai Lu <yinghai@xxxxxxxxxx>
AuthorDate: Thu, 16 Dec 2010 19:09:58 -0800
Committer: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx>
CommitDate: Thu, 23 Dec 2010 13:16:18 -0800

x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation

Recent Intel new system have different order in MADT, aka will list all thread0
at first, then all thread1.
But SRAT table still old order, it will list cpus in one socket all together.

If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
to put some cpus apic id to node mapping into apicid_to_node[].

for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...

[ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[ 9.235021] divide error: 0000 [#1] SMP
[ 9.235315] last sysfs file:
[ 9.235481] CPU 1
[ 9.235592] Modules linked in:
[ 9.245398]
[ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800
[ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
...
[ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[ 9.665356] RSP <ffff88103f8d1c40>
[ 9.665568] ---[ end trace 2296156d35fdfc87 ]---

So let just parse all cpu entries in SRAT.

Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
apicid_to_node[].

it fixes following bug too.
https://bugzilla.kernel.org/show_bug.cgi?id=22662

-v2: expand to 32bit according to hpa
need to add MAX_LOCAL_APIC for 32bit

Reported-and-Tested-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
Reported-by: Bjorn Helgaas <bjorn.helgaas@xxxxxx>
Tested-by: Myron Stowe <myron.stowe@xxxxxx>
Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
LKML-Reference: <4D0AD486.9020704@xxxxxxxxxx>
Signed-off-by: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx>
---
arch/x86/kernel/acpi/boot.c | 5 +++++
arch/x86/mm/srat_32.c | 1 +
arch/x86/mm/srat_64.c | 10 ++++++++++
drivers/acpi/numa.c | 14 ++++++++++++--
4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index c05872a..f19d667 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -198,6 +198,11 @@ static void __cpuinit acpi_register_lapic(int id, u8 enabled)
{
unsigned int ver = 0;

+ if (id >= (MAX_LOCAL_APIC-1)) {
+ printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
+ return;
+ }
+
if (!enabled) {
++disabled_cpus;
return;
diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
index a17dffd..f164345 100644
--- a/arch/x86/mm/srat_32.c
+++ b/arch/x86/mm/srat_32.c
@@ -92,6 +92,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *cpu_affinity)
/* mark this node as "seen" in node bitmap */
BMAP_SET(pxm_bitmap, cpu_affinity->proximity_domain_lo);

+ /* don't need to check apic_id here, because it is always 8 bits */
apicid_to_pxm[cpu_affinity->apic_id] = cpu_affinity->proximity_domain_lo;

printk(KERN_DEBUG "CPU %02x in proximity domain %02x\n",
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a35cb9d..171a0aa 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
}

apic_id = pa->apic_id;
+ if (apic_id >= MAX_LOCAL_APIC) {
+ printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+ return;
+ }
apicid_to_node[apic_id] = node;
node_set(node, cpu_nodes_parsed);
acpi_numa = 1;
@@ -168,6 +172,12 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
else
apic_id = pa->apic_id;
+
+ if (apic_id >= MAX_LOCAL_APIC) {
+ printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+ return;
+ }
+
apicid_to_node[apic_id] = node;
node_set(node, cpu_nodes_parsed);
acpi_numa = 1;
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 5718566..d9926af 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_type id,
int __init acpi_numa_init(void)
{
int ret = 0;
+ int nr_cpu_entries = nr_cpu_ids;
+
+#ifdef CONFIG_X86
+ /*
+ * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
+ * SRAT cpu entries could have different order with that in MADT.
+ * So go over all cpu entries in SRAT to get apicid to node mapping.
+ */
+ nr_cpu_entries = MAX_LOCAL_APIC;
+#endif

/* SRAT: Static Resource Affinity Table */
if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
- acpi_parse_x2apic_affinity, nr_cpu_ids);
+ acpi_parse_x2apic_affinity, nr_cpu_entries);
acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
- acpi_parse_processor_affinity, nr_cpu_ids);
+ acpi_parse_processor_affinity, nr_cpu_entries);
ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
acpi_parse_memory_affinity,
NR_NODE_MEMBLKS);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/