On 2015/10/3 3:12, Denys Vlasenko wrote:
From: Daniel J Blueman <daniel@xxxxxxxxxxxxx>

Hi Denys and Daniel,

The Intel x2APIC spec states the upper 16-bits of APIC ID is the
cluster ID [1, p2-12], intended for future distributed systems. Beyond
the legacy 8-bit APIC ID, Numascale NumaConnect uses 4-bits for the
position of a server on each axis of a multi-dimension torus; SGI
NUMAlink also structures the APIC ID space.
Instead, define an array based on NR_CPUS to achieve a 1:1 mapping and
perform a linear search; this addresses the binary bloat and the present
artificial APIC ID limits. With CONFIG_NR_CPUS=256:
$ size vmlinux vmlinux-patched
    text    data     bss      dec     hex filename
18232877 1849656 2281472 22364005 1553f65 vmlinux
18233034 1786168 2281472 22300674 1544802 vmlinux-patched
That is, ~64 kbytes less data.
Works peachy on a 256-core system with a 20-bit APIC ID space, and on a
48-core legacy 8-bit APIC ID system. If we care, I can make
numa_cpu_node O(1) lookup for typical cases.
Signed-off-by: Daniel J Blueman <daniel@xxxxxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: Daniel J Blueman <daniel@xxxxxxxxxxxxx>
CC: Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx>
CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CC: Len Brown <len.brown@xxxxxxxxx>
CC: x86@xxxxxxxxxx
CC: linux-kernel@xxxxxxxxxxxxxxx
[1]
http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
---
I added the forgotten change in arch/x86/mm/numa_emulation.c (Denys)
arch/x86/include/asm/numa.h | 13 +++++++------
arch/x86/kernel/cpu/amd.c | 8 ++++----
arch/x86/mm/numa.c | 31 +++++++++++++++++++++++--------
arch/x86/mm/numa_emulation.c | 6 +++---
4 files changed, 37 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index c2ecfd0..33becb8 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -17,6 +17,11 @@
*/
#define NODE_MIN_SIZE (4*1024*1024)
+struct apicid_to_node {
+ int apicid;
+ s16 node;
+};
+
extern int numa_off;
/*
@@ -27,17 +32,13 @@ extern int numa_off;
* should be accessed by the accessors - set_apicid_to_node() and
* numa_cpu_node().
*/
-extern s16 __apicid_to_node[MAX_LOCAL_APICID];
+extern struct apicid_to_node __apicid_to_node[NR_CPUS];

I still have some concerns about limiting the array to NR_CPUS entries.
__apicid_to_node is populated in the order in which CPUs are listed in
the ACPI SRAT table, while CPU IDs are allocated in the order in which
CPUs are listed in the ACPI MADT (APIC) table. So it may cause trouble
if:
1) the system has more than NR_CPUS CPUs
2) CPUs are listed in a different order in the SRAT and MADT tables.
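
To make the concern concrete, here is a minimal userspace sketch of a
fixed-size APIC ID to node table with linear search. It is not the code
from the patch (the arch/x86/mm/numa.c hunk is not quoted above); the
names demo_set_apicid_to_node(), demo_numa_cpu_node(), DEMO_NR_CPUS and
DEMO_NUMA_NO_NODE are hypothetical, and I am assuming that entries are
appended in SRAT order and silently dropped once the table is full.

```c
#include <stdint.h>

#define DEMO_NR_CPUS      4      /* hypothetical stand-in for NR_CPUS */
#define DEMO_NUMA_NO_NODE (-1)   /* hypothetical "no node known" value */

/* Same layout as the struct added to numa.h in the quoted diff. */
struct demo_apicid_to_node {
	int apicid;
	int16_t node;
};

static struct demo_apicid_to_node demo_table[DEMO_NR_CPUS];
static int demo_used;            /* number of valid entries */

/*
 * Record apicid -> node. Entries are appended in call order, which in
 * this sketch stands for the order of the SRAT entries.
 * Returns 0 on success, -1 if the table is already full.
 */
static int demo_set_apicid_to_node(int apicid, int16_t node)
{
	int i;

	/* Update in place if this APIC ID is already known. */
	for (i = 0; i < demo_used; i++) {
		if (demo_table[i].apicid == apicid) {
			demo_table[i].node = node;
			return 0;
		}
	}

	/* Assumed failure mode: no room left, the mapping is lost. */
	if (demo_used >= DEMO_NR_CPUS)
		return -1;

	demo_table[demo_used].apicid = apicid;
	demo_table[demo_used].node = node;
	demo_used++;
	return 0;
}

/* Linear search; returns DEMO_NUMA_NO_NODE if the APIC ID is unknown. */
static int16_t demo_numa_cpu_node(int apicid)
{
	int i;

	for (i = 0; i < demo_used; i++) {
		if (demo_table[i].apicid == apicid)
			return demo_table[i].node;
	}
	return DEMO_NUMA_NO_NODE;
}
```

With DEMO_NR_CPUS set to 4, registering a fifth distinct APIC ID fails,
and a later lookup for that APIC ID returns DEMO_NUMA_NO_NODE. If MADT
happens to list that CPU among the first NR_CPUS entries, it would get a
CPU ID but no node mapping in a scheme like this one.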