Re: [patch] sched: auto-tune migration costs [was: Re: Industry dbbenchmark result on recent 2.6 kernels]

From: Paul Jackson
Date: Mon Apr 04 2005 - 12:31:26 EST


Ingo wrote:
> i've attached the latest snapshot.

I ran your latest snapshot on a 64-CPU system (well, 62: one node
wasn't working). I made one change: chop the matrix lines at 8 terms.
It's a hack, and I don't know if it's a good idea. But the long lines
were hard to read (and would only get worse on a 512-CPU system). And I
had a fear, probably unfounded, that the long lines could slow things
down.

It built and ran fine, exactly as provided, against 2.6.12-rc1-mm4. I
probably have the unchopped matrix output in my screenlog file, if you
want it. Though, given that the matrix is more or less symmetric, I
wasn't seeing much value in the part I chopped.

It took 24 seconds - a little painful, but booting this system takes
a few minutes, so 24 seconds is not fatal - just painful.

The max-finding code, which stops scanning once the max has been
passed, works fine. If it had been (impossibly) perfect, stopping right
at the max, it would have been perhaps 30% faster (both scans below ran
six sizes past their max, out of 21 and 19 sizes measured), so there is
not a huge amount to be gained from fine-tuning the scan termination
logic.

I can imagine that one could trim this time by doing two scans: the
first at lower density (perhaps considering just one out of every four
sizes), then the second at full density, around the maximum found by
the first. However, this would be less robust, and would add yet more
logic.

Or perhaps, long shot, one could get fancy with some parameterized curve
fitting. If some equation is a reasonable fit for the function being
sampled here, then a low-density scan through the max could be used to
estimate the coefficients of that equation, and the fitted equation,
rather than the samples, used to find the maximum. This would be fun to
play with, but I can't now; other duties are calling.

The one change:

diff -Naurp auto-tune_migration_costs/kernel/sched.c auto-tune_migration_costs_chopped/kernel/sched.c
--- auto-tune_migration_costs/kernel/sched.c 2005-04-04 09:11:43.000000000 -0700
+++ auto-tune_migration_costs_chopped/kernel/sched.c 2005-04-04 09:11:22.000000000 -0700
@@ -5287,6 +5287,7 @@ void __devinit calibrate_migration_costs
distance = domain_distance(cpu1, cpu2);
max_distance = max(max_distance, distance);
cost = migration_cost[distance];
+ if (cpu2 < 8)
printk(" %2ld.%ld(%ld)", (long)cost / 1000000,
((long)cost / 100000) % 10, distance);
}

With this change, the output was:

Memory: 243350592k/244270096k available (7182k code, 921216k reserved, 3776k data, 368k init)
McKinley Errata 9 workaround not needed; disabling it
Dentry cache hash table entries: 33554432 (order: 14, 268435456 bytes)
Inode-cache hash table entries: 16777216 (order: 13, 134217728 bytes)
Mount-cache hash table entries: 1024
Boot processor id 0x0/0x40
Brought up 62 CPUs
Total of 62 processors activated (138340.68 BogoMIPS).
-> [0][2][3145728] 12.3 [ 12.3] (1): (12361880 6180940)
-> [0][2][3311292] 13.1 [ 13.1] (1): (13175591 3497325)
-> [0][2][3485570] 13.7 [ 13.7] (1): (13718647 2020190)
-> [0][2][3669021] 14.3 [ 14.3] (1): (14356800 1329171)
-> [0][2][3862127] 15.5 [ 15.5] (1): (15522156 1247263)
-> [0][2][4065396] 16.4 [ 16.4] (1): (16487934 1106520)
-> [0][2][4279364] 17.3 [ 17.3] (1): (17356154 987370)
-> [0][2][4504593] 18.1 [ 18.1] (1): (18144452 887834)
-> [0][2][4741676] 18.9 [ 18.9] (1): (18934638 839010)
-> [0][2][4991237] 19.9 [ 19.9] (1): (19965884 935128)
-> [0][2][5253933] 21.0 [ 21.0] (1): (21067441 1018342)
-> [0][2][5530455] 22.3 [ 22.3] (1): (22303727 1127314)
-> [0][2][5821531] 23.4 [ 23.4] (1): (23453867 1138727)
-> [0][2][6127927] 23.4 [ 23.4] (1): (23406625 592984)
-> [0][2][6450449] 23.5 [ 23.5] (1): (23586123 386241)
-> [0][2][6789946] 23.5 [ 23.5] (1): (23519823 226270)
-> [0][2][7147311] 22.6 [ 23.5] (1): (22619385 563354)
-> [0][2][7523485] 21.9 [ 23.5] (1): (21998024 592357)
-> [0][2][7919457] 20.7 [ 23.5] (1): (20705771 942305)
-> [0][2][8336270] 17.2 [ 23.5] (1): (17244361 2201857)
-> [0][2][8775021] 14.6 [ 23.5] (1): (14644331 2400943)
-> found max.
[0][2] working set size found: 6450449, cost: 23586123
-> [0][32][3145728] 17.8 [ 17.8] (2): (17848927 8924463)
-> [0][32][3311292] 18.8 [ 18.8] (2): (18811236 4943386)
-> [0][32][3485570] 19.7 [ 19.7] (2): (19779337 2955743)
-> [0][32][3669021] 20.8 [ 20.8] (2): (20811634 1994020)
-> [0][32][3862127] 21.9 [ 21.9] (2): (21919806 1551096)
-> [0][32][4065396] 23.0 [ 23.0] (2): (23075814 1353552)
-> [0][32][4279364] 24.2 [ 24.2] (2): (24267691 1272714)
-> [0][32][4504593] 25.5 [ 25.5] (2): (25546809 1275916)
-> [0][32][4741676] 26.8 [ 26.8] (2): (26886375 1307741)
-> [0][32][4991237] 28.2 [ 28.2] (2): (28291601 1356483)
-> [0][32][5253933] 29.5 [ 29.5] (2): (29587239 1326060)
-> [0][32][5530455] 30.6 [ 30.6] (2): (30669228 1204024)
-> [0][32][5821531] 30.9 [ 30.9] (2): (30969069 751932)
-> [0][32][6127927] 30.3 [ 30.9] (2): (30353322 683839)
-> [0][32][6450449] 29.3 [ 30.9] (2): (29381521 827820)
-> [0][32][6789946] 27.4 [ 30.9] (2): (27459958 1374691)
-> [0][32][7147311] 26.4 [ 30.9] (2): (26403308 1215670)
-> [0][32][7523485] 23.9 [ 30.9] (2): (23967782 1825598)
-> [0][32][7919457] 19.4 [ 30.9] (2): (19483305 3155037)
-> found max.
[0][32] working set size found: 5821531, cost: 30969069
---------------------
| migration cost matrix (max_cache_size: 6291456, cpu: -1 MHz):
---------------------
[00] [01] [02] [03] [04] [05] [06] [07] [08] [09] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61]
[00]: - 0.0(0) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)
[01]: 0.0(0) - 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)
[02]: 47.1(1) 47.1(1) - 0.0(0) 47.1(1) 47.1(1) 47.1(1) 47.1(1)
[03]: 47.1(1) 47.1(1) 0.0(0) - 47.1(1) 47.1(1) 47.1(1) 47.1(1)
[04]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) - 0.0(0) 47.1(1) 47.1(1)
[05]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 0.0(0) - 47.1(1) 47.1(1)
[06]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) - 0.0(0)
[07]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 0.0(0) -
[08]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[09]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[10]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[11]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[12]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[13]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[14]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[15]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[16]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[17]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[18]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[19]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[20]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[21]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[22]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[23]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[24]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[25]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[26]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[27]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[28]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[29]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[30]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[31]: 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) -
[32]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[33]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[34]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[35]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[36]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[37]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[38]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[39]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[40]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[41]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[42]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[43]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[44]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[45]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[46]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[47]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1) -
[48]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[49]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[50]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[51]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[52]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[53]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[54]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[55]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[56]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[57]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[58]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[59]: 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[60]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
[61]: 61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) -
--------------------------------
| cacheflush times [3]: 0.0 (-1) 47.1 (47172246) 61.9 (61938138)
| calibration delay: 24 seconds
--------------------------------

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxxxxxxx> 1.650.933.1373, 1.925.600.0401