(no subject)

From: Hubertus Franke (frankeh@watson.ibm.com)
Date: Fri Aug 09 2002 - 06:22:08 EST

Next message: Albert D. Cahalan: "Re: [PATCH] Linux-2.5 fix/improve get_pid()"
Previous message: George France: "Re: kbuild 2.5.26 - arch/alpha"
Next in thread: Andries Brouwer: "Re: Analysis for Linux-2.5 fix/improve get_pid(), comparing various approaches"
Reply: Andries Brouwer: "Re: Analysis for Linux-2.5 fix/improve get_pid(), comparing various approaches"
Reply: Linus Torvalds: "Re: Analysis for Linux-2.5 fix/improve get_pid(), comparing various approaches"
Maybe reply: Samium Gromoff: "(no subject)"
Maybe reply: peta huh: "(no subject)"
Maybe reply: peta huh: "(no subject)"
Maybe reply: Jeff Chua: "(no subject)"
Maybe reply: Bernard yap: "(no subject)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Folks, below is the analysis that I promised yesterday.
Attached is also the harness program that brings this into userspace and
computes basic overhead for pid allocation in a random setting.
The tar file contains the following stuff and represents the
status last time I gave this consideration. I had posted it to
lkml but other than from Andrea I had not received any feedback
and dropped the issue.

total 52
   4 -rw-rw-r-- 1 frankeh frankeh 141 Mar 28 11:25 Makefile
  12 -rw-rw-r-- 1 frankeh frankeh 8306 Mar 22 16:59 res-2
  16 -rw-rw-r-- 1 frankeh frankeh 13268 Mar 20 10:11 getpid1.c
  16 -rw-rw-r-- 1 frankeh frankeh 15124 Mar 14 18:19 res-1
   4 -rwxrw-r-- 1 frankeh frankeh 316 Mar 13 12:01 bm

getpid1 is the harness. bm is the batch driver.
res-1 and res-2 are two result files that each were mangled together from
the outputs of <bm> executed on different machines.
I attach res-1 here , which I posted earlier in March, so you can read through
it and draw your own conclusions with respect on where we should go with
this.
It might be worthwile to independenly redo the test and also include
Andrea's <getpid> there, allthough it resembles <algo-1>.

Volunteers ?

Note that based on the results I got, I still favor a version of the
mark and sweep that continues to go forward to find the next range
rather than always start from the beginning again.
I am not really sure whether Bill's version can be easily integrated.

The second part of the message (res-1) also experiments with a partial
maximum safe pid-range, i.e., once a range of 256 free pids has been
established we stop the mark and sweep. That looks very competitive
as well. I gives too simple improvements
(a) bitmask can be located on the stack
(b) we could potentially deal with 32-bit pid numbers as we can limit
the bitmask to partial space of the pid range.

-- Hubertus

----------------<previous post>-----------------------------------------------

I implemented an alternative version of getpid, that for large thread counts
( > 210000), provides "significantly" better performance as shown in attached
performance analysis. This is particulary viable for PID_MAX=32768.

-- Hubertus Franke <frankeh@watson.ibm.com>

---------------------------------------------------------------------------------

Currently the getpid algorithm works as follows:
At any given time an interval of [ last_pid .. next_safe ) is known to hold
unused pids. Initially the interval is set to [0 .. 32767]

Hence to allocate a new pid the following is sufficient:

if (++last_pid < next_safe) return last_pid;

However, if we move out of the safe interval, the next safe interval needs
to be established first.
This is currently done by a repetive search

repeat:
     foralltasks(p) {
        if (p uses lastpid) { last_pid++; goto repeat; }
        /* narrow [ last_pid .. next_safe ) */
        if (p->pids in [ last_pid .. next_safe ) ) next_safe = p->pid
     }

Particulary for large number of tasks, this can lead to frequent exercise of
the repeat resulting in a O(N^2) algorithm. We call this : <algo-0>.

Instead I have provided an alternative mechanism that at the time
of determining the next interval marks a bitmask by walking the tasklist
once [ O(N) ] and then finding the proper bit offsets to mark the next free
interval starting from last_pid. The bitmap requires 4096 bytes.
This is <algo-1>.

An optimization to this to keep the last bitmap instead of clearing it
with every search. Only if we fail to obtain a free range, then we have
to go back and clear the bitmap and redo the search one more time.
This is <algo-2>.

I dragged the various algorithms into a userlevel test program to figure
out where the cut off points are with PID_MAX=32768. In this testprogram
I maintain A tasks, and for 10 rounds (delete D random tasks and
reallocate D tasks again) resulting in T=10*D total measured allocations.

Si states how many interval searches where needed for algo-i.
Gi states the average overhead per get_pid for algo-i in usecs.

Based on that one should use the current algorithm until ~ 22K tasks and
beyond that use algo-2. Only the last 15 tasks are a bit faster under algo-1.
We can safely ignore that case.

Based on that providing an adaptive implementation seems the right choice.
The patch for 2.5.7-pre1 is attached.

executed program example: getpid -c 2 -s 10 -e 100 -d 10 -t <0,1>
0 is old 1 is new algo 2.

   A D T | S0 G0 | S1 G1 | S2 G2
----------------------------------------------------------------------------
   10 10 80 | 1 0.34 | 1 0.59 | 1 0.81
   20 10 100 | 1 0.30 | 1 0.49 | 1 0.64
   30 10 100 | 1 0.29 | 1 0.55 | 1 0.65
   40 10 100 | 1 0.35 | 1 0.51 | 1 0.65
   50 10 100 | 1 0.35 | 1 0.54 | 1 0.67
   60 10 100 | 2 0.38 | 21 1.95 | 2 0.79
   70 10 100 | 1 0.39 | 1 0.59 | 1 0.76
   80 10 100 | 1 0.41 | 1 0.62 | 1 0.76
  100 50 500 | 2 0.22 | 63 1.26 | 2 0.30
  150 50 500 | 3 0.24 | 12 0.56 | 4 0.36
  200 50 500 | 3 0.27 | 56 2.26 | 5 0.46
  250 50 500 | 2 0.26 | 119 5.63 | 6 0.54
  300 50 500 | 3 0.32 | 148 8.73 | 9 0.76
  350 50 500 | 5 0.45 | 168 11.51 | 6 0.76
  400 50 500 | 4 0.44 | 90 7.28 | 10 1.10
  450 50 500 | 6 0.61 | 143 13.08 | 7 0.97
  500 50 500 | 6 0.65 | 100 10.47 | 7 1.06
  550 50 500 | 5 0.63 | 71 8.10 | 9 1.34
  600 50 500 | 7 0.86 | 115 14.32 | 14 2.04
  650 50 500 | 8 1.00 | 112 15.08 | 13 2.07
  700 50 500 | 8 1.06 | 127 18.12 | 10 1.79
  750 50 500 | 8 1.26 | 62 9.73 | 15 2.73
  800 50 500 | 11 1.68 | 92 15.14 | 12 2.42
  850 50 500 | 14 2.03 | 78 13.73 | 13 2.67
  900 50 500 | 21 3.17 | 102 18.74 | 27 5.18
1000 1000 9980 | 1 0.18 | 4 0.19 | 1 0.18
2000 1000 10000 | 76 1.22 | 3604 53.03 | 322 4.81
3000 1000 10000 | 161 3.84 | 4502 112.24 | 606 15.49
4000 1000 10000 | 359 11.17 | 4912 183.37 | 901 33.76
5000 1000 10000 | 539 23.33 | 4949 257.35 | 1165 59.91
6000 1000 10000 | 724 43.42 | 4918 349.59 | 1498 104.36
7000 1000 10000 | 1026 85.38 | 4886 447.58 | 1835 165.08
8000 1000 10000 | 1228 126.45 | 4870 565.29 | 2084 234.73
9000 1000 10000 | 1516 193.62 | 4826 699.85 | 2489 354.27
10000 1000 10000 | 1818 289.32 | 4910 843.32 | 2763 472.47
11000 1000 10000 | 2093 389.33 | 5005 1023.08 | 3095 629.70
12000 1000 10000 | 2305 506.23 | 5095 1194.71 | 3277 773.06
13000 1000 10000 | 2680 683.66 | 5289 1424.81 | 3711 1003.67
14000 1000 10000 | 2959 853.10 | 5358 1602.05 | 3878 1172.70
15000 1000 10000 | 3167 1037.79 | 5539 1835.64 | 4301 1436.40
16000 1000 10000 | 3466 1272.80 | 5669 2087.03 | 4485 1635.48
17000 1000 10000 | 3743 1539.06 | 5932 2338.50 | 4844 1924.27
18000 1000 10000 | 4069 1869.63 | 6097 2613.60 | 5218 2232.52
19000 1000 10000 | 4293 2183.98 | 6242 2866.34 | 5501 2519.60
20000 1000 10000 | 4616 2607.10 | 6508 3175.90 | 5770 2823.98
21000 1000 10000 | 4974 3119.34 | 6700 3460.95 | 6161 3183.73
22000 1000 10000 | 5177 3609.28 | 6944 3788.19 | 6389 3492.97 =
23000 1000 10000 | 5483 4214.03 | 7183 4136.25 | 6665 3823.38
24000 1000 10000 | 5838 4971.60 | 7404 4460.62 | 6982 4199.61
25000 1000 10000 | 6183 5880.92 | 7736 4891.80 | 7209 4546.18
26000 1000 10000 | 6413 6829.07 | 7890 5210.85 | 7533 4939.12
27000 1000 10000 | 6748 8132.96 | 8148 5598.19 | 7959 5442.25
28000 1000 10000 | 7139 10065.52 | 8445 6047.42 | 8140 5767.13
29000 1000 10000 | 7638 12967.20 | 8736 6475.23 | 8501 6250.86
30000 1000 10000 | 8178 16991.05 | 8994 6907.40 | 8911 6791.97
32000 50 500 | 482 26446.69 | 488 7405.63 | 487 7494.39
32050 50 500 | 488 34769.89 | 488 7463.11 | 486 7541.61
32100 50 500 | 489 44564.86 | 493 7593.99 | 486 7589.02
32150 50 500 | 486 58150.58 | 487 7549.96 | 492 7731.18
32200 50 500 | 490 64875.38 | 495 7721.82 | 497 7854.59
32250 50 500 | 491 81790.21 | 491 7697.57 | 490 7795.12
32300 50 500 | 489 88975.62 | 493 7763.04 | 495 7909.77
32350 50 500 | 489 115797.38 | 492 7782.34 | 495 7967.86
32400 50 500 | 490 120958.50 | 497 7898.45 | 496 8018.98
32450 50 500 | 492 147541.84 | 493 7874.27 | 492 7982.34
32500 50 500 | 493 175498.39 | 495 7940.18 | 495 8060.97
32550 50 500 | 492 207229.29 | 496 7973.88 | 498 8134.02
32600 50 500 | 495 267057.05 | 498 8028.86 | 498 8171.97
32650 50 500 | 492 375722.28 | 500 8088.30 | 498 8213.85
32700 50 500 | 497 528321.07 | 500 8110.51 | 499 8267.67
32751 1 10 | 10 259785.80 | 10 7851.50 | 10 8549.30
32752 1 10 | 10 1121285.60 | 10 7846.30 | 10 8556.10
32753 1 10 | 10 383729.50 | 10 7848.60 | 10 8562.20
32754 1 10 | 10 1061467.50 | 10 7849.80 | 10 8564.40
32755 1 10 | 10 612726.50 | 10 7853.00 | 10 8553.90
32756 1 10 | 10 1725559.90 | 10 7851.90 | 10 8553.00
32757 1 10 | 10 1679818.50 | 10 7847.80 | 10 8552.10
32758 1 10 | 10 2991838.60 | 10 7865.70 | 10 8557.20
32759 1 10 | 10 883388.90 | 10 7859.40 | 10 8562.00
32760 1 10 | 10 4830405.90 | 10 7850.50 | 10 9336.60
32761 1 10 | 10 7105809.20 | 10 7863.90 | 10 9337.20
32762 1 10 | 10 7919703.40 | 10 7867.10 | 10 9340.70
32763 1 10 | 10 1537522.50 | 10 7869.40 | 10 9340.70
32764 1 10 | 10 6173019.20 | 10 7866.60 | 10 9340.00
32765 1 10 | 10 8104105.00 | 10 7876.20 | 10 10112.80
32766 1 10 | 10 16145415.40 | 10 7880.80 | 10 10893.50
32767 1 10 | 10 16135267.10 | 10 7878.60 | 10 11674.40

Other variants are possible, for instance if 4096 bytes is too much
(hell I don't know how that could be), one can break it up into smaller
search chunks (e.g. 256 bytes).

Another alternative is to allocate the page on the first occasion of
getting into get_pid_my_map....

In the following I give a comparative result between algo-2 and
algo-2 with a max interval size of 256. The times are very comparative.
Also the search count values are identical, but in 2 cases suggesting
that a interval size particular for large thread counts of 256 is certainly
sufficient, but it brings some small overhead. Question to answer is
wether setting aside an extra page is such a crime.....

   A D T | S2 G2 | S2-256 G2-256
-------------------------------------------------------
   10 10 80 | 1 0.81 | 1 0.84
   20 10 100 | 1 0.64 | 1 0.67
   30 10 100 | 1 0.65 | 1 0.68
   40 10 100 | 1 0.65 | 1 0.69
   50 10 100 | 1 0.67 | 1 0.71
   60 10 100 | 2 0.79 | 2 0.82
   70 10 100 | 1 0.76 | 1 0.76
   80 10 100 | 1 0.76 | 1 0.79
  100 50 500 | 2 0.30 | 2 0.31
  150 50 500 | 4 0.36 | 5 0.39 <=
  200 50 500 | 5 0.46 | 5 0.46
  250 50 500 | 6 0.54 | 6 0.55
  300 50 500 | 9 0.76 | 9 0.76
  350 50 500 | 6 0.76 | 6 0.75
  400 50 500 | 10 1.10 | 10 1.10
  450 50 500 | 7 0.97 | 7 0.97
  500 50 500 | 7 1.06 | 7 1.06
  550 50 500 | 9 1.34 | 9 1.35
  600 50 500 | 14 2.04 | 14 2.06
  650 50 500 | 13 2.07 | 13 2.09
  700 50 500 | 10 1.79 | 10 1.82
  750 50 500 | 15 2.73 | 15 2.69
  800 50 500 | 12 2.42 | 12 2.38
  850 50 500 | 13 2.67 | 13 2.66
  900 50 500 | 27 5.18 | 27 5.25
1000 1000 9980 | 1 0.18 | 3 0.19 <=
2000 1000 10000 | 322 4.81 | 322 4.84
3000 1000 10000 | 606 15.49 | 606 15.55
4000 1000 10000 | 901 33.76 | 901 34.42
5000 1000 10000 | 1165 59.91 | 1165 62.35
6000 1000 10000 | 1498 104.36 | 1498 105.55
7000 1000 10000 | 1835 165.08 | 1835 174.82
8000 1000 10000 | 2084 234.73 | 2084 244.18
9000 1000 10000 | 2489 354.27 | 2489 372.11
10000 1000 10000 | 2763 472.47 | 2763 486.73
11000 1000 10000 | 3095 629.70 | 3095 648.31
12000 1000 10000 | 3277 773.06 | 3277 784.75
13000 1000 10000 | 3711 1003.67 | 3711 1006.94
14000 1000 10000 | 3878 1172.70 | 3878 1168.71
15000 1000 10000 | 4301 1436.40 | 4301 1429.89
16000 1000 10000 | 4485 1635.48 | 4485 1620.90
17000 1000 10000 | 4844 1924.27 | 4844 1904.92
18000 1000 10000 | 5218 2232.52 | 5218 2218.80
19000 1000 10000 | 5501 2519.60 | 5501 2508.83
20000 1000 10000 | 5770 2823.98 | 5770 2895.66
21000 1000 10000 | 6161 3183.73 | 6161 3307.54
22000 1000 10000 | 6389 3492.97 | 6389 3620.53
23000 1000 10000 | 6665 3823.38 | 6665 3995.63
24000 1000 10000 | 6982 4199.61 | 6982 4347.95
25000 1000 10000 | 7209 4546.18 | 7209 4701.95
26000 1000 10000 | 7533 4939.12 | 7533 5088.13
27000 1000 10000 | 7959 5442.25 | 7959 5599.85
28000 1000 10000 | 8140 5767.13 | 8140 5817.86
29000 1000 10000 | 8501 6250.86 | 8501 6250.30
30000 1000 10000 | 8911 6791.97 | 8911 6788.51
32000 50 500 | 487 7494.39 | 487 7493.47
32050 50 500 | 486 7541.61 | 486 7541.05
32100 50 500 | 486 7589.02 | 486 7586.12
32150 50 500 | 492 7731.18 | 492 7728.76
32200 50 500 | 497 7854.59 | 497 7854.94
32250 50 500 | 490 7795.12 | 490 7783.10
32300 50 500 | 495 7909.77 | 495 7902.70
32350 50 500 | 495 7967.86 | 495 7946.20
32400 50 500 | 496 8018.98 | 496 7999.34
32450 50 500 | 492 7982.34 | 492 7962.93
32500 50 500 | 495 8060.97 | 495 8048.18
32550 50 500 | 498 8134.02 | 498 8122.08
32600 50 500 | 498 8171.97 | 498 8169.34
32650 50 500 | 498 8213.85 | 498 8209.95
32700 50 500 | 499 8267.67 | 499 8266.13
32751 1 10 | 10 8549.30 | 10 8629.00
32752 1 10 | 10 8556.10 | 10 8636.30
32753 1 10 | 10 8562.20 | 10 8632.00
32754 1 10 | 10 8564.40 | 10 8633.40
32755 1 10 | 10 8553.90 | 10 8635.40
32756 1 10 | 10 8553.00 | 10 8637.60
32757 1 10 | 10 8552.10 | 10 8640.90
32758 1 10 | 10 8557.20 | 10 8644.90
32759 1 10 | 10 8562.00 | 10 8644.10
32760 1 10 | 10 9336.60 | 10 9436.10
32761 1 10 | 10 9337.20 | 10 9435.60
32762 1 10 | 10 9340.70 | 10 9439.10
32763 1 10 | 10 9340.70 | 10 9433.60
32764 1 10 | 10 9340.00 | 10 9440.60
32765 1 10 | 10 10112.80 | 10 10228.40
32766 1 10 | 10 10893.50 | 10 11023.50
32767 1 10 | 10 11674.40 | 10 11813.70

application/x-tgz attachment: gp.tar.gz

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Albert D. Cahalan: "Re: [PATCH] Linux-2.5 fix/improve get_pid()"
Previous message: George France: "Re: kbuild 2.5.26 - arch/alpha"
Next in thread: Andries Brouwer: "Re: Analysis for Linux-2.5 fix/improve get_pid(), comparing various approaches"
Reply: Andries Brouwer: "Re: Analysis for Linux-2.5 fix/improve get_pid(), comparing various approaches"
Reply: Linus Torvalds: "Re: Analysis for Linux-2.5 fix/improve get_pid(), comparing various approaches"
Maybe reply: Samium Gromoff: "(no subject)"
Maybe reply: peta huh: "(no subject)"
Maybe reply: peta huh: "(no subject)"
Maybe reply: Jeff Chua: "(no subject)"
Maybe reply: Bernard yap: "(no subject)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Thu Aug 15 2002 - 22:00:19 EST