"Stephen D. Williams" <sdw@lig.net> writes:
> An old thread, but important to get these fundamental performance
> numbers up there:
>
> 2.4.2 on an 800mhz PIII Sceptre laptop w/ 512MB ram:
>
> elapsed time for 100000 pingpongs is
> 3.81327
> 100000/3.81256
> ~26229.09541095746689888159
> 10000/.379912
> ~26321.88506812103855629724
>
> 26300 compares to 8000/sec. quite well ;-) You didn't give specs for
> your test machine unfortunately.
>
> Since this tests both 'sides' of an application communication, it
> indicates a 'null transaction' rate of twice that.
>
> This was typical cpu usage on a triple run of 10000:
> CPU states: 7.2% user, 92.7% system, 0.0% nice, 0.0% idle
I seemed to miss the original post, so I can't really comment on the
tests. However...
> Michael Lindner wrote:
> >
> > OK, 2.4.0 kernel installed, and a new set of numbers:
> >
> > test kernel ping-pongs/s. @ total CPU util w/SOL_NDELAY
> > sample (2 skts) 2.2.18 100 @ 0.1% 800 @ 1%
> > sample (1 skt) 2.2.18 8000 @ 100% 8000 @ 50%
> > real app 2.2.18 100 @ 0.1% 800 @ 1%
> >
> > sample (2 skts) 2.4.0 8000 @ 50% 8000 @ 50%
> > sample (1 skt) 2.4.0 10000 @ 50% 10000 @ 50%
> > real app 2.4.0 1200 @ 50% 1200 @ 50%
> >
> > real app Windows 2K 4000 @ 100%
> >
> > The two points that still seem strange to me are:
> >
> > 1. The 1 socket case is still 25% faster than the 2 socket case in 2.4.0
> > (in 2.2.18 the 1 socket case was 10x faster).
> >
> > 2. Linux never devotes more than 50% of the CPU (average over a long
> > run) to the two processes (25% to each process, with the rest of the
> > time idle).
> >
> > I'd really love to show that Linux is a viable platform for our SW, and
> > I think it would be doable if I could figure out how to get the other
> > 50% of my CPU involved. An "strace -rT" of the real app on 2.4.0 looks
> > like this for each ping/pong.
> >
> > 0.052371 send(7, "\0\0\0
> > \177\0\0\1\3243\0\0\0\2\4\236\216\341\0\0\v\277"..., 32, 0) = 32
> > <0.000529>
> > 0.000882 rt_sigprocmask(SIG_BLOCK, ~[], [RT_0], 8) = 0 <0.000021>
> > 0.000242 rt_sigprocmask(SIG_SETMASK, [RT_0], NULL, 8) = 0
> > <0.000021>
> > 0.000173 select(8, [3 4 6 7], NULL, NULL, NULL) = 1 (in [6])
> > <0.000047>
> > 0.000328 read(6, "\0\0\0 ", 4) = 4 <0.000031>
> > 0.000179 read(6,
> > "\177\0\0\1\3242\0\0\0\2\4\236\216\341\0\0\7\327\177\0\0"..., 28) = 28
> > <0.000075>
The strace here shows select() with an infinite timeout, you're
numbers will be much better if you do (pseudo code)...
struct timeval zerotime;
zerotime.tv_sec = 0;
zerotime.tv_usec = 0;
if (!(ret = select( ... , &zerotime)))
ret = select( ... , NULL);
...basically you completely miss the function call for __pollwait()
inside poll_wait (include/linux/poll.h in the linux sources, with
__pollwait being in fs/select.c).
-- # James Antill -- james@and.org :0: * ^From: .*james@and\.org /dev/null - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sun Apr 15 2001 - 21:00:11 EST