unix socket latency regression from 2.4 to 2.5 (and multicast AF_UNIX benchmarks)

From: Chris Friesen (cfriesen@nortelnetworks.com)
Date: Fri Mar 07 2003 - 00:48:21 EST


I've done another series of benchmarks with regards to multicast AF_UNIX
on the 2.5.63 kernel. One of the biggest surprises to me was the
performance regression in the normal userspace case and in the kernel
case with samll numbers of listeners.

The tests basically work by having a single sender send messages of
three different sizes to varying numbers of listeners, which take a
timestamp and then nanosleep() for a second to allow the other listeners
to wake up as fast as possible. When the listeners wake up they figure
out the latency and send it on to a third utility which dumps out the
latencies. In the case of the userspace test, the sender sends to each
listener in turn. In the case of the multicast test, the kernel handles
the cloning of the packet to distribute it to the listeners.

Here are the new results combined with the old for comparison. The
machine is a Duron 750, with K7 optimizations in the kernel. Both
kernels were compiled with gcc 3.2.

44bytes 2.4.20 2.5.63 2.4.20 2.5.63
# listeners userspace userspace kernelspace kernelspace
10 73,335 96,493 103,252 100,286
20 72,610 99,885 106,429 134,517
50 74,1482 95,2075 205,1301 230,1273
100 76,3000 97,4173 362,3425 431,2654
200 107,8719 737,9917 831,5412

236bytes 2.4.20 2.5.63 2.4.20 2.5.63
# listeners userspace userspace kernelspace kernelspace
10 70,346 98,510 81,265 100,290
20 74,639 100,918 122,468 137,533
50 75,1557 103,2225 230,1421 238,1329
100 80,3107 105,4415 408,3743 461,2794
200 131,9117 889,5720

40036-bytes 2.4.20 2.5.63 2.4.20 2.5.63
# listeners userspace userspace kernelspace kernelspace
10 302,4181 841,6218 322,1692 702,2231
20 303,7491 873,12606 347,3450 722,3829
50 306,10451 868,38031 483,8394 884,8583
100 309,23107 881,69403 697,17061 1137,16729
200 313,45528 898,132887 997,39810 1586,32722

It appears that sending/receiving is significantly more expensive in 2.5
than it was in 2.4, with the difference going up as the size of the
message goes up. Is 2.5 using different copying code or something?
Anyone have any ideas as to what is going on here?

Also, even with the increased copying costs, the O(1) scheduler in 2.5
means that the kernelspace multicast solution is faster than the
userspace solution in either kernel in all cases, even when waking up
200 listeners simultaneously.

Any comments on the multicast concept that haven't been discussed already?

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

- To unsubscribe from this list: send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html



This archive was generated by hypermail 2b29 : Fri Mar 07 2003 - 22:00:01 EST