Kernel profile

Michael O'Reilly (michael@metal.iinet.net.au)
Tue, 30 Jul 1996 19:27:41 +0800


Profile of a machine running 2.0.8 for 3 days, acting as a terminal
server (running 240 serial ports). Machine is a P166 with 64 MB of
RAM, an IDE disk, Stallion EasyConnection 8/64 serial ports, and a
3c590 ethernet card. The machine is using a fair bit of system CPU,
thus the profile.

Warning: the stallion code (stli_poll) runs off the timer interrupt,
so it's unlikely to show up in the profile (or the system CPU stats!)
while still using a fair bit of CPU. Shouldn't skew the stuff below
TOO much tho.

Address   Function                count     density  percentage
[ .... ]
0010a470  system_call             67326    525.9844        0.90
0011052c  add_timer               78551   1510.5962        1.05
00118694  do_wp_page              82083    131.5433        1.10
0010a4f0  ret_from_sys_call       88429    614.0903        1.18
0013bdfc  ip_rt_run_bh            96913    605.7062        1.29
0012d458  dcache_add             105508    261.1584        1.41
00118904  verify_area            129086    310.3029        1.72
001a29c4  ppp_dev_xmit_lower     173584    137.3291        2.32
00122690  sync_buffers           180182    455.0051        2.41
0011b534  filemap_nopage         180753    264.2588        2.41
0011af64  generic_file_read      187024    125.6882        2.50
00118be8  do_no_page             209135    250.1615        2.79
001a0250  vortex_rx              239425    460.4327        3.20
00195fc8  stli_write             327704    345.6793        4.38
00137518  dev_transmit           458840  10428.1818        6.13
00137650  dev_tint               477690   3227.6351        6.38
0014bca0  ip_chk_addr            830318   3294.9127       11.09
001224a8  get_empty_filp        1085911   4378.6734       14.50

And guess what: get_empty_filp does a linear search. :(

[root@grunge kernel]# grep . *
domainname:(none)
file-max:4192
file-nr:1792
hostname:grunge.iinet.net.au
inode-max:8192
inode-nr:2160 1198
osrelease:2.0.8
ostype:Linux
panic:120

As far as I can see, the main caller of get_empty_filp is
sys_socket() (at least, I can't see much else opening and closing
files a lot; this machine routes 100 packets for every disk block
read or written).

get_empty_filp() appears to do a linear search of the 'struct file'
list, relying on a couple of simple heuristics to avoid worst-case
behaviour. My guess is that if it instead:
  - didn't move structs around on the list at all, and
  - started searching from where it left off last time for an empty
    struct,
it would run a fair bit faster. Given that structures never get
destroyed, this would seem to be pretty easy to implement?
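
A rough userspace sketch of the "resume where the last search left
off" idea, to make the suggestion concrete. This is not the kernel's
code: it uses a small fixed array instead of the real linked list, and
NR_SLOTS, struct slot, next_hint, get_empty_slot() and put_slot() are
all made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 8          /* hypothetical table size, not file-max */

struct slot {
    int in_use;             /* stands in for "this struct file is busy" */
};

static struct slot slots[NR_SLOTS];
static size_t next_hint;    /* index just past the last slot handed out */

/*
 * Find a free slot, starting at next_hint and wrapping around once.
 * Entries are never moved, so the hint stays meaningful between calls.
 * Returns the slot index, or -1 if every slot is in use.
 */
static int get_empty_slot(void)
{
    size_t i;

    for (i = 0; i < NR_SLOTS; i++) {
        size_t idx = (next_hint + i) % NR_SLOTS;

        if (!slots[idx].in_use) {
            slots[idx].in_use = 1;
            next_hint = (idx + 1) % NR_SLOTS;
            return (int)idx;
        }
    }
    return -1;
}

/* Release a slot; it stays in place and is simply marked free. */
static void put_slot(int idx)
{
    assert(idx >= 0 && idx < NR_SLOTS);
    slots[idx].in_use = 0;
}
```

The worst case is still a full scan when the table is nearly full, but
with a workload that opens and closes sockets constantly the search
usually stops within a few entries of the hint.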

The ip_chk_addr usage is probably expected given that this machine
normally has about 250 network interfaces, which all appear to get
checked every time ip_chk_addr is called. Anything to speed this up
would probably be pretty useful for the people with large numbers of
virtual WWW servers as well. Maybe a hashed cache to provide fast
answers? Would probably need to suck a fair bit of RAM to get a
reasonable hit rate tho. :(
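
One possible shape for such a cache, sketched in userspace C. This is
an illustration only: slow_chk_addr() is a stand-in for the existing
per-interface scan, its 1/0 result does not model whatever the real
ip_chk_addr returns, and CACHE_SIZE, iface_addr, hash_addr(),
cached_chk_addr() and cache_flush() are all hypothetical names. It is
a direct-mapped cache, so two addresses hashing to the same slot just
evict each other.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE 256      /* hypothetical; must be a power of two */
#define NR_IFACES  4        /* hypothetical; the real box has ~250 */

struct cache_entry {
    uint32_t addr;          /* address this entry answers for */
    int result;             /* cached answer from the slow path */
    int valid;              /* 0 until the entry has been filled */
};

static struct cache_entry cache[CACHE_SIZE];
static uint32_t iface_addr[NR_IFACES] = {
    0x0A000001u, 0x0A000002u, 0x0A000003u, 0x0A000004u
};
static unsigned long slow_calls;    /* counts slow-path lookups */

/* Stand-in for the linear scan over every interface. */
static int slow_chk_addr(uint32_t addr)
{
    int i;

    slow_calls++;
    for (i = 0; i < NR_IFACES; i++)
        if (iface_addr[i] == addr)
            return 1;
    return 0;
}

/* Fold the four address bytes together and mask to the table size. */
static unsigned int hash_addr(uint32_t addr)
{
    return (addr ^ (addr >> 8) ^ (addr >> 16) ^ (addr >> 24))
           & (CACHE_SIZE - 1);
}

/* Answer from the cache if possible, else scan and remember. */
static int cached_chk_addr(uint32_t addr)
{
    struct cache_entry *e = &cache[hash_addr(addr)];

    if (e->valid && e->addr == addr)
        return e->result;

    e->addr = addr;
    e->result = slow_chk_addr(addr);
    e->valid = 1;
    return e->result;
}

/* Must be called whenever an interface or its address changes. */
static void cache_flush(void)
{
    memset(cache, 0, sizeof(cache));
}
```

At 12 bytes or so per entry the table itself is small; the RAM cost
only becomes serious if a much larger table is needed to get a decent
hit rate across all the addresses being routed.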

Any ideas for dramatic speed improvements? :)

Michael.