Hi Arjan,
One of the top reasons in latencytop is
Unknown reason (blk_execute_rq+0x7a/0xd) 152.3 msec 49.0 msec
Its from hddtemp, hald-addon-storage, ...
I wrote a simple systemtap script [1] to find out where its called from,
and its definitely from the ioctl (see below).
1820322406
hddtemp(3999): <- latency of 153260223 ns
Returning from: 0xffffffff80311ab0 : blk_execute_rq+0x0/0xd0 []
Returning to : 0xffffffff80315580 : sg_io+0x200/0x3c0 []
0xffffffff80315a02 : scsi_cmd_ioctl+0x2c2/0x410 []
0xffffffff80417153 : ip_queue_xmit+0x2b3/0x420 []
0xffffffff880c3be9 : sd_ioctl+0x99/0xf0 [sd_mod]
0xffffffff80313839 : blkdev_driver_ioctl+0x39/0x40 []
0xffffffff80313ab8 : blkdev_ioctl+0x278/0x710 []
0xffffffff803e540d : release_sock+0x8d/0xa0 []
0xffffffff803e54b3 : lock_sock_nested+0x93/0xa0 []
0xffffffff8029a4ff : d_alloc+0x6f/0x1a0 []
0xffffffff802f102f : security_d_instantiate+0x1f/0x30 []
0xffffffff803e540d : release_sock+0x8d/0xa0 []
0xffffffff8043c8c8 : inet_accept+0x88/0xc0 []
0xffffffff803e408e : move_addr_to_user+0x5e/0x70 []
0xffffffff802f0bb1 : security_socket_post_accept+0x11/0x20 []
0xffffffff803e472e : sys_accept+0x13e/0x1d0 []
0xffffffff802af12b : block_ioctl+0x1b/0x20 []
0xffffffff8029482b : do_ioctl+0x1b/0x70 []
0xffffffff80294a6d : vfs_ioctl+0x1ed/0x290 []
0xffffffff80296281 : sys_select+0x41/0x1b0 []
0xffffffff80294b92 : sys_ioctl+0x82/0xa0 []
0xffffffff8020c18e : system_call+0x7e/0x83 []
I suggest adding an annotation to make this reason known to latencytop
by annotating sd_ioctl/scsi_cmd_ioctl, see my patch below.
After my patch, instead of blk_execute_rq the real reason is showed:
Cause Maximum Average
SCSI device ioctl 34.2 msec 14.4 msec
I noticed there's a similar annotation in sr.c for cdrom_ioctl, but sr.c
also has annotations for scsi_nonblockable_ioctl, and scsi_ioctl.
If they are needed for sd_block_ioctl too, maybe
scsi_nonblockable_ioctl/scsi_ioctl should have the annotation themselves.
I also noticed some unsigned overflows, my top latency was sometimes a
very large number (18................................), which
was probably -1, or some other negative number. I haven't found a way
to reproduce it yet (it happens very rarely).
P.S.: LatencyTop could have a feature to ask for more details on unknown
latency reasons, and it would generate a systemtap script itself, that
would show the backtraces to help figure out whats going on.