I had problems getting blktrace going and posted a note to linux-btrace@xxxxxxxxxxxxxxx at Alan Brunelle's suggestion and he also said he'd take a closer look at it himself. On the other hand he pointed me to 'stap' and I was able to use it to get the details I was looking for - SystemTAP really rocks for this type of analysis! As it turns out my assumption about driver blocksize was wrong (sorry about that). It turns out that the size of requests from the driver to the disk is 160mb and so the I/O count was smaller than I had anticipated.Specifically, I wrote a 1GB file with a blocksize of 1MB, which would result in 1000 writes at the application level. What I believe then happens is that each write turns into 8 128KB requests to the driver, which should result in 8000 actual writes. Toss in metadata operations and who knows what else and the actual number should be a little higher. What I've see after repeating the tests a number of times on 2.6.16-27 is numbers ranging from 6800-7000 writes which feels like a big enough difference to at least point out.
Install http://brick.kernel.dk/snaps/blktrace-git-20060723022503.tar.gz
and blktrace your disk for the duration of the test and compare the io
numbers. Requires 2.6.17 or later, though.