[RFC][PATCH 0/3] Skip I/O merges when disabled

From: Alan D. Brunelle
Date: Wed Apr 23 2008 - 15:09:05 EST


The block I/O + elevator + I/O scheduler code spends a lot of time
trying to merge I/Os -- rightfully so under "normal" circumstances.
However, if one knows that the incoming I/O stream is /very/ random in
nature, those cycles are wasted. (This can be the case, for example,
during OLTP-type runs.)

This patch series adds a per-request_queue tunable that (when set)
disables merge attempts, thus freeing up a non-trivial number of CPU
cycles.
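Roughly, the tunable boils down to a queue flag plus a sysfs attribute
along the following lines. This is an illustrative sketch only -- the
names (QUEUE_FLAG_NOMERGES, blk_queue_nomerges()) and the reuse of the
queue_var_show()/queue_var_store() helpers in block/blk-sysfs.c show the
shape of the change, not necessarily the exact diff in patch 1:

    /* illustrative sketch -- see patch 1 for the actual diff */
    #define QUEUE_FLAG_NOMERGES  10  /* disable merge attempts
                                        (bit number illustrative) */

    #define blk_queue_nomerges(q) \
            test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)

    static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
    {
            return queue_var_show(blk_queue_nomerges(q), page);
    }

    static ssize_t queue_nomerges_store(struct request_queue *q,
                                        const char *page, size_t count)
    {
            unsigned long nm;
            ssize_t ret = queue_var_store(&nm, page, count);

            spin_lock_irq(q->queue_lock);
            if (nm)
                    set_bit(QUEUE_FLAG_NOMERGES, &q->queue_flags);
            else
                    clear_bit(QUEUE_FLAG_NOMERGES, &q->queue_flags);
            spin_unlock_irq(q->queue_lock);

            return ret;
    }

With that wired up, flipping the behavior at run time is just a write to
/sys/block/<dev>/queue/nomerges.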

I'll be doing some more benchmarking, but this is a representative set
of data on a two-way Opteron box w/ 4 SATA drives. 'fio' was used to
generate random 4k asynchronous direct I/Os over the 128GiB of each SATA
drive. OProfile was used to gather the results; we collected
CPU_CLK_UNHALTED (CPU) and DATA_CACHE_MISSES (DCM) events. The data
extracted below shows both the percentage taken over all samples
(including non-kernel) and the percentage taken over just those from the
block I/O layer + elevator + deadline I/O scheduler + SATA modules.

v2.6.25 (not patched): CPU: 5.8330% (total) 7.5644% (I/O code only)
v2.6.25 + nomerges = 0: CPU: 5.8008% (total) 7.5806% (I/O code only)
v2.6.25 + nomerges = 1: CPU: 4.5404% (total) 5.9416% (I/O code only)

v2.6.25 (not patched): DCM: 8.1967% (total) 10.5188% (I/O code only)
v2.6.25 + nomerges = 0: DCM: 7.2291% (total) 9.4087% (I/O code only)
v2.6.25 + nomerges = 1: DCM: 6.1989% (total) 8.0155% (I/O code only)

I've typically been seeing a good 20-25% reduction in CPU samples and a
10-15% reduction in DCM samples for the random load w/ nomerges set to 1
compared to nomerges set to 0 (looking at just the block code).

[BTW: The I/O performance doesn't change much between the 3 sets of data
- the seek + I/O times themselves dominate to such a large extent that
the merging overhead is lost in the noise. There is a very small
improvement seen w/ nomerges=1, but it's <<1%.]

It's not clear to me why v2.6.25 (not patched) requires /more/ cycles
than the patched kernel w/ nomerges=0 -- but the difference has been
consistent in the handful of runs I've done. I'm going to do a large set
of runs for each condition (not patched, nomerges=0 & nomerges=1) to
verify that this holds over multiple runs. I'm also going to check out
sequential loads to see what (if any) penalty the extra couple of checks
incur there (probably nothing noticeable).

The first patch in the series adds the tunable; the second adds the
check to skip the merge attempts; and the third adds the check to skip
adding requests to the hash lists used for merging. A rough sketch of
the latter two checks follows.
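In outline, the checks look something like this in block/elevator.c.
Again, this is an illustrative sketch rather than the exact diffs -- in
particular, the hash-list skip could equally well sit at the
elv_rqhash_add() call sites:

    int elv_merge(struct request_queue *q, struct request **req,
                  struct bio *bio)
    {
            /*
             * Merging is disabled on this queue: don't burn cycles
             * (and cache lines) hunting for merge candidates.
             */
            if (blk_queue_nomerges(q))
                    return ELEVATOR_NO_MERGE;

            /* ... existing one-hit cache, rq hash and elevator
             * merge lookups continue as before ... */
    }

    static void elv_rqhash_add(struct request_queue *q, struct request *rq)
    {
            /*
             * With merging disabled nothing will ever look a request
             * up in the hash, so don't pay to insert it either.
             */
            if (blk_queue_nomerges(q))
                    return;

            /* ... existing hash insertion ... */
    }

The second check is presumably where much of the data-cache win comes
from: with nomerges set, the rq hash lists stay unpopulated and are
never touched.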

Alan D. Brunelle
Hewlett-Packard