On Wed, Mar 10, 2010 at 10:31:46PM +0800, Christian Ehrhardt wrote:
Wu Fengguang wrote:
[...]
Christian, did you notice this commit for 2.6.33? [...]
commit 65a80b4c61f5b5f6eb0f5669c8fb120893bfb388
I didn't see that particular one, due to the fact that whatever the result is, it needs to work on .32.
Anyway, I'll test it tomorrow, and if that already-accepted one fixes my issue as well, I'll recommend that distros older than 2.6.33 pick it up in their on-top patches.
OK, thanks!
It should at least improve performance between .32 and .33, because
once two readahead requests are merged into one single IO request,
PageUptodate() will be true at the next readahead, and hence
blk_run_backing_dev() gets called to break out of the suboptimal
situation.
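(For reference, the hunk in question sits at the end of
page_cache_async_readahead() in mm/readahead.c - a simplified
rendering, not a verbatim quote of the commit:

    /* do read-ahead */
    ondemand_readahead(mapping, ra, filp, true, offset, req_size);

    #ifdef CONFIG_BLOCK
    /*
     * Normally the current page is !uptodate and lock_page() follows,
     * implicitly unplugging the device. When the previous readahead
     * got merged into a bigger request, though, the page is already
     * uptodate, no lock_page() comes, and the freshly submitted IO
     * would sit plugged in the queue - so kick it explicitly.
     */
    if (PageUptodate(page))
            blk_run_backing_dev(mapping->backing_dev_info, NULL);
    #endif
)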
As you saw from my blktrace, that's already the case without that patch. Once the second readahead comes in and is merged, it gets unplugged in 2.6.32 too - but that is still bad behavior, as it denies me things like a 68% throughput improvement :-).
I mean, when readahead windows A and B are submitted in one IO --
let's call it AB -- commit 65a80b4c61 will explicitly unplug on doing
readahead C. In your trace, however, the unplug appears on AB.
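(Schematically - a hypothetical timeline, not taken from the actual
blktrace:

    65a80b4c61 (conditional):
        A, B submitted -> merged into one plugged request AB
        reader later issues readahead C and finds the trigger page
        uptodate -> blk_run_backing_dev() fires only at that point

    unconditional unplug:
        the unplug fires right after each submit, so AB - or even A
        alone - is dispatched immediately
)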
The 68% improvement is very impressive. Wondering if commit 65a80b4c61
(the _conditional_ unplug) can achieve the same level of improvement :)
Your patch does reduce the possible readahead submit latency to 0.

Yeah, and I think/hope that is fine, because as I stated:
- low-utilized disk -> not an issue
- highly utilized disk -> unplug is a no-op
At least personally, I consider the case where a readahead window gets merged with anything except its own sibling very rare - and therefore it's fair to unplug after an RA is submitted, roughly as sketched below.
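(A sketch of what I mean - assuming the unconditional variant simply
drops the PageUptodate() check at the same spot in
page_cache_async_readahead(); the exact placement is my guess, not
the actual patch:

    /* do read-ahead */
    ondemand_readahead(mapping, ra, filp, true, offset, req_size);

    /*
     * Always kick the queue once the readahead is submitted: on a
     * lightly utilized disk this starts the IO right away, on a
     * busy disk the queue is already running and this is a no-op.
     */
    blk_run_backing_dev(mapping->backing_dev_info, NULL);
)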
They are reasonable assumptions. However, I'm not sure if this
unconditional unplug will defeat CFQ's anticipatory logic -- if there
is any. You know, commit 65a80b4c61 is more of a *defensive*
protection against the rare case that two readahead windows get
merged.
Is your workload a simple dd on a single disk? If so, it sounds like
something illogical hidden in the block layer.

There might still be something illogical hidden there, as e.g. 2.6.27 unplugged after the first readahead as well :-)
But no, my load is iozone running with different numbers of processes, with one disk per process.
That neatly resembles e.g. nightly backup jobs, which tend to take longer and longer in ever-growing customer scenarios. Such an improvement might banish the backups back to the night where they belong :-)
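(For the record, an invocation of that shape would look roughly like
this - illustrative only, not the exact command line used:

    # 4 processes, each doing sequential write+read on its own disk
    iozone -t 4 -i 0 -i 1 -s 2g -r 64k \
           -F /mnt/disk1/f /mnt/disk2/f /mnt/disk3/f /mnt/disk4/f
)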
Exactly one process per disk? Are they doing sequential reads or more
complicated access patterns?
Thanks,
Fengguang