Tests were done with XFS on both the target and the initiator. This
confirms your findings: using files instead of block devices is faster,
but only when using the io_context patch.
It shows that the one that really matters is the io_context patch,
even when context readahead is running. I guess what happened
in the tests is:
- without readahead (or when the readahead algorithm fails to do
proper sequential readahead), the SCST processes will be submitting
small IOs that are close to each other. CFQ relies on the io_context
patch to prevent unnecessary idling.
- with proper readahead, the SCST processes will also be submitting
close readahead IOs. For example, a file's 100-102MB range may be
read ahead by process A, while its 102-104MB range is read ahead
by process B. In this case CFQ will also idle waiting
for process A to submit the next IO, but in fact that IO is being
submitted by process B. So the io_context patch is still necessary
even when context readahead is working fine. I guess context
readahead does have the added value of possibly enlarging the IO size
(however, this benchmark seems not to be very sensitive to IO size).
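
To make the second case concrete, here is a toy Python model of the
idling effect. It is only a sketch and not the real CFQ logic: the
8 ms idle window, the 1 ms service time, and the function name
total_time_ms are all assumptions picked for illustration. Each IO is
charged a fixed service time, and whenever the next sequential IO
arrives in a different queue than the previous one, the model charges
one full idle window, because the scheduler was waiting on the
previous queue.

```python
# Toy model only: NOT the real CFQ algorithm. All numbers are assumptions.
SLICE_IDLE_MS = 8.0  # assumed idle window the scheduler waits on a queue
SERVICE_MS = 1.0     # assumed (hypothetical) time to service one IO


def total_time_ms(submitters, shared_io_context):
    """Return the modelled time to serve a stream of sequential IOs.

    submitters: process names, in the order their IOs hit the scheduler.
    shared_io_context: if True, all processes share one queue (the effect
    the io_context patch is assumed to have in this model).
    """
    elapsed = 0.0
    prev_queue = None
    for proc in submitters:
        queue = "shared" if shared_io_context else proc
        if prev_queue is not None and queue != prev_queue:
            # The scheduler idled on prev_queue waiting for its next IO,
            # but that IO was submitted through another queue.
            elapsed += SLICE_IDLE_MS
        elapsed += SERVICE_MS
        prev_queue = queue
    return elapsed


# Processes A and B take turns reading adjacent ranges of one file.
alternating = ["A", "B"] * 4
separate = total_time_ms(alternating, shared_io_context=False)
shared = total_time_ms(alternating, shared_io_context=True)
print("per-process queues:", separate, "ms")
print("shared io_context: ", shared, "ms")
```

In this model the per-process case pays an idle window on every
A/B switch, while the shared case pays none, which matches the
direction (not the magnitude) of what the tests show.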
Thanks,
Fengguang