Re: Reduce latencies for syncronous writes and high I/O priority requests in deadline IO scheduler

From: Aaron Carroll
Date: Thu Apr 23 2009 - 20:11:32 EST


Hi Corrado,

Corrado Zoccolo wrote:
On Thu, Apr 23, 2009 at 1:52 PM, Aaron Carroll <aaronc@xxxxxxxxxxxxxxx> wrote:
Corrado Zoccolo wrote:
Hi,
deadline I/O scheduler currently classifies all I/O requests in only 2
classes, reads (always considered high priority) and writes (always
lower).
The attached patch, intended to reduce latencies for syncronous writes
Can be achieved by switching to sync/async rather than read/write. No
one has shown results where this makes an improvement. Let us know if
you have a good example.

Yes, this is exactly what my patch does, and the numbers for
fsync-tester are much better than baseline deadline, almost comparable
with cfq.

The patch does a bunch of other things too. I can't tell what is due to
the read/write -> sync/async change, and what is due to the rest of it.

and high I/O priority requests, introduces more levels of priorities:
* real time reads: highest priority and shortest deadline, can starve
other levels
* syncronous operations (either best effort reads or RT/BE writes),
mid priority, starvation for lower level is prevented as usual
* asyncronous operations (async writes and all IDLE class requests),
lowest priority and longest deadline

The patch also introduces some new heuristics:
* for non-rotational devices, reads (within a given priority level)
are issued in FIFO order, to improve the latency perceived by readers
This might be a good idea.
I think Jens doesn't like it very much.

Let's convince him :)

I think a nice way to do this would be to make fifo_batch=1 the default
for nonrot devices. Of course this will affect writes too...

One problem here is the definition of nonrot. E.g. if H/W RAID drivers
start setting that flag, it will kill performance. Sorting is important for arrays of rotational disks.

Can you make this a separate patch?
I have an earlier attempt, much simpler, at:
http://lkml.indiana.edu/hypermail/linux/kernel/0904.1/00667.html
Is there a good reason not to do the same for writes?
Well, in that case you could just use noop.

Noop doesn't merge as well as deadline, nor does is provide read/write
differentiation. Is there a performance/QoS argument for not doing it?

I found that this scheme outperforms noop. Random writes, in fact,
perform quite bad on most SSDs (unless you use a logging FS like
nilfs2, that transforms them into sequential writes), so having all
the deadline ioscheduler machinery to merge write requests is much
better. As I said, my patched IO scheduler outperforms noop on my
normal usage.

You still get the merging... we are only talking about the issue
order here.

* minimum batch timespan (time quantum): partners with fifo_batch to
improve throughput, by sending more consecutive requests together. A
given number of requests will not always take the same time (due to
amount of seek needed), therefore fifo_batch must be tuned for worst
cases, while in best cases, having longer batches would give a
throughput boost.
* batch start request is chosen fifo_batch/3 requests before the
expired one, to improve fairness for requests with lower start sector,
that otherwise have higher probability to miss a deadline than
mid-sector requests.
I don't like the rest of it. I use deadline because it's a simple,
no surprises, no bullshit scheduler with reasonably good performance
in all situations. Is there some reason why CFQ won't work for you?

I actually like CFQ, and use it almost everywhere, and switch to
deadline only when submitting an heavy-duty workload (having a SysRq
combination to switch I/O schedulers could sometimes be very handy).

However, on SSDs it's not optimal, so I'm developing this to overcome
those limitations.

Is this due to the stall on each batch switch?

In the meantime, I wanted to overcome also deadline limitations, i.e.
the high latencies on fsync/fdatasync.

Did you try dropping the expiry times and/or batch size?


-- Aaron


Corrado


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/