Re: [PATCH 2/2] cfq-iosched: rethink seeky detection for SSDs

From: Corrado Zoccolo
Date: Wed Mar 03 2010 - 14:47:44 EST

Hi Vivek,
On Mon, Mar 1, 2010 at 3:25 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Sat, Feb 27, 2010 at 07:45:40PM +0100, Corrado Zoccolo wrote:
>> CFQ currently applies the same logic of detecting seeky queues and
>> grouping them together for rotational disks as well as SSDs.
>> For SSDs, the time to complete a request doesn't depend on the
>> request location, but only on the size.
>> This patch therefore changes the criterion to group queues by
>> request size in case of SSDs, in order to achieve better fairness.
> Hi Corrado,
> Can you give some numbers regarding how are you measuring fairness and
> how did you decide that we achieve better fairness?
Please see the attached fio script. It benchmarks pairs of processes
performing direct random I/O.
One process is always fixed at bs=4k, while I vary the other from 8K to 64K:
test00: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
test01: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
test10: (g=1): rw=randread, bs=16K-16K/16K-16K, ioengine=sync, iodepth=1
test11: (g=1): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
test20: (g=2): rw=randread, bs=32K-32K/32K-32K, ioengine=sync, iodepth=1
test21: (g=2): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
test30: (g=3): rw=randread, bs=64K-64K/64K-64K, ioengine=sync, iodepth=1
test31: (g=3): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
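
In outline, fair.fio looks like the following (a sketch matching the
job lines above; the size and runtime values here are approximate, the
attached file is authoritative):

[global]
rw=randread
direct=1
ioengine=sync
iodepth=1
runtime=5
time_based
size=128m

[test00]
bs=8k

[test01]
bs=4k

[test10]
stonewall
bs=16k

[test11]
bs=4k

[test20]
stonewall
bs=32k

[test21]
bs=4k

[test30]
stonewall
bs=64k

[test31]
bs=4k

Each stonewall starts a new reporting group, so the two jobs in each
group run concurrently against each other.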

With unpatched cfq (2.6.33), on a flash card (non-NCQ), after running
a fio script with a high number of parallel readers to make sure NCQ
detection has stabilized, I get the following:
Run status group 0 (all jobs):
READ: io=21528KiB, aggrb=4406KiB/s, minb=1485KiB/s, maxb=2922KiB/s,
mint=5001msec, maxt=5003msec

Run status group 1 (all jobs):
READ: io=31524KiB, aggrb=6452KiB/s, minb=1327KiB/s, maxb=5126KiB/s,
mint=5002msec, maxt=5003msec

Run status group 2 (all jobs):
READ: io=46544KiB, aggrb=9524KiB/s, minb=1031KiB/s, maxb=8493KiB/s,
mint=5001msec, maxt=5004msec

Run status group 3 (all jobs):
READ: io=64712KiB, aggrb=13242KiB/s, minb=761KiB/s,
maxb=12486KiB/s, mint=5002msec, maxt=5004msec

As you can see from minb, the process with the smallest I/O size is
penalized. Since both processes are marked as no-idle, they both end
up in the no-idle tree, where they are serviced round robin, so they
get fairness in terms of IOPS, but bandwidth varies a lot: in group 3,
the 64K reader gets about 16 times the bandwidth of the 4K reader
(maxb=12486KiB/s vs. minb=761KiB/s), i.e. the full 16:1
request-size ratio.

With my patches in place, I get:
Run status group 0 (all jobs):
READ: io=21544KiB, aggrb=4409KiB/s, minb=1511KiB/s, maxb=2898KiB/s,
mint=5002msec, maxt=5003msec

Run status group 1 (all jobs):
READ: io=32000KiB, aggrb=6549KiB/s, minb=1277KiB/s, maxb=5274KiB/s,
mint=5001msec, maxt=5003msec

Run status group 2 (all jobs):
READ: io=39444KiB, aggrb=8073KiB/s, minb=1576KiB/s, maxb=6498KiB/s,
mint=5002msec, maxt=5003msec

Run status group 3 (all jobs):
READ: io=49180KiB, aggrb=10059KiB/s, minb=1512KiB/s,
maxb=8548KiB/s, mint=5001msec, maxt=5006msec

The process doing smaller requests is no longer penalized for running
concurrently with the other one, and the other still benefits from
larger requests because it makes better use of its time slice.

> In case of SSDs with NCQ, we will not idle on any of the queues (either
> sync or sync-noidle (seeky queues)). So w.r.t code, what behavior changes
> if we mark a queue as seeky/non-seeky on SSD?

I've not tested on an NCQ SSD, but I think at worst it will not harm,
and at best it will provide similar fairness improvements once the
number of processes submitting requests grows above the available NCQ
slots.

> IOW, looking at this patch, now any queue doing IO in smaller chunks than
> 32K on SSD will be marked as seeky. How does that change the behavior in
> terms of fairness for the queue?
Basically, we will have IOPS-based fairness for small requests and
time-based fairness for larger requests: queues dominated by requests
below 32K are marked seeky and end up in the no-idle tree, served
round robin, while queues issuing larger requests keep time-based
slices.
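
To make the boundary concrete, here is a small userspace sketch (not
kernel code; popcount32 stands in for the kernel's hweight32()) of how
the patched classifier treats a non-rotational device:

#include <stdio.h>

#define CFQQ_SECT_THR_NONROT (2 * 32)	/* 64 sectors == 32KiB */

/* counts set bits, same role as the kernel's hweight32() */
static unsigned popcount32(unsigned v)
{
	unsigned c = 0;
	for (; v; v &= v - 1)
		c++;
	return c;
}

int main(void)
{
	unsigned seek_history;
	unsigned n_sec;
	int i;

	/* a queue issuing 4KiB requests: 8 sectors each */
	seek_history = 0;
	for (i = 0; i < 32; i++) {
		n_sec = 8;
		seek_history <<= 1;
		seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
	}
	/* seeky once more than 32/8 == 4 of the last 32 bits are set */
	printf("4KiB stream seeky?  %s\n",
	       popcount32(seek_history) > 32 / 8 ? "yes" : "no");

	/* a queue issuing 64KiB requests: 128 sectors each */
	seek_history = 0;
	for (i = 0; i < 32; i++) {
		n_sec = 128;
		seek_history <<= 1;
		seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
	}
	printf("64KiB stream seeky? %s\n",
	       popcount32(seek_history) > 32 / 8 ? "yes" : "no");
	return 0;
}

The 4KiB stream is classified seeky, so it is scheduled in the
sync-noidle tree and served round robin; the 64KiB stream is not, so
it keeps its time slice.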


> Thanks
> Vivek
>> Signed-off-by: Corrado Zoccolo <czoccolo@xxxxxxxxx>
>> ---
>>  block/cfq-iosched.c |    7 ++++++-
>>  1 files changed, 6 insertions(+), 1 deletions(-)
>>
>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>> index 806d30b..f27e535 100644
>> --- a/block/cfq-iosched.c
>> +++ b/block/cfq-iosched.c
>> @@ -47,6 +47,7 @@ static const int cfq_hist_divisor = 4;
>>  #define CFQ_SERVICE_SHIFT       12
>>
>>  #define CFQQ_SEEK_THR           (sector_t)(8 * 100)
>> +#define CFQQ_SECT_THR_NONROT    (sector_t)(2 * 32)
>>  #define CFQQ_SEEKY(cfqq)        (hweight32(cfqq->seek_history) > 32/8)
>>
>>  #define RQ_CIC(rq)              \
>> @@ -2958,6 +2959,7 @@ cfq_update_io_seektime(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>>                         struct request *rq)
>>  {
>>         sector_t sdist = 0;
>> +       sector_t n_sec = blk_rq_sectors(rq);
>>         if (cfqq->last_request_pos) {
>>                 if (cfqq->last_request_pos < blk_rq_pos(rq))
>>                         sdist = blk_rq_pos(rq) - cfqq->last_request_pos;
>> @@ -2966,7 +2968,10 @@ cfq_update_io_seektime(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>>         }
>>
>>         cfqq->seek_history <<= 1;
>> -       cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);
>> +       if (blk_queue_nonrot(cfqd->queue))
>> +               cfqq->seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
>> +       else
>> +               cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);
>>  }
>>
>>  /*
>> --

Attachment: fair.fio
Description: Binary data