Re: Do we support ioprio on SSDs with NCQ (Was: Re: IO scheduler based IO controller V10)

From: Corrado Zoccolo
Date: Sun Oct 04 2009 - 05:17:03 EST


Hi Vivek,
On Sat, Oct 3, 2009 at 3:38 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Sat, Oct 03, 2009 at 02:43:14PM +0200, Corrado Zoccolo wrote:
>> On Sat, Oct 3, 2009 at 12:27 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>> > On Sat, Oct 03, 2009 at 12:14:28AM +0200, Corrado Zoccolo wrote:
>> >> In fact I think that the 'rotating' flag name is misleading.
>> >> All the checks we are doing are actually checking if the device truly
>> >> supports multiple parallel operations, and this feature is shared by
>> >> hardware raids and NCQ enabled SSDs, but not by cheap SSDs or single
>> >> NCQ-enabled SATA disk.
>> >>
>> >
>> > While we are at it, what happens to the notion of priority of tasks on SSDs?
>> This is not changed by proposed patch w.r.t. current CFQ.
>
> This is a general question, irrespective of the current patch. I want to know
> what our statement is w.r.t. ioprio and what it means for the user. When do
> we support it and when do we not?
>
>> > Without idling there is no continuous time slice and there is no
>> > fairness. So ioprio is out of the window for SSDs?
>> I don't have any NCQ-enabled SSDs here, so I can't test it, but it seems to
>> me that the way in which queues are sorted in the rr tree may still
>> provide some sort of fairness and service differentiation for
>> priorities, in terms of number of IOs.
>
> I have an NCQ-enabled SSD. Sometimes I see the difference, sometimes I do
> not. I guess this happens because idling is sometimes enabled and sometimes
> not, due to the dynamic nature of hw_tag.
>
My guess is that the formula that is used to handle this case is not
very stable.
The culprit code is (in cfq_service_tree_add):
	} else if (!add_front) {
		rb_key = cfq_slice_offset(cfqd, cfqq) + jiffies;
		rb_key += cfqq->slice_resid;
		cfqq->slice_resid = 0;
	} else

cfq_slice_offset is defined as:

static unsigned long cfq_slice_offset(struct cfq_data *cfqd,
				      struct cfq_queue *cfqq)
{
	/*
	 * just an approximation, should be ok.
	 */
	return (cfqd->busy_queues - 1) * (cfq_prio_slice(cfqd, 1, 0) -
		       cfq_prio_slice(cfqd, cfq_cfqq_sync(cfqq), cfqq->ioprio));
}

Can you try changing the latter to something simpler? We already observed
that busy_queues is unstable, and I think it is not needed here at all:
	return -cfq_prio_slice(cfqd, cfq_cfqq_sync(cfqq), cfqq->ioprio);
and remove the 'rb_key += cfqq->slice_resid;' line from the former.

This should give a higher probability of being first on the queue to
larger slice tasks, so it will work if we don't idle, but it needs
some adjustment if we idle.
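
To make the difference between the two formulas concrete, here is a minimal
user-space sketch. It is not kernel code: the model_* names are hypothetical,
and the constants (a 100-jiffy sync base slice, a 40-jiffy async base slice,
a slice scale of 5, 8 priority levels) are assumptions about CFQ defaults at
HZ=1000 that should be checked against the tree being tested. It uses signed
arithmetic for readability, whereas the kernel function returns unsigned long.

```c
#include <assert.h>

/* Assumed values modelled on CFQ defaults; verify against your tree. */
#define MODEL_BASE_SLICE_SYNC	100	/* assumed sync base slice, jiffies at HZ=1000 */
#define MODEL_BASE_SLICE_ASYNC	40	/* assumed async base slice */
#define MODEL_SLICE_SCALE	5	/* assumed scaling divisor */
#define MODEL_IOPRIO_NR		8	/* priority levels 0..7 */

/* Slice length: longer for numerically lower (better) ioprio. */
static long model_prio_slice(int sync, int prio)
{
	const long base = sync ? MODEL_BASE_SLICE_SYNC : MODEL_BASE_SLICE_ASYNC;

	assert(prio >= 0 && prio < MODEL_IOPRIO_NR);
	return base + (base / MODEL_SLICE_SCALE) * (4 - prio);
}

/* Current formula: the offset scales with the number of busy queues. */
static long model_offset_current(long busy_queues, int sync, int prio)
{
	return (busy_queues - 1) *
		(model_prio_slice(1, 0) - model_prio_slice(sync, prio));
}

/* Proposed formula: the offset depends only on the queue's own slice. */
static long model_offset_proposed(int sync, int prio)
{
	return -model_prio_slice(sync, prio);
}
```

Under these assumptions, sync queues at prio 0, 4 and 7 get offsets of 0, 160
and 280 from the current formula when busy_queues is 3, but 0, 80 and 140
when busy_queues drops to 2. The proposed formula gives -180, -100 and -40
regardless of busy_queues, so the relative ordering no longer moves with it.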

> I ran three fio reads for 10 seconds. First job is prio0, second prio4 and
> third prio7.
>
> (prio 0) read : io=978MiB, bw=100MiB/s, iops=25,023, runt= 10005msec
> (prio 4) read : io=953MiB, bw=99,950KiB/s, iops=24,401, runt= 10003msec
> (prio 7) read : io=74,228KiB, bw=7,594KiB/s, iops=1,854, runt= 10009msec
>
> Note there is almost no difference between prio 0 and prio 4 job and prio7
> job has been penalized heavily (gets less than 10% BW of prio 4 job).
>
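The ratios quoted above can be checked with a small helper. This is only an
arithmetic sketch; bw_percent_of is a hypothetical name, and the inputs are
the bandwidth figures from the fio output, with 100MiB/s taken as 102400KiB/s
(fio rounds that figure, so the result is approximate).

```c
/* Bandwidth of one job as a percentage of a reference job's bandwidth. */
static double bw_percent_of(double bw_kib_s, double ref_kib_s)
{
	return 100.0 * bw_kib_s / ref_kib_s;
}
```

bw_percent_of(7594.0, 99950.0) comes out at roughly 7.6, which matches the
"less than 10%" observation for prio 7 versus prio 4, while
bw_percent_of(99950.0, 102400.0) is roughly 97.6, i.e. prio 4 gets nearly
the same bandwidth as prio 0.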
>> Non-NCQ SSDs, instead, will still have the idle window enabled, so it
>> is not an issue for them.
>
> Agree.
>
>> >
>> > On SSDs, will it make more sense to provide fairness in terms of number of
>> > IOs or size of IO, and not in terms of time slices?
>> Not on all SSDs. There are still ones that have a non-negligible
>> penalty on non-sequential access pattern (hopefully the ones without
>> NCQ, but if we find otherwise, then we will have to benchmark access
>> time in I/O scheduler to select the best policy). For those, time
>> based may still be needed.
>
> Ok.
>
> So on better SSDs out there with NCQ, we probably don't support the notion of
> ioprio? Or am I missing something?

I think we try, but the current formula is simply not good enough.

Thanks,
Corrado

>
> Thanks
> Vivek
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda