Re: [PATCH] cfq-iosched: non-rot devices do not need read queue merging

From: Vivek Goyal
Date: Tue Jan 05 2010 - 10:14:15 EST


On Tue, Jan 05, 2010 at 09:58:52AM -0500, Jeff Moyer wrote:
> Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:
>
> > On Mon, Jan 4, 2010 at 8:04 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> >> Vivek Goyal <vgoyal@xxxxxxxxxx> writes:
> >>>> >>> Hi Corrado,
> >>>> >>>
> >>>> >>> What's the reason that reads don't benefit from merging queues and hence
> >>>> >>> merging requests and only writes do on SSD?
> >>>> >>
> >>>> >> On SSDs, reads are just limited by the maximum transfer rate, and
> >>>> >> larger (i.e. merged) reads will just take proportionally longer.
> >>>> >
> >>>> > This is simply not true.  You can get more bandwidth from an SSD (I just
> >>>> > checked numbers for 2 vendors' devices) by issuing larger read requests,
> >>>> > no matter whether the access pattern is sequential or random.
> >>>> I know, but the performance increase with request size is sublinear, and
> >>>> the situation here is slightly different.
> >>>> In order for the requests to be merged, they have to be submitted concurrently.
> >>>> So you have to compare 2 concurrent requests of size x with one
> >>>> request of size 2*x (with some CPU overhead).
> >>>> Moreover, you always pay the CPU overhead, even if you can't do the
> >>>> merging, and you must be very lucky to keep merging, because it means
> >>>> the two processes are working in lockstep; it is not sufficient that
> >>>> the requests are just nearby, as for rotational disks.
> >>>>
> >>>
> >>> For Jeff, at least, the "dump" utility threads were kind of working in
> >>> lockstep for writes, and he gained significantly by merging those queues
> >>> together.
> >>
> >> Actually, it was for reads.
> >>
> >>> So the argument is that the CPU overhead saving in this case is more
> >>> substantial than the gains made by lockstep read threads. I think we will
> >>> need some numbers to justify that.
> >>
> >> Agreed.  Corrado, I know you don't have the hardware, so I'll give this
> >> a run through the read-test2 program and see if it regresses at all.
> > Great.
>
> I ran the test program 50 times, and here are the results:
>
> ==> vanilla <==
> Mean: 163.22728
> Population Std. Dev.: 0.55401
>
> ==> patched <==
> Mean: 162.91558
> Population Std. Dev.: 1.08612
>
> This looks acceptable to me.
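
(As an aside, the mean and population standard deviation above can be
recomputed from the raw per-run times with something like the sketch below
(untested); the "times.txt" file name and the one-value-per-line format are
just assumptions on my part, not anything read-test2 itself produces.)

#include <math.h>
#include <stdio.h>

/*
 * Minimal sketch: mean and *population* standard deviation (divide by n,
 * not n - 1) over one floating-point run time per line.
 */
int main(void)
{
	double v, sum = 0.0, sumsq = 0.0;
	long n = 0;
	FILE *f = fopen("times.txt", "r");

	if (!f) {
		perror("times.txt");
		return 1;
	}
	while (fscanf(f, "%lf", &v) == 1) {
		sum += v;
		sumsq += v * v;
		n++;
	}
	fclose(f);
	if (n == 0)
		return 1;
	printf("Mean: %.5f\n", sum / n);
	printf("Population Std. Dev.: %.5f\n",
	       sqrt(sumsq / n - (sum / n) * (sum / n)));
	return 0;
}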

Thanks Jeff. One thing comes to mind: with the recent changes we now drive
deeper queue depths on NCQ-capable SSDs, so there are not many pending cfqqs
on the service tree unless the number of parallel threads exceeds the NCQ
depth (32). If that's the case, then I think we may not be seeing much queue
merging in this test unless the dump utility is creating more than 32
threads.
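
For what it's worth, the NCQ depth actually in effect on the device can be
checked from sysfs before drawing conclusions. A rough sketch (untested),
assuming a SCSI/SATA disk that exposes the usual queue_depth attribute;
"sda" is just a placeholder device name:

#include <stdio.h>

/* Read the queue depth currently in effect for the disk under test. */
int main(void)
{
	unsigned int depth;
	FILE *f = fopen("/sys/block/sda/device/queue_depth", "r");

	if (!f) {
		perror("queue_depth");
		return 1;
	}
	if (fscanf(f, "%u", &depth) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);
	printf("queue depth: %u\n", depth);
	return 0;
}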

If time permits, it might also be interesting to run the same test with queue
depth 1, to see whether SSDs without NCQ would suffer from this change.
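
One way to approximate the non-NCQ case on the same hardware is to cap the
device's queue depth from sysfs before re-running the test; a sketch
(untested), again with "sda" as a placeholder, and the original depth should
be restored afterwards:

#include <stdio.h>

/* Cap the device at queue depth 1 to approximate a non-NCQ SSD. */
int main(void)
{
	FILE *f = fopen("/sys/block/sda/device/queue_depth", "w");

	if (!f) {
		perror("queue_depth");
		return 1;
	}
	fprintf(f, "1\n");
	fclose(f);
	return 0;
}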

Thanks
Vivek