Re: [patch]cfq-iosched: delete deep seeky queue idle logic

From: Shaohua Li
Date: Wed Sep 21 2011 - 07:16:25 EST


On Sat, 2011-09-17 at 03:25 +0800, Corrado Zoccolo wrote:
> On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@xxxxxxxxx> wrote:
> > On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
> >> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@xxxxxxxxx> wrote:
> >> > Recently Maxim and I discussed why his aiostress workload performs poorly. If
> >> > you didn't follow the discussion, here are the issues we found:
> >> > 1. cfq seeky dection isn't good. Assume a task accesses sector A, B, C, D, A+1,
> >> > B+1, C+1, D+1, A+2...Accessing A, B, C, D is random. cfq will detect the queue
> >> > as seeky, but since when accessing A+1, A+1 is already in disk cache, this
> >> > should be detected as sequential really. Not sure if any real workload has such
> >> > access patern, and seems not easy to have a clean fix too. Any idea for this?
> >>
> >> Not all disks will cache 4 independent streams, we can't make that
> >> assumption in cfq.
> > sure thing. we can't make such assumption. I'm thinking if we should
> > move the seeky detection in request finish. If time between two requests
> > finish is short, we thought the queue is sequential. This will make the
> > detection adaptive. but seems time measurement isn't easy.
> >
> >> The current behaviour of assuming it as seeky should work well enough,
> >> in fact it will be put in the seeky tree, and it can enjoy the seeky
> >> tree quantum of time. If the second round takes a short time, it will
> >> be able to schedule a third round again after the idle time.
> >> If there are other seeky processes competing for the tree, the cache
> >> can be cleared by the time it gets back to your 4 streams process, so
> >> it will behave exactly as a seeky process from cfq point of view.
> >> If the various accesses were submitted in parallel, the deep seeky
> >> queue logic should kick in and make sure the process gets a sequential
> >> quantum, rather than sharing it with other seeky processes, so
> >> depending on your disk, it could perform better.
> > yes, the idle logic makes it ok, but sounds like "make things wrong
> > first (in seeky detection) and then fix it later (the idle logic)".
> >
> >> > 2. deep seeky queue idle. This makes raid performs poorly. I would think we
> >> > revert the logic. Deep queue is more popular with high end hardware. In such
> >> > hardware, we'd better not do idle.
> >> > Note, currently we set a queue's slice after the first request is finished.
> >> > This means the drive already idles a little time. If the queue is truely deep,
> >> > new requests should already come in, so idle isn't required.
> > What did you think about this? Assume seeky request takes long time, so
> > the queue is already idling for a little time.
> I don't think I understand. If cfq doesn't idle, it will dispatch an
> other request from the same or an other queue (if present)
> immediately, until all possible in-flight requests are sent. Now, you
> depend on NCQ for the order requests are handled, so you cannot
> guarantee fairness any more.
>
> >
> >> > Looks Vivek used to post a patch to rever it, but it gets ignored.
> >> > http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> >> I get a 404 here. I think you are seeing only one half of the medal.
> >> That logic is there mainly to ensure fairness between deep seeky
> >> processes and normal seeky processes that want low latency.
> > didn't understand it. The logic doesn't protect non-deep process. how
> > could it make the normal seeky process have low latency? or did you have
> > a test case for this, so I can analyze?
> > I tried a workload with one task drives depth 4 and one task drives
> > depth 16. Appears the behavior isn't changed w/wo the logic.
sorry for the delay.

> Try a workload with one shallow seeky queue and one deep (16) one, on
> a single spindle NCQ disk.
> I think the behaviour when I submitted my patch was that both were
> getting 100ms slice (if this is not happening, probably some
> subsequent patch broke it).
> If you remove idling, they will get disk time roughly in proportion
> 16:1, i.e. pretty unfair.
I thought you are talking about a workload with one thread depth 4, and
the other thread depth 16. I did some tests here. In an old kernel,
without the deep seeky idle logic, the threads have disk time in
proportion 1:5. With it, they get almost equal disk time. SO this
reaches your goal. In a latest kernel, w/wo the logic, there is no big
difference (the 16 depth thread get about 5x more disk time). With the
logic, the depth 4 thread gets equal disk time in first several slices.
But after an idle expiration(mostly because current block plug hold
requests in task list and didn't add them to elevator), the queue never
gets detected as deep, because the queue dispatch request one by one. So
the logic is already broken for some time (maybe since block plug is
added).

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/