Re: [patch,rfc] cfq: merge cooperating cfq_queues

From: Jeff Moyer
Date: Wed Oct 21 2009 - 20:09:32 EST


Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:

Hi, Corrado! Thanks for looking at the patch.

> Hi Jeff,
[...]
> I'm not sure that 3 broken userspace programs justify increasing the
> complexity of a core kernel component such as the I/O scheduler.

I think it's wrong to call the userspace programs broken. They worked
fine when CFQ was quantum based, and they work well with noop and
deadline. Further, the patch I posted is fairly trivial, in my opinion.

> The original close cooperator code is not limited to those programs.
> It can actually result in better overall scheduling on rotating
> media, since it can help with transient close relationships (and
> should probably be disabled on non-rotating ones).
> Merging queues, instead, can lead to bad results in case of false
> positives. I'm thinking, for example, of two programs that are loading
> shared libraries (that are close on disk, being in the same dir) on
> startup, and end up being tied to the same queue.

The idea is not to leave cfqq's merged indefinitely. I'm putting
together a follow-on patch that will split the queues back up when they
are no longer working on the same area of the disk.

> Can't the userspace programs be fixed to use the same I/O context for
> their threads?
> qemu already has a bug report for it
> (https://bugzilla.redhat.com/show_bug.cgi?id=498242).

I submitted a patch to dump(8) to address this. I think the SCSI target
mode driver folks also patched their code. The qemu folks are working
on a couple of different fixes to the problem. That leaves nfsd, which
I could certainly try to whip into shape, but I wonder if there are
others.
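As a rough illustration of the userspace fix discussed above (workers sharing one I/O context), here is a minimal C sketch that starts a worker with clone(2) and the CLONE_IO flag, which Linux has supported since 2.6.25. With a shared I/O context, CFQ should see the parent and the worker as a single scheduling entity. The names spawn_shared_io_worker and io_worker, and the trivial worker body, are hypothetical placeholders; a real program would issue its disk I/O in the worker. Note that pthread_create() does not pass CLONE_IO, so a pthreads-based program would need a different approach.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_* flags */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */

#define WORKER_STACK_SIZE (64 * 1024)

/* Hypothetical worker: a real program would issue its disk reads here. */
static int io_worker(void *arg)
{
    int *ran = arg;
    *ran = 1;           /* visible to the parent because of CLONE_VM */
    return 0;
}

/*
 * Start io_worker() in a child that shares the caller's memory, fs info,
 * file table and, via CLONE_IO, its I/O context. Wait for the child and
 * return its exit status, or -1 on any failure.
 */
int spawn_shared_io_worker(int *ran)
{
    char *stack = malloc(WORKER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_IO | SIGCHLD;
    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(io_worker, stack + WORKER_STACK_SIZE, flags, ran);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);        /* safe: the child has exited */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_shared_io_worker(&ran) should return 0 and leave ran set to 1; it returns -1 if the stack allocation, clone(2), or waitpid(2) fails.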

>> The next step will be to break apart the cfqq's when the I/O patterns
>> are no longer sequential. This is not very important for dump(8), but
>> for NFSd, this could make a big difference. The problem with sharing
>> the cfq_queue when the NFSd threads are no longer serving requests from
>> a single client is that instead of having 8 scheduling entities, NFSd
>> only gets one. This could considerably hurt performance when serving
>> shares to multiple clients, though I don't have a test to show this yet.
>
> I think it will hurt performance only if it is competing with other
> I/O. In that case, 8 scheduling entities will get 8 times the disk
> share of a single one (but this can be fixed by adjusting the nfsd I/O
> priority).

Good point: it may well be common for nfsd to be the only thing
accessing the device.
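To put rough numbers on the disk-share point, the sketch below models CFQ as serving busy queues round-robin, one time slice each. The slice formula is modeled on the best-effort priority scaling in cfq-iosched.c of this period, as I understand it (a 100 ms base sync slice, adjusted by base/5 per priority level away from the default of 4). Treat those constants, the function names prio_slice_ms and group_share, and the assumption that every queue always uses its full slice as simplifications, not a description of the real scheduler.

```c
#define BASE_SLICE_MS 100   /* assumed: cfq_slice_sync = HZ/10, i.e. 100 ms */
#define SLICE_SCALE   5     /* assumed: CFQ_SLICE_SCALE */

/* Assumed slice length for a best-effort queue at prio (0 = highest, 7 = lowest). */
int prio_slice_ms(int prio)
{
    return BASE_SLICE_MS + (BASE_SLICE_MS / SLICE_SCALE) * (4 - prio);
}

/*
 * Fraction of disk time a group of nqueues queues at priority prio gets
 * when competing round-robin with one other queue at other_prio, assuming
 * every queue is always busy and consumes its whole slice.
 */
double group_share(int nqueues, int prio, int other_prio)
{
    double group = (double)nqueues * prio_slice_ms(prio);
    double other = (double)prio_slice_ms(other_prio);
    return group / (group + other);
}
```

Under these assumptions, eight separate nfsd queues at the default priority get 800/(800+100), roughly 89%, of the disk against one competing queue; a single merged queue gets 50%; and raising the merged queue to priority 0 brings it to 180/280, roughly 64%.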

> As for the I/O pattern, sorting all requests in a single queue
> may still be preferable, since they will be at least sorted in disk
> order, instead of the random order given by which thread in the pool
> received the request.
> This is, though, an argument in favor of using CLONE_IO inside nfsd,
> since having a single queue, with proper priority, will always give a
> better overall performance.

Well, I started to work on a patch to nfsd that would share and unshare
I/O contexts based on the client with which the request was associated.
So, much like there is the shared readahead state, there would now be a
shared I/O scheduler state. However, believe it or not, it is much
simpler to do in the I/O scheduler. But maybe that's because cfq is my
hammer. ;-)
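Corrado's argument about request ordering can be illustrated with a toy calculation: total head travel when requests are served in the order the nfsd threads happened to receive them, versus one ascending sweep over the same requests, which is roughly what a single sector-sorted queue allows. The function names and sector numbers are hypothetical, and real CFQ dispatch (per-queue sorting by sector, back-seek limits, slice expiry) is more involved than a single sweep.

```c
#include <stdlib.h>     /* qsort(), size_t */

/* qsort() comparator for sector numbers, ascending. */
static int cmp_sector(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Total sectors the head travels serving reqs[0..n) in the given order. */
unsigned long seek_distance(unsigned long head, const unsigned long *reqs, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += reqs[i] > head ? reqs[i] - head : head - reqs[i];
        head = reqs[i];
    }
    return total;
}

/*
 * Same requests served as one ascending sweep. Note: sorts reqs in place,
 * so compute any arrival-order distance first.
 */
unsigned long sorted_seek_distance(unsigned long head, unsigned long *reqs, size_t n)
{
    qsort(reqs, n, sizeof *reqs, cmp_sector);
    return seek_distance(head, reqs, n);
}
```

For example, with the head at sector 0 and requests {1000, 8, 1016, 16, 1024, 24} arriving in that order, arrival-order dispatch travels 6008 sectors, while the sorted sweep travels 1024.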

Thanks again for your review, Corrado. It is much appreciated.

Cheers,
Jeff