Re: Patch for block write clustering

Rik van Riel (H.H.vanRiel@fys.ruu.nl)
Wed, 4 Mar 1998 13:40:42 +0100 (MET)


On Tue, 3 Mar 1998, Emil Briggs wrote:

> This is a patch to implement clustered writes in bdflush. It sorts dirty
> blocks by device and block number so that the writes are done sequentially --
> the idea being to reduce seek times for the disk heads.

I've read the patch, and it looks like it implements the sorting
in <linux/blk.h> on a higher level (where it really should belong),
but it still suffers from the same bug:

Your algoritm, quoted from blk.h:
/*
* This is used in the elevator algorithm. We don't prioritise reads
* over writes any more --- although reads are more time-critical than
* writes, by treating them equally we increase filesystem throughput.
* This turns out to give better overall performance. -- sct
*/
#define IN_ORDER(s1,s2) \
((s1)->rq_dev < (s2)->rq_dev || (((s1)->rq_dev == (s2)->rq_dev && \
(s1)->sector < (s2)->sector)))

This algoritm has the distict disadvantage of flushing
hda before hdb, and hdc (before hdd) even later.
Much better performance can be had by having knowledge
on the disk. Even using the partition number only
(rq_dev &= 0x7F or rq_dev &= 0x0F) gives better results
because then the requests to different drives will
overlap, flushing the drives more in-parrallel.
And well, since SCSI does it's own sorting, we might
just use the IDE scheme (rq_dev &= 0x7F) for the sorting
in blk.h, but the sorting in buffer.c _should_ consider
the two possible schemes, since it has much more data
available to it.

Then the algoritm would change from:
-------------------------------------
#define IN_ORDER(s1,s2) \
((s1)->rq_dev < (s2)->rq_dev || (((s1)->rq_dev == (s2)->rq_dev && \
(s1)->sector < (s2)->sector)))
-----> into <------------------------
#define IN_ORDER(s1,s2) \
(((s1)->rq_dev & 0x7f) < ((s2)->rq_dev & 0x7f) || ((((s1)->rq_dev & 0x7f) \
== ((s2)->rq_dev & 0x7f) && (s1)->sector < (s2)->sector)))

We can blatantly ignore the SCSI (and ACSI ??) partition scheme,
since they don't use the ordering of the main request queue
anyway (I'm not sure about ACSI though).

> Obviously if you have a drive that does write caching you don't
> want this patch.

Yes you do, when you have loads of partitions, loads of
I/O and use the sorting scheme I sketched above, you
will notice a bigger advantage. I can hardly wait to
see your sorting (with the 'new' scheme) in 2.1.89
bdflush...

> The patch assumes that the bdflush parameters are set to their default
> values and things might break if thats not the case, that needs to be
> fixed.

I haven't applied it yet (only read it) to 2.1.89pre-5
(it looks somewhat valid for 2.1 too), but it looks
like you ran patch in reverse...
And, yes, the sorting scheme needs some improvement, but
I guess I already said that :-)

Rik.
+-----------------------------+------------------------------+
| For Linux mm-patches, go to | "I'm busy managing memory.." |
| my homepage (via LinuxHQ). | H.H.vanRiel@fys.ruu.nl |
| ...submissions welcome... | http://www.fys.ruu.nl/~riel/ |
+-----------------------------+------------------------------+

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu