IDEA: multiple dirty lists in buffer.c

Gerard Roudier (groudier@club-internet.fr)
Mon, 29 Mar 1999 22:49:48 +0200 (MET DST)


Hello,

In the pure tradition of UNIX systems, Linux usually makes the current
process wait on resources that lack.

Basically, when some IO control block pool is empty, the process that
wants to do some IO just waits if some resource needed for the IO is
lacking.

This is quite fine for user processes that want the IO to complete, but
when the process is 'bdflush' or friends, this feature may become some
bottleneck that affects IO write thoughtput if we are not careful.

Basically, the buffer cache seems to use a single DIRTY lru list that it
scans at flushing. This probably works fine when there is not a large
sequence of IOs to queue to the same device at a time. But if (when) that
happens, bdflush may just be feeding a single device and wait for
resources private to that device (SCSI disk command pool for example) when
other IOs (writes) could have been queued to other devices.

The idea, that perhaps is stupid if I did miss something important, is to
use several DIRTY lists and hash the dirty buffers among this lists using
some simple hashing that involves kdev_t so that 2 buffers from 2
different lists will go to 2 different physical devices.

By flushing part of each dirty list at a time (minimum 1 buffer), and
circularily scanning these dirty lists, bdflush will have far more chance
to feed several devices, before having to wait, than using a single dirty
list.

Basically, for SCSI disks, using for example 8 dirty lru lists and the
following hash: ((dev >> 4) & 7) to index the dirty lru list for a device,
will help bdflush to send writes to 8 differents disks before some lack of
SCSI command for a given device in the SCSI IO-subsystem will let it wait
for that resource.

I donnot consider this idea to be great at all. But basically, it seems to
me easy to implement and may significantly improve write throughput on
system that use several hard disks, at a very reasonnable effort and with
a minimal risk of breaking the code.

Sorry, if I missed something that already addresses this issue in the
current code, or if this has been already either suggested ot implemented.

Regards,
Gérard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/