Re: elevator algorithm bug in ll_rw_blk.c

Rik van Riel (H.H.vanRiel@phys.uu.nl)
Wed, 18 Nov 1998 10:29:38 +0100 (CET)


On Tue, 17 Nov 1998, Stephen C. Tweedie wrote:
> On 17 Nov 1998 17:28:03 +0800, "Michael O'Reilly"
> <michael@metal.iinet.net.au> said:
>
> > In this case, the write performance frequently goes abysmal, even though
> > it's sequential writing. 'Abysmal' to the tune of 800K/sec on a 6-disk
> > array.
>
> I suspect that in this sort of situation we are being hit by two
> separate known problems:
>
> 1) We only have a single request queue shared by all block devices.
> 2) Parallel sync()s interfere badly with each other.

I think we can agree that this is a very bad bug since it
hampers the performance of large servers with $$$ disk
arrays attached. The people who bought the disks probably
need the bandwidth...

> If you have lots of large writers, then as those writers compete for
> buffer cache space, they will all start synching each other's buffers to
> disk. I've been toying with the idea of just stomping on this problem
> totally by imposing a strict limit on the amount of dirty data we allow
> for any given disk (or maybe per process). If we avoid the buffer-cache
> thrashing threshold, then we can just let bdflush do its normal job of
> writeback and we have a single sync thread which is not interfered with
> by the rest of the system.
>
> The cost of course is reduced concurrency.

The cost of not doing it will be even larger. As long as we
can keep up with the I/O rate, reduced concurrency will be
better than thrashing.
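
To make the strict limit concrete, here is a minimal sketch of the
per-disk accounting Stephen describes. It is an illustration only, not
code from ll_rw_blk.c or buffer.c: struct disk_dirty and the three
functions are hypothetical names, the limit is counted in buffers, and
locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-disk accounting; not an existing kernel structure. */
struct disk_dirty {
	size_t nr_dirty;	/* buffers currently dirty for this disk */
	size_t limit;		/* strict cap on dirty buffers */
};

static void disk_dirty_init(struct disk_dirty *d, size_t limit)
{
	assert(limit > 0);	/* a zero limit would block every writer */
	d->nr_dirty = 0;
	d->limit = limit;
}

/*
 * Writer side: account for one more dirty buffer.
 * Returns false when the disk is at its limit; the caller is then
 * expected to wait for writeback instead of syncing buffers itself.
 */
static bool disk_try_dirty(struct disk_dirty *d)
{
	if (d->nr_dirty >= d->limit)
		return false;
	d->nr_dirty++;
	return true;
}

/*
 * Flusher side: the single writeback thread reports that it wrote
 * up to n buffers. Returns how many were actually accounted for.
 */
static size_t disk_writeback(struct disk_dirty *d, size_t n)
{
	if (n > d->nr_dirty)
		n = d->nr_dirty;
	d->nr_dirty -= n;
	return n;
}
```

With a limit of 2, the third disk_try_dirty() fails until
disk_writeback() has made room. The point of the design is that writers
only ever wait; the one flusher is the only thing that issues writes.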

Besides, the concurrency issue can be solved by handing out
the write buffers in a fairer way. As long as the data is
pushed onto the buffer cache in a fair way, the data will
go to disk with the same fairness, but we can avoid thrashing
by making the "I/O timeslices" longer on the back side.
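
A rough sketch of what longer "I/O timeslices" could buy us. It assumes
a hypothetical flusher that serves the writers round-robin and writes
at most `quantum` buffers for one writer before moving to the next;
each turn stands in for one contiguous run on disk. The function name
and the pending[] array are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * pending[i] is the number of dirty buffers queued by writer i.
 * Drain all queues round-robin, at most `quantum` buffers per turn,
 * and return the number of turns taken. Every writer with work gets
 * one turn per round, so the order stays fair whatever the quantum.
 * Note that pending[] is emptied as a side effect.
 */
static size_t count_flush_turns(size_t pending[], size_t nr_writers,
				size_t quantum)
{
	size_t turns = 0;
	bool progress = true;

	if (quantum == 0)
		return 0;	/* nothing could ever be written */

	while (progress) {
		progress = false;
		for (size_t i = 0; i < nr_writers; i++) {
			size_t n;

			if (pending[i] == 0)
				continue;
			n = pending[i] < quantum ? pending[i] : quantum;
			pending[i] -= n;	/* "write" n buffers */
			turns++;
			progress = true;
		}
	}
	return turns;
}
```

For three writers with 8 buffers each, a quantum of 1 takes 24 turns
while a quantum of 8 takes 3. The data goes out in the same fair order
either way; only the number of switches between writers changes.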

Disks are 1000+ times slower than memory, so the 'timeslices'
on disk I/O should also be a lot longer. This case suggests
that we shift the buffer cache from a simple I/O thrashing
buffer to a more advanced disk management scheme.

Rik -- slowly getting used to dvorak kbd layout...
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
