Re: [PATCH] Re: PATCH: killing read_ahead[]

From: Jeff V. Merkey (jmerkey@timpanogas.org)
Date: Wed Oct 25 2000 - 13:56:20 EST


Rik van Riel wrote:
>
> On Wed, 25 Oct 2000, Jeff V. Merkey wrote:
> > Rik van Riel wrote:
> > > On Wed, 25 Oct 2000, Jeff V. Merkey wrote:
> > >
> > > > I've reviewed the patch. Its effect seems minimal and will not
> > > > break NWFS as proposed -- it looks like, however, it will reduce
> > > > the performance slightly of EXT2/3 with iozone for read ahead
> > > > since the first section of the patch limits the read ahead
> > > > window size.
> > >
> > > Ummm, please read it again ;)
> > >
> > > The patch actually /increases/ the readahead size when
> > > we start to read a file from the beginning.
> >
> > But only if the file is smaller than MIN_READAHEAD * 2, which
> > would be the case for small files (which would read the whole
> > file anyway, which is how the page cache behaves today anyway).
>
> If the file is bigger than MIN_READAHEAD * 2, we will want
> to read in the file in multiple IOs anyway.
>
> If it turns out that we read that file sequentially, then
> the kernel will read in a LARGER CHUNK next time, if it
> turns out that we aren't using the file sequentially, we
> won't.
>
> The point of choosing the MIN_READAHEAD * 2 cutoff is that
> I want to avoid the N+1 problem, where we do an IO for the
> first MIN_READAHEAD pages and have to do a separate IO for
> the last 1 page...
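
(Roughly, as I read it, the sizing rule is something like the sketch
below -- the MIN_READAHEAD/MAX_READAHEAD values and the doubling step
are just illustrative guesses on my part, not the actual patch.)

/* Userspace model of the readahead sizing rule described above.
 * The constants and the doubling policy are assumptions for
 * illustration only. */
#include <stdio.h>

#define MIN_READAHEAD   3       /* pages; illustrative value */
#define MAX_READAHEAD   31      /* pages; illustrative value */

/* Size of the first readahead window for a file of 'file_pages'
 * pages.  Small files are read in one go (avoiding the N+1 extra
 * IO); larger files start at MIN_READAHEAD and grow only if the
 * access pattern stays sequential. */
static unsigned long initial_window(unsigned long file_pages)
{
        if (file_pages <= 2 * MIN_READAHEAD)
                return file_pages;
        return MIN_READAHEAD;
}

/* Grow the window after a sequential hit, clamped to MAX_READAHEAD. */
static unsigned long grow_window(unsigned long cur)
{
        unsigned long next = cur * 2;
        return next > MAX_READAHEAD ? MAX_READAHEAD : next;
}

int main(void)
{
        unsigned long w = initial_window(100);

        while (w < MAX_READAHEAD) {
                printf("window = %lu pages\n", w);
                w = grow_window(w);
        }
        printf("window = %lu pages\n", w);
        return 0;
}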

This makes sense. One issue, however, for ndb with read-ahead relates
to mirroring. On NWFS, my read-ahead window is always (cluster size + 1).
This means that for sequential access I will almost always read 64K + 1,
but I do NOT allow round-robin reads from mirrors to interleave 4K reads
between devices. I read in 64K chunks from each mirror (since NetWare
clusters are almost always contiguous runs of sectors, unless a 4K block
has been hotfixed).
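
A minimal sketch of what I mean (names and constants are illustrative,
not NWFS source): the whole (cluster + 1 block) window goes to a single
mirror as one contiguous request, and the mirror rotates per request,
not per 4K block.

/* Userspace model of cluster-sized readahead against mirrors.
 * Constants and struct names are illustrative assumptions. */
#include <stdio.h>

#define BLOCK_SIZE     4096UL             /* 4K block            */
#define CLUSTER_SIZE   (64 * 1024UL)      /* one NetWare cluster */
#define NR_MIRRORS     2

/* One contiguous readahead request, served entirely by one mirror. */
struct ra_request {
        unsigned long start;    /* byte offset into the volume */
        unsigned long len;      /* cluster plus one block      */
        int           mirror;   /* which mirror serves it      */
};

static struct ra_request build_readahead(unsigned long offset,
                                         int last_mirror)
{
        struct ra_request req;

        req.start  = offset;
        req.len    = CLUSTER_SIZE + BLOCK_SIZE;      /* "64K + 1" window */
        req.mirror = (last_mirror + 1) % NR_MIRRORS; /* rotate per request,
                                                        not per 4K block  */
        return req;
}

int main(void)
{
        struct ra_request r = build_readahead(0, -1);

        printf("read %lu bytes from mirror %d starting at offset %lu\n",
               r.len, r.mirror, r.start);
        return 0;
}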

The RAID code in Linux should do the same if possible, and the changes
in the page cache and its read-ahead behavior really need to be
interlocked with the RAID driver underneath, so that when it reads from
mirrored devices it does not interleave 4K -> disk 1 and 4K -> disk 2
when several 4K blocks are known to be sequential on the disk. NWFS
does this naturally, since its cluster layout guarantees 64K of
contiguous 4K pages on the disk. This explains why, on 2.4, NWFS
mirroring with read-ahead enabled is several orders of magnitude faster
than running without mirroring.
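
A throwaway comparison (my own illustration, not the md driver) of how
a per-page policy bounces a sequential 128K read between disks, while a
chunk-granularity policy keeps each 64K run on one spindle:

/* Userspace sketch contrasting per-page interleaving with
 * chunk-granularity balancing on a two-way mirror.  The chunk
 * size and the modulo selection are assumptions used only to
 * show the idea. */
#include <stdio.h>

#define PAGE_SIZE   4096UL
#define CHUNK_SIZE  (64 * 1024UL)
#define NR_MIRRORS  2

/* Per-page interleave: neighbouring 4K pages bounce between disks. */
static int pick_mirror_interleaved(unsigned long offset)
{
        return (int)((offset / PAGE_SIZE) % NR_MIRRORS);
}

/* Chunk granularity: a whole 64K sequential run stays on one disk. */
static int pick_mirror_chunked(unsigned long offset)
{
        return (int)((offset / CHUNK_SIZE) % NR_MIRRORS);
}

int main(void)
{
        unsigned long off;

        printf("offset    interleaved  chunked\n");
        for (off = 0; off < 128 * 1024UL; off += PAGE_SIZE)
                printf("%6luK   disk %d       disk %d\n",
                       off / 1024,
                       pick_mirror_interleaved(off),
                       pick_mirror_chunked(off));
        return 0;
}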

I've seen some performance issues with RAID mirroring on Linux because
the driver interleaves 4K reads between devices, rather than taking
them in big chunks from each spindle.

:-)

Jeff

>
> regards,
>
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
> -- Miguel de Icaza, UKUUG 2000
>
> http://www.conectiva.com/ http://www.surriel.com/