Re: Linux Buffer Cache Does Not Support Mirroring

Jeff V. Merkey (jmerkey@timpanogas.com)
Fri, 29 Oct 1999 14:30:47 -0600


Linus,

There are two other issues I did not mention. Hotfixing won't work with
your buffer cache because we have no way of getting write errors in
real time. We can get read errors, but write errors are not reported
synchronously. NWFS is based on an async model between the file system,
the media manager (hotfix/mirroring agent), and the disks. Your buffer
cache has no callback mechanism for read/write errors. Having one would
make hotfix/mirroring extremely easy.
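
As a rough sketch of what such a per-write error callback could look
like: every name below (io_done_fn, sim_dev, write_all_mirrors, and so
on) is hypothetical, and the "devices" are simulated in user space
rather than driven through any real kernel interface. The simulated
device invokes the callback immediately; a real async layer would
invoke it later, from I/O completion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MIRRORS 8   /* up to 8 mirrors per logical partition */

/* Hypothetical completion callback, invoked once per device write. */
typedef void (*io_done_fn)(void *ctx, int mirror, unsigned long lba, int error);

/* Tracks one logical write that fans out to several mirrors. */
struct mirror_write {
    unsigned long lba;
    int pending;            /* device writes still outstanding */
    unsigned failed_mask;   /* bit i set if mirror i reported an error */
};

/* Simulated device with at most one bad sector (an assumption for the demo). */
struct sim_dev {
    unsigned long bad_lba;
    int has_bad;
};

/* Simulated submit: "completes" at once and reports the result via callback. */
static void sim_submit_write(const struct sim_dev *dev, int mirror,
                             unsigned long lba, io_done_fn done, void *ctx)
{
    int error = (dev->has_bad && dev->bad_lba == lba) ? -1 : 0;
    done(ctx, mirror, lba, error);
}

/* The callback: record which mirror failed so it can be hotfixed. */
static void mirror_write_done(void *ctx, int mirror, unsigned long lba, int error)
{
    struct mirror_write *mw = ctx;
    (void)lba;
    assert(mw->pending > 0);
    if (error)
        mw->failed_mask |= 1u << mirror;
    mw->pending--;
}

/* Write one block to every mirror; return a bitmask of mirrors that failed. */
static unsigned write_all_mirrors(const struct sim_dev *devs, int ndevs,
                                  unsigned long lba)
{
    struct mirror_write mw = { lba, ndevs, 0u };
    int i;

    assert(ndevs > 0 && ndevs <= MAX_MIRRORS);
    for (i = 0; i < ndevs; i++)
        sim_submit_write(&devs[i], i, lba, mirror_write_done, &mw);
    assert(mw.pending == 0);   /* holds only because the simulation is immediate */
    return mw.failed_mask;
}
```

With a callback of this shape, the mirroring agent learns which mirror
failed on which block as each write completes, which is the information
a hotfix redirect needs.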

Jeff

"Jeff V. Merkey" wrote:
>
> Linus,
>
> The Netware file system for Linux is running mirroring now (up to 8
> mirrors per logical partition), but unfortunately, we had to write our
> own buffer cache and integrate it in a modular fashion with all the
> Linux versions -- not as simple an undertaking as one might think given
> all the versions of Linux. We are still calling getblk() underneath so
> both buffer caches are in sync, but are now looking at completely
> bypassing your cache and going to disk with make_request(). We are
> close on balancing memory consumption between the caches, but sure could
> use some help.
>
> All of this work delayed the project by about 4 months and was
> something we did not anticipate. The problem is that the buffer cache
> is based on physical disk LBA offsets rather than supporting a tiered
> semantic for logical disk objects. Ideally, if the buffer cache
> supported a logical object manager (partition based) that would
> translate logical partition blocks to LBAs, with a configurable "list"
> of devices to write to, we would be there.
>
> Here's the problem with what's there in a nutshell. If we use your
> buffer cache to perform the mirroring, we end up double, triple, and
> quadruple caching data. Since you store blocks based on LBA, each time
> we write a mirror, the data gets buffered once for each device.
> Ideally, since we are dealing with "mirrors" we only need one cached
> copy of the disk block and the ability to force it to write to several
> physical devices at once. There are also two other cases that are
> biting us in the ass. We load balance reads by striping them across the
> mirrors and reading them round robin -- again we get double caching for
> each mirror present.
>
> There is also another case where we perform an artificial hotfix to
> mirrors on read HOTFIX errors: if the sectors were not recovered on the
> primary and could not be located on a valid mirror, then the mirrors
> must be artificially HOTFIXED so users will get a disk read error on
> any files that could not be recovered.
>
> This will require a semantic similar to what's in NT's VM Cache today
> (they already ran across this with their FTDISK.SYS stuff, and learned
> that this semantic is needed -- a cache object handle that is relative
> to each partition on a device that the buffer cache maps underneath to
> LBA). For now, we've got our own buffer cache that is customized for
> what we need, and I don't expect you to backrev older Linux versions,
> but we do need this moving forward. Otherwise, Linux will be limited in
> terms of the fault tolerance available.
>
> We've got all the tools for creating Netware volumes, mirrors, disk
> repair, etc. finished and will be putting them up soon. Steve Spicer is
> enabling NWFS for SPARC and ALPHA machines. We want to get EVERYTHING
> done at once and put up at one time.
>
> Please advise on these cache issues.
>
> Jeff Merkey
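
The scheme described in the quoted message (one cached copy per logical
partition block, translated to a per-device LBA only at I/O time, with
reads rotated across mirrors) could be sketched roughly as follows.
This is a user-space illustration under stated assumptions, not NWFS or
Linux code: all struct and function names are hypothetical, and one
logical block is assumed to map to exactly one LBA.

```c
#include <assert.h>

#define MAX_MIRRORS 8   /* up to 8 mirrors per logical partition */

/* Where one mirror of a logical partition lives on a physical device. */
struct mirror_target {
    int dev;                  /* hypothetical device index */
    unsigned long start_lba;  /* first LBA of the partition on that device */
};

/* A logical partition: the configurable "list" of devices to write to. */
struct logical_partition {
    int nmirrors;
    struct mirror_target mirrors[MAX_MIRRORS];
    unsigned next_read;       /* round-robin cursor for load-balanced reads */
};

/* Cache key: (partition, logical block), so only one copy is ever cached. */
struct cache_block {
    struct logical_partition *part;
    unsigned long lblock;
};

/* Translate a logical block to the LBA on one specific mirror. */
static unsigned long logical_to_lba(const struct logical_partition *p,
                                    int mirror, unsigned long lblock)
{
    assert(mirror >= 0 && mirror < p->nmirrors);
    return p->mirrors[mirror].start_lba + lblock;
}

/* Expand one cached block into the (dev, LBA) pairs it must be written to.
 * In each output entry, start_lba holds the translated target LBA. */
static int flush_targets(const struct cache_block *b, struct mirror_target *out)
{
    int i;

    for (i = 0; i < b->part->nmirrors; i++) {
        out[i].dev = b->part->mirrors[i].dev;
        out[i].start_lba = logical_to_lba(b->part, i, b->lblock);
    }
    return b->part->nmirrors;
}

/* Pick the mirror for the next read, rotating through all of them. */
static int pick_read_mirror(struct logical_partition *p)
{
    int m = (int)(p->next_read % (unsigned)p->nmirrors);

    p->next_read++;
    return m;
}
```

Because the cache is keyed on the logical block rather than on a device
LBA, a write to N mirrors or a read from any one of them touches the
same single cached copy.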
