Re: OT:fault tolerant nfs HOW?

David Woodhouse (David.Woodhouse@mvhi.com)
Wed, 07 Apr 1999 14:10:10 +0100


> > Connect two servers to this internal network, and also to the outside
> > network. Mount your filesystems by NFS on these machines and re-export
> > them to your clients from there.

pavel@bug.ucw.cz said:
> This is evil. Re-exporting does not work. Not enough.

> "Normal" filehandles have slight chance to be valid on other computer,
> but fake handles you get when re-exporting nfs will not be valid for
> sure.

What if the 'internal' share was Coda, not NFS? Because of the local cache,
that'd improve performance, and would give some approximation of service
even if the main server went down. Writes to the filesystem while the central
server was down wouldn't be propagated between the two relay hosts, though,
and may cause conflicts when they were eventually written back to the main
server. I don't know how Coda handles that.

If the filesystems to be shared were read-only, you could just use software
RAID and nbd, and mount them on all the server hosts. But that doesn't work
when the filesystems are read-write.

Perhaps what we need is a filesystem that can be written concurrently by many
different hosts? Unfortunately, I don't think it's even possible to build the
necessary synchronisation primitives with our current block device hardware.

We could enforce block device access for these filesystems to be _only_ through
nbd, and not allow direct access to disks which happen to be local, or on a
shared SCSI bus. This would allow us to build in synchronisation primitives at
the nbd layer. Would that be enough?

Alternatively, we can have such a cluster always elect a single host to be the
'writer', and ensure that the filesystem is always in a coherent state (cf.
JFS). Read requests are served straight from the block device(s), and write
requests are forwarded to the 'writer'. If the writer goes down, another
election is held and a new writer is selected.

---- ---- ----
David Woodhouse David.Woodhouse@mvhi.com Office: (+44) 1223 810302
Project Leader, Process Information Systems Mobile: (+44) 976 658355
Axiom (Cambridge) Ltd., Swaffham Bulbeck, Cambridge, CB5 0NA, UK.
finger dwmw2@ferret.lmh.ox.ac.uk for PGP key.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/