Re: safe file systems

Ingo Molnar (mingo@pc7537.hil.siemens.at)
Thu, 25 Sep 1997 13:10:50 +0200 (MET DST)


On Wed, 24 Sep 1997, Larry McVoy wrote:

> : [write re-ordering]
> :
> : I think even my cheapish SCSI disks (which have a 1 (maybe 2?) MB cache)
> : will do this. I assume here a system reset will not affect them, but
> : power-failure did last time I checked. (But you can twiddle with the tables
> : on them and modify the way it writes data back, etc).
>
> The default on every SCSI & FC disk I've ever seen (and I've seen a
> fair number including drives from HP, Seagate, Quantum, IBM, Maxtor,
> and probably others I'm forgetting) is to /not/ do write caching. If
> the drive says the write is done, it is done.

yep, and database servers do rely on this. Most RDBMSs add another layer
of protection, software checksumming, which detects half-written sectors
... but this is aimed not at power failure but at media failure. In
addition, most current disks have built-in ECC which detects media failure
(and on cheaper disks hides it and redirects the bad sector ...).
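A minimal sketch of the kind of software checksumming described above, assuming a hypothetical page layout: a 4-byte CRC32 header in front of each 4 KiB page. Real RDBMSs each use their own page format and checksum; `seal`, `verify` and `flip_bit` are made-up names for illustration. A single flipped bit stands in for media damage, which CRC32 is guaranteed to detect.

```python
import zlib

SECTOR = 512
PAGE = 8 * SECTOR  # hypothetical 4 KiB page spanning 8 sectors
HDR = 4            # hypothetical header: CRC32 of the page body, little-endian

def seal(payload: bytes) -> bytes:
    """Return a full page: checksum header followed by the zero-padded payload."""
    if len(payload) > PAGE - HDR:
        raise ValueError("payload too large for one page")
    body = payload.ljust(PAGE - HDR, b"\0")
    return zlib.crc32(body).to_bytes(HDR, "little") + body

def verify(page: bytes) -> bool:
    """True if the stored checksum matches the page body."""
    stored = int.from_bytes(page[:HDR], "little")
    return stored == zlib.crc32(page[HDR:])

def flip_bit(page: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Simulate media damage: return a copy of the page with one bit inverted."""
    damaged = bytearray(page)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)

page = seal(b"row data")
print(verify(page))                        # True
print(verify(flip_bit(page, 3 * SECTOR)))  # False: damage in the 4th sector is caught
```

Note that this only detects damage after the fact when the page is read back; it does nothing to make a write survive a power failure.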

if the only failure source is power interruption (or any other system
interruption which doesn't damage the disk itself), most disks guarantee
that they write sectors atomically [this behaviour is not specified, but
present ;)]. They write out at least the last sector when they go down,
and they autopark the head.

Thus soft updates, which make sure dependent metadata blocks reach the
disk in a safe order, can guarantee a consistent filesystem metadata
structure no matter where the interruption happens, _with_ full use of
important SCSI features like tagged queueing, scatter-gather and
disconnection. So a soft-update filesystem can be just as fast IO-wise as
a 'normal' filesystem. The difference is slightly higher kernel
metadata-management cost, but the fast path can be made almost as fast as
for unprotected ext2fs.
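The ordering idea can be sketched as a dependency-respecting write scheduler. This is a simplified sketch, not the BSD soft updates code: it tracks dependencies per whole block, whereas real soft updates tracks them per individual metadata update and can roll back pending changes inside a buffer before writing it. `Block`, `write_batches` and `consistent` are hypothetical names. Blocks in the same batch do not depend on each other, so they can all be queued at once and the disk may complete them in any order.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    deps: set = field(default_factory=set)  # blocks that must reach disk first

def write_batches(blocks):
    """Group blocks into batches; every block's deps lie in earlier batches."""
    pending = {b.name: b for b in blocks}
    done, batches = set(), []
    while pending:
        ready = sorted(n for n, b in pending.items() if b.deps <= done)
        if not ready:
            raise ValueError("dependency cycle")
        batches.append(ready)  # these may be in flight together (tagged queueing)
        done.update(ready)
        for n in ready:
            del pending[n]
    return batches

def consistent(blocks, on_disk):
    """True if no block on disk depends on a block that is not yet on disk."""
    by_name = {b.name: b for b in blocks}
    return all(by_name[n].deps <= on_disk for n in on_disk)

# Creating a file: the initialized inode must be on disk before the
# directory entry that points to it; an unrelated data block is independent.
blocks = [Block("dirblk", {"inode12"}), Block("inode12"), Block("data7")]
batches = write_batches(blocks)
print(batches)  # [['data7', 'inode12'], ['dirblk']]
```

A crash after any subset of a batch still leaves no directory entry pointing at an uninitialized inode, while `dirblk` reaching the disk alone (which unordered write-back could allow) would be caught by `consistent` as an inconsistent state.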

so it's not a 'slow safe filesystem' but rather a 'clever safe
filesystem'. This approach is much more modern than JFS/LFS: e.g. the
on-disk layout of the filesystem is identical to that of the 'unprotected'
filesystem, so IO speed isn't affected by 'safety management' costs.

-- mingo