Fwd: Re: OT: svscan and the hard disk

From: Martín Marqués (martin@bugs.unl.edu.ar)
Date: Fri Nov 30 2001 - 07:33:21 EST


Any thoughts on what DJB thinks of the Linux FS?

Too bad for him; every day I become more convinced that I will never use qmail!

Regards... :-)

---------- Forwarded Message ----------

Subject: Re: OT: svscan and the hard disk
Date: 30 Nov 2001 02:05:35 -0000
From: "D. J. Bernstein" <djb@cr.yp.to>
To: qmail@list.cr.yp.to

Pavel Kankovsky writes:
> all my attempts to find any piece of standard/specification/documentation
> saying that fs metadata are to be updated synchronously have failed so far

doc/smm/03.fsck_ffs/2.t

The guarantees provided by FFS make it reasonably easy for mail-handling
software to perform reliable disk transactions. The speed is adequate
for most sites.

The Linux filesystem designers, ignorant of the demands of critical
applications, screwed this up in two ways:

   * They broke compatibility, by failing to provide the FFS system
     calls with the FFS guarantees.

     They wanted an asynchronous rename(), for example. They should have
     added an asyncrename() system call. Instead they foolishly changed
     the semantics of rename(), breaking mail-handling programs.

     Of course, the incompatibility isn't obvious to people who don't
     realize that some programs rely on rename() being synchronous.

   * They didn't provide any way to perform reliable transactions, other
     than syncing the whole filesystem (with, e.g., a directory fsync).

     Even sync mode is worrisome. Has anyone verified that blocks are
     written to disk in the correct order? This is not rocket science,
     but it does require a certain level of care with every operation.

     Supposedly there's a faster transaction mechanism now, but I don't
     trust it. Do the people writing the filesystem code understand that
     there _is_ a correct order for block writes?
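
As a rough illustration of the "directory fsync" workaround mentioned in
the second point, here is a minimal Python sketch. The helper name
durable_rename is hypothetical, and the sketch assumes a POSIX system
where fsync() on a directory descriptor is permitted (as on Linux) and
where both names live in the same directory. It does not restore FFS
semantics; it only asks the kernel to flush the directory entry after a
rename() that may have returned before the change reached the disk.

```python
import os


def durable_rename(src, dst):
    """Rename src to dst, then fsync the directory that contains dst.

    Assumption: src and dst are in the same directory, so flushing that
    one directory covers both the removed and the added entry.
    """
    os.rename(src, dst)

    # Open the containing directory read-only just to get a descriptor
    # that can be passed to fsync().
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the two names are in different directories, the source directory
would presumably need the same treatment.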

The situation since then has become even worse. Filesystem reliability
has gone down the tubes: users are regularly suffering data corruption,
even when there _isn't_ a crash. There are at least four different
filesystem transaction interfaces.

I'm reorganizing most of my create-one-file programs to use a generic
atomicwrite tool, so that all the stupid portability issues are isolated
inside one program. Meanwhile, qmail 2 uses its own internal filesystem
for the queue.
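
The general shape of such a create-one-file helper can be sketched in
Python as follows. This is not the atomicwrite tool itself, whose
interface the message does not describe; the function name atomic_write
and the ".tmp-" prefix are assumptions made for illustration, and the
sketch targets POSIX systems only. The idea is to write the data under a
temporary name, force it to disk, and only then rename it into place, so
that a reader sees either the old file or the complete new one.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at path with data (bytes) in one rename step."""
    directory = os.path.dirname(os.path.abspath(path))

    # The temporary file must be on the same filesystem as the target,
    # so it is created in the target's own directory.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # file contents reach the disk first
        os.rename(tmp, path)      # then the new name replaces the old
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise

    # Flush the directory entry as well, since rename() alone may not
    # be synchronous on every filesystem.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as atomic_write("/tmp/example.conf", b"key=value\n") would
leave either the previous contents or the new contents in place. Note
that mkstemp() creates the file with mode 0600, so a real tool would
also have to decide what permissions the final file should carry.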

---Dan

-------------------------------------------------------

-- 
Why use just any relational database
when you can use PostgreSQL?
-----------------------------------------------------------------
Martín Marqués                  |        mmarques@unl.edu.ar
Programmer, Administrator, DBA  |       Centro de Telematica
                       Universidad Nacional
                            del Litoral
-----------------------------------------------------------------


