Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks

From: Patrick J. LoPresti (patl@curl.com)
Date: Mon Jul 15 2002 - 20:43:34 EST


Lawrence Greenfield <leg+@andrew.cmu.edu> writes:

> Actually, it's not all that simple (you have to find the enclosing
> directories of any files you're modifying, which might require string
> manipulation)

No, you have to find the directories you are modifying. And the
application knows darn well which directories it is modifying.

Don't speculate. Show some sample code, and let's see how hard it
would be to use the "Linux way". I am betting on "not hard at all".

> or necessarily all that fast (you're doubling the number of system
> calls and now the application is imposing an ordering on the
> filesystem that didn't exist before).

No, you are not doubling the number of system calls. As I have tried
to point out repeatedly, doing this stuff reliably and portably
already requires a sequence like this:

   write data
   flush data
   write "validity" indicator (e.g., rename() or fchmod())
   flush validity indicator

On Linux, flushing a rename() means calling fsync() on the directory
instead of the file. That's it. Doing that instead of fsync'ing the
file adds at most two system calls (to open and close the directory),
and those can be amortized over many operations on that directory
(think "mail spool"). So the system call overhead is non-existent.

As for "imposing an ordering on the filesystem that didn't exist
before", that is complete nonsense. This is imposing *precisely* the
ordering required for reliable operation; no more, no less. Relying
on mount options, "chattr +S", or journaling artifacts for your
ordering is the inefficient approach; since they impose extra
ordering, they can never be faster and will usually be slower.

> It's only necessary for ext2. Modern Linux filesystems (such as ext3
> or reiserfs) don't require it.

Only because they take the performance hit of flushing the whole log
to disk on every fsync(). Combine that with "data=ordered" and see
what happens to your performance. (Perhaps "data=ordered" should be
called "fsync=sync".) I would rather get back the performance and
convince application authors to understand what they are doing.

> Finally: ext2 isn't safe even if you do call fsync() on the directory!

Wrong.

   write temp file
   fsync() temp file
   rename() temp file to actual file
   fsync() directory

No matter where this crashes, it is perfectly safe on ext2. (If not,
ext2 is badly broken.) The worst that can happen after a crash is
that the file might exist with both the old name and the new name.
But an application can detect this case on startup and clean it up.

 - Pat
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Jul 15 2002 - 22:00:32 EST