Re: ext3-2.4-0.9.4

From: Lawrence Greenfield (leg+@andrew.cmu.edu)
Date: Fri Jul 27 2001 - 11:24:56 EST


Hi,

I'm one of those icky application programmers attempting to make
reliable software across different versions of Unix.

We need to get data to disk portably, quickly, and reliably.

I love it when I see things like: "No, Linus is right and the MTA
guys are just wrong."

This sort of attitude is just ridiculous. Unix had a defined set of
semantics. They might have been stupid semantics, but it had them.
Then journalling filesystems, softupdates, and Linux async updates
came along and destroyed those semantics, preventing those of us who
want to write reliable applications using the filesystem from doing
so. At least Oracle doesn't change the definition of COMMIT.

When I contacted the Linux JFS team about the semantics of link(), I
was told that there is _no way_ of forcing a link() to disk. Not an
fsync() on the file, not an fsync() on the directory, just _not
possible_.

Great.

Then we come to ext2. "Oh, just call fsync() on the directory and
you'll be fine." Well, wait a second: if ext2 isn't ordering the
metadata writes, a crash at the wrong time (whether or not I've called
fsync()) may lose directory entries---even directory entries unrelated
to the files I'm doing operations on! Greeeeat.

This is why all reasonably paranoid MTAs and other mail programs say
"use chattr +S on ext2": we need ordered metadata writes.

Ok, journalled filesystems are better. At least crashes aren't going
to affect random files on disk. But since link() and the like don't
force a commit, we need some way---some reasonably portable way---of
getting that on disk. On softupdates, calling fsync() on a file
forces all directory entries pointing to that file to disk. This is
pretty reasonable: one fsync() call.

Why do we all cringe when we're told to call fsync() on the directory?
Several reasons:
. not needed on any other variety of Unix
. two fsync() calls force two different synchronization points: the
  application is forcing ordering on the OS that may not be needed.
  (Thus performance doesn't "fly" when you need multiple fsyncs.)
. directory may have other modifications going on that we're not
  interested in
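
For concreteness, here is a rough sketch of the two-fsync() pattern
being objected to. The function name deliver_durably() and its
parameters are made up for illustration; only open(), write(),
fsync(), link(), unlink() and close() are real calls. Whether the
second fsync() actually gets the new directory entry to disk depends
on the filesystem, which is the whole complaint.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper (not from any real MTA): write a message under
 * tmp_path, link it to final_path, and try to force both to disk.
 * dir_path is assumed to be the directory containing final_path.
 * Returns 0 on success, -1 on the first failing call.
 */
int deliver_durably(const char *dir_path, const char *tmp_path,
                    const char *final_path, const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Sync point 1: the file's data and inode. */
    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    if (link(tmp_path, final_path) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Sync point 2: the directory that now holds the new entry. */
    int dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    if (rc != 0)
        return -1;

    /* Drop the temporary name; this removal is not itself synced. */
    unlink(tmp_path);
    return 0;
}
```

Note that the two fsync() calls are strictly serialized: the second
cannot start until the first has returned, even if the OS could have
written both in one pass.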

You want to help performance? Give us an fsync() that works on
multiple file descriptors at once, or an async fsync() call. Don't
make us fight the OS on getting data to disk.
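
To show the shape of the interface being asked for, here is a sketch
of a multi-descriptor fsync. fsync_many() is a hypothetical name, not
an existing system call, and this user-space emulation just loops over
fsync(), so it still pays for one synchronization point per
descriptor. A kernel implementation could presumably do better by
treating the whole set as a single sync point.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical interface, emulated in user space: fsync every
 * descriptor in fds[0..nfds-1]. Returns 0 if all succeed. On the
 * first failure, returns -1, leaves errno as set by fsync(), and
 * stores the failing index in *failed when failed is non-NULL.
 */
int fsync_many(const int *fds, size_t nfds, size_t *failed)
{
    for (size_t i = 0; i < nfds; i++) {
        if (fsync(fds[i]) != 0) {
            if (failed != NULL)
                *failed = i;
            return -1;
        }
    }
    return 0;
}
```

A caller would pass, say, the message file's descriptor and its
directory's descriptor in one array and issue a single call.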

Larry
