Re: imapd and synchronous writes

sct@dcs.ed.ac.uk
Wed, 20 Mar 96 02:20 GMT


Hi,

On Mon, 18 Mar 1996 19:25:50 -0500 (EST), John Gardiner Myers
<jgm+@cmu.edu> said:

> sct@dcs.ed.ac.uk writes:
>> Hi,

> For perspective, the problem here is that of reliability of Internet
> mail.

Yes, of course this is a critical application.  Nobody is denying
that sometimes you need cast-iron guarantees about filesystem
semantics.

>> Modern EIDE drives with write-behind can also screw the
>> O/S's sync write ordering.

> The problem is not that of ordering, but of knowing when something has
> been committed to non-volatile storage. If, in fact, the drives
> inform the OS that they have committed a write before they in fact
> have, you've got some pretty unreliable drives.

I'm afraid that's life for you. If you enable write-behind on any
EIDE drive, this is exactly what will happen. The same is true if you
use any of the common PC caching disk controllers. Sad but true.

>> And finally, NFS has NEVER made any guarantees like this

> And it has always been that people who run their mail system
> (spool/mqueue or spool/mail) over NFS Deserve To Lose.

Just as people who run sync-sensitive applications on an async
filesystem deserve to lose.

>> Not true. { int fd = open(".", O_RDONLY, 0); int rc = fsync(fd); close(fd); }

> Since the namei() call could be a performance issue, systems which
> need this should provide a feature-test for applications to key off
> of. Some have suggested "__linux", but that's not a feature test.
> It's a system test.

I know of no systems where a lookup on "." will be a significant
factor. Most systems these days, Linux included, will cache the
directory entry; and "." is always the first entry in a directory,
anyway.

As for a feature test, you've got to decide whether this is an
application or an installation issue (recall that even FreeBSD will
allow you to relax metadata guarantees these days). If you decide
it's an installation issue, then you leave a warning in the
installation instructions about making sure the filesystem is
appropriately configured. If it's an application issue, then just use
the code above --- it's the only way (short of a complete sync()) that
an application can request this service on any unix system.
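
For illustration, the one-liner quoted above can be expanded into a
small C helper with error checking.  This is only a sketch: the name
sync_directory is made up here, and whether fsync() on a directory
descriptor really forces the directory to disk depends on the system
and the filesystem underneath.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (the name is illustrative, not from any library).
 * Opens the directory read-only and calls fsync() on the descriptor.
 * Returns 0 on success, -1 on failure with errno describing the error. */
int sync_directory(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* errno was set by open() */

    int rc = fsync(fd);
    int saved_errno = errno;    /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return rc;
}
```

A mail delivery agent would presumably call something like
sync_directory("/var/spool/mail") after creating or renaming a file
there, and treat a -1 return as "not known to be on disk".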

>> I never said it was. If you really want that behaviour, ext2fs gives
>> you three ways to request it: by filesystem default, by per-directory
>> attribute, or explicitly on demand by the application.

> By filesystem default is not administratively practical.

Why not? First of all you were complaining that Linux didn't give you
the same guarantees as BSD. How come it is not practical to ask the
same semantics of Linux as BSD provides, given that Linux does offer
that as an option? That is the same as claiming it is not practical
to ask FreeBSD users to use sync rather than async metadata updates
(with the proviso that the default on Linux is usually to do async,
whereas FreeBSD gives sync by default).

> By the application is best, since the application developer tends to
> know better than the sysadmin or the distribution vendor when such is
> necessary. But there needs to be a feature test.

Try fsync on ".". That's a feature test which will compile on all
Unixen, and which will fail with EINVAL if it doesn't achieve
anything.
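
A rough sketch of what such a run-time probe might look like in C.
The enum and the function name probe_dirsync are hypothetical, and the
EINVAL convention is the one described in this message; other systems
may report an unsupported directory fsync differently, or return
success without doing anything useful.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for the probe. */
enum dirsync_result {
    DIRSYNC_SUPPORTED,      /* fsync() on the directory returned 0 */
    DIRSYNC_UNSUPPORTED,    /* fsync() failed with EINVAL */
    DIRSYNC_ERROR           /* open() failed, or fsync() failed otherwise */
};

/* Try fsync() on a directory and classify the outcome. */
enum dirsync_result probe_dirsync(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return DIRSYNC_ERROR;

    enum dirsync_result result = DIRSYNC_SUPPORTED;
    if (fsync(fd) < 0)
        result = (errno == EINVAL) ? DIRSYNC_UNSUPPORTED : DIRSYNC_ERROR;

    close(fd);
    return result;
}
```

Note that DIRSYNC_SUPPORTED only means the call succeeded; it says
nothing about write-behind in the drive or controller.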

>> No, but they can easily "chattr -R +S /var/spool/mail". If you mount
>> a ffs partition on /var/spool with delayed writes enabled, you have
>> exactly the same problem. That comes down to a broken installation.
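
On Linux, the attribute that "chattr +S" sets can also be set from C
through the inode-flag ioctls declared in <linux/fs.h>.  A minimal
sketch, assuming the generic FS_IOC_GETFLAGS/FS_IOC_SETFLAGS interface
is available and the filesystem honours FS_SYNC_FL; set_sync_updates
is a hypothetical name, and unlike "chattr -R" it handles one path
only.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: set the "synchronous updates" inode flag on one
 * file or directory.  Returns 0 on success, -1 on failure (errno set). */
int set_sync_updates(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int flags = 0;
    int rc = ioctl(fd, FS_IOC_GETFLAGS, &flags);
    if (rc == 0 && !(flags & FS_SYNC_FL)) {
        flags |= FS_SYNC_FL;                    /* same bit as chattr +S */
        rc = ioctl(fd, FS_IOC_SETFLAGS, &flags);
    }

    int saved_errno = errno;    /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return rc == 0 ? 0 : -1;
}
```

Filesystems that do not implement these flags will fail the ioctl, so
a -1 return here should be read as "attribute not set", not as proof
that the directory is unsafe.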

> So, how many Linux distributions are *not* broken this way? Can I get
> interest in the Linux community to fix this?

Definitely. I have never encountered any resistance amongst Linuxers
to accepting necessary bug fixes like this.

>> The real deficiency is the lack of any defined semantics in
>> Unix/POSIX, and the lack of any standard way for an application to
>> request a certain level of service with regards to directories.

> To the extent that POSIX grossly under-specifies Unix, that is a
> problem with POSIX.

It is NOT a POSIX problem, it's a Unix problem. Unix simply makes no
guarantees about physical directory update semantics, and provides no
standard way to request such an update.  Even where fsync() on a
directory works, it's not necessarily portable (but I'd be interested
in knowing whether spec1170 requires fsync to work over directories).
Unix under-specifies what applications can reasonably demand. :(

> The NT POSIX box is a prime example of a POSIX conforming system
> that is useless for writing applications.

Heh, you won't find me arguing with that. :)

> To the extent that the Linux community uses lack of specification in
> POSIX as an excuse to fail to provide necessary functionality, that is
> a problem with the Linux community.

Linux DOES provide the necessary functionality, both at the
installation and the application level, and it does the latter in the
only semi-standard and reasonable way, using fsync(). I completely
fail to understand your last comment.