Re: [RFC PATCH] fpathconf() for fsync() behavior

From: Jamie Lokier
Date: Thu Apr 23 2009 - 12:43:54 EST


Theodore Tso wrote:
> On Thu, Apr 23, 2009 at 12:21:05PM +0100, Jamie Lokier wrote:
> > Maybe it's time to do fsync properly?
>
> Application writers don't care about OS portability (it only has to
> work on Linux), or working on multiple filesystems (it only has to work
> on ext3, and any filesystem which doesn't do automagic fsyncs at the
> right magic times automagically is broken by design). This includes
> many GNOME and KDE developers. So as we concluded at the filesystem
> and storage workshop, we probably will have to keep automagic
> heuristics out there, for all of the broken applications. Heck, Linus
> even refused to call those applications "broken".

Sure, most apps are low quality in all respects. Many don't care about
a bit of corruption when the battery runs out. There's no pressure to
get that right, and it's quite hard to get right without good practice
to follow, and good APIs which encourage good practice naturally.

IMHO, the rename-automagic-safety rule now in ext3/4 is _better_ than
requiring apps to call fsync, because it doesn't require an immediate,
synchronous disk flush and hardware cache flush. fsync has to do those
things in order to be useful for databases and mail servers. If you're
renaming a lot of files, thousands of explicit fsyncs serialise badly
on rotating media.
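
For reference, the explicit-fsync pattern being compared against looks
roughly like the following. This is a minimal sketch, not code from
this thread: replace_file_with_fsync() and the ".tmp" suffix are
made-up names for illustration, and it assumes the temporary file can
live in the same directory as the target.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): replace `path` with `len`
 * bytes from `buf` using write-to-temp, fsync(), rename().
 * Returns 0 on success, -1 on failure with errno set. */
int replace_file_with_fsync(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    const char *p = buf;
    int saved_errno;
    int fd;

    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be short or interrupted, so loop until done. */
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }

    /* The explicit, synchronous flush: the caller blocks here until
     * the kernel reports the new contents as written out. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* Atomically swap the new file into place. */
    if (rename(tmp, path) < 0)
        goto fail;
    return 0;

fail:
    saved_errno = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    errno = saved_errno;
    return -1;
}
```

Every call blocks in fsync() before the rename, which is the cost when
this is done for thousands of files in a row. The sketch does not
fsync the containing directory, which a careful application would
also want to do to make the rename itself durable.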

> So we can create a more fine-grained system call ---
> although I would suggest that we just add some extra flags to
> sync_file_range() --- but it's doubtful that many application
> programmers will use it.

I proposed some flags to sync_file_range() last year, and got very
little response. Mind you, there have been a lot of fsync issues
coming up since then, so maybe it stirred something :-)

sync_file_range() itself is just too weird to use. Even after reading
the man page many times, I still couldn't be sure what it does or is
meant to do until I asked on l-k a few years ago. My guess, from
reading the man page, turned out to be wrong. The recommended way to
use it for a database-like application was quite convoluted and
required the app to apply its own set of mm-style heuristics. I never
did find out whether it commits data-locating metadata and file size
after extending a file or filling a hole. It never seemed to emit I/O
barriers.
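
To make the awkwardness concrete, here is a rough sketch of one way
an application can combine sync_file_range() with fdatasync(). It is
an assumption-laden illustration, not the usage recommended on l-k
and not Nick's fsync_range(): start_range_writeback() and
commit_file() are hypothetical helper names. It treats
sync_file_range() purely as a hint to start writeback early and
relies on fdatasync() as the actual durability point, since the
range call by itself is not documented to write out metadata.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to start writing back the dirty
 * pages in [offset, offset + nbytes) without waiting for completion.
 * This is only a hint; it gives no durability guarantee on its own. */
int start_range_writeback(int fd, off64_t offset, off64_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/* Hypothetical helper: the actual commit point. fdatasync() flushes
 * the file's data plus the metadata needed to read it back, for the
 * whole file rather than just one range. */
int commit_file(int fd)
{
    return fdatasync(fd);
}
```

An application would call start_range_writeback() after each batch
of writes and commit_file() once at the point where it needs the data
to be stable. Deciding how often to issue the hint is exactly the
kind of heuristic the application ends up owning.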

Does anything at all use it? Maybe sync_file_range() can be improved
though.

I hold more hope for Nick Piggin's work on fsync_range() - which at
least is comprehensible :-)

It says something that instead of writing a small wrapper around
sync_file_range() which is _supposed_ to be usable as a range fsync, and
fixing sync_file_range() to behave properly, Nick found it easier to
start a separate implementation :-)

-- Jamie