Re: ext3_ordered_writepage() questions

From: Jamie Lokier
Date: Fri Mar 17 2006 - 20:11:44 EST

Stephen C. Tweedie wrote:
> > That's the wrong way around for uses which check mtimes to revalidate
> > information about a file's contents.
> It's actually the right way for newly-allocated data: the blocks being
> written early are invisible until the mtime update, because the mtime
> update is an atomic part of the transaction which links the blocks into
> the inode.

Yes, I agree. It's right for that.

> > Local search engines like Beagle, and also anything where "make" is
> > involved, and "rsync" come to mind.
> Make and rsync (when writing, that is) are not usually updating in
> place, so they do in fact want the current ordered mode.

I'm referring to make and rsync _after_ a recovery, when _reading_ to
decide whether file data is up to date. The writing in that scenario
is by other programs.

Those are the times when the current ordering gives surprising
results to anyone who hasn't thought about it. For example, rsync can
fail to synchronise a directory properly because it assumes
(incorrectly) that a file's mtime indicates the last time data was
written to the file.
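
To make the failure concrete, here is a minimal Python sketch of an
mtime-based check of the kind described above. It is only an
illustration: looks_unchanged() and simulate_stale_mtime() are
hypothetical names, the comparison of size and mtime is a simplified
stand-in for what such tools do and not rsync's actual code, and the
os.utime() call stands in for a crash that let the overwritten data
reach disk without the matching mtime update.

```python
import os
import tempfile


def looks_unchanged(src, dst):
    """Simplified quick check: same size and same mtime means 'up to date'."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def simulate_stale_mtime():
    """Return (quick_check_says_unchanged, contents_really_equal)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        old_mtime = 1_000_000_000  # arbitrary timestamp in whole seconds

        # Start with two identical files that share the same mtime,
        # as they would after an earlier successful sync.
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"old data")
            os.utime(path, (old_mtime, old_mtime))

        # Overwrite src in place with data of the same length.
        with open(src, "r+b") as f:
            f.write(b"new data")

        # Assumption: model the post-recovery state by putting the old
        # mtime back, i.e. the data was written but the mtime was lost.
        os.utime(src, (old_mtime, old_mtime))

        with open(src, "rb") as a, open(dst, "rb") as b:
            same_contents = a.read() == b.read()
        return looks_unchanged(src, dst), same_contents
```

In this model the check reports the file as unchanged although the
contents differ, which is exactly the case where a later sync would
skip the file.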

I agree that when writing data to the end of a new file, the data must
be committed before the metadata.

The distinction looks weird because, if the three can't all be
committed atomically, the order ought to be: commit mtime, then data,
then size. But we always commit size and mtime together.
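
The argument about orderings can be checked with a toy model. The
Python sketch below is not kernel code and makes several simplifying
assumptions: each of mtime, data and size is either "old" or "new" on
disk, a crash can happen after any commit step, appended data only
becomes visible once the new size is on disk, and a reader is "fooled"
when visibly changed data carries the old mtime. The names CURRENT,
PROPOSED, crash_states(), fooled() and any_fooled() are made up for
this illustration.

```python
OLD, NEW = "old", "new"

# Each tuple is one atomic commit step, listed in commit order.
CURRENT = [("data",), ("mtime", "size")]       # data first, then metadata
PROPOSED = [("mtime",), ("data",), ("size",)]  # mtime, then data, then size


def crash_states(order):
    """On-disk state after each step, i.e. every state a crash could leave."""
    disk = {"mtime": OLD, "data": OLD, "size": OLD}
    states = []
    for step in order:
        for field in step:
            disk[field] = NEW
        states.append(dict(disk))
    return states


def fooled(state, in_place):
    """True if changed data is visible while the mtime is still old."""
    # In-place writes are visible as soon as the data hits disk;
    # appended data is visible only once the new size is committed.
    data_visible = state["data"] == NEW and (in_place or state["size"] == NEW)
    return data_visible and state["mtime"] == OLD


def any_fooled(order, in_place):
    """True if some crash point leaves a state that fools mtime checks."""
    return any(fooled(state, in_place) for state in crash_states(order))
```

Under these assumptions only one combination has a bad crash point:
any_fooled(CURRENT, in_place=True). Appends are safe under both
orderings, and in-place writes are safe only when mtime goes first,
at worst leaving a new mtime on old data, which merely triggers an
unnecessary re-check.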

> It's *only* for updating existing data blocks that there's any
> justification for writing mtime first. That's the question here.
> There's a significant cost in forcing the mtime to go first: it means
> that the VM cannot perform any data writeback for data written by a
> transaction until the transaction has first been committed. That's the
> last thing you want to be happening under VM pressure, as you may not in
> fact be able to close the transaction without first allocating more
> memory.

While I agree that it's not good for VM pressure, fooling programs
that rely on mtimes to decide if a file's content has changed is a
correctness issue for some applications.

I picked the example of copying a directory using rsync (or any other
program which compares mtimes) and not getting expected results as one
that's easily understood, that people actually do, and where they may
already be getting surprises that may not be noticed immediately.

Maybe the answer is to make the writeback order for in-place writes a
mount option and/or a file attribute?

-- Jamie