Re: ext3_ordered_writepage() questions

From: Theodore Ts'o
Date: Sun Mar 19 2006 - 21:15:34 EST


On Sun, Mar 19, 2006 at 02:36:10AM +0000, Jamie Lokier wrote:
> It's other programs (OpenOffice, etc.) which are being used just
> before the power cut. If the programs which run just before the power
> cut do the above (writing then renaming), then they're fine, but many
> programs don't do that.
>
> Now, to be fair, most programs don't overwrite data blocks in place either.
>
> They usually open files with O_TRUNC to write with new contents. How
> does that work out with/without Badari's patch? Is that safe in the
> same way as creating new files and appending to them is?

The competently written ones don't open files with O_TRUNC; they will
create a temp. file, write to the temp. file, and then rename file
once it is fully written to the original file, just like rsync does.

This is important, given that many developers (especially kernel
developers) like to use hard link farms to minimize space, and if you
just O_TRUNC the existing file, then the change will be visible in all
of the directories. If instead the editor (or openoffice, etc.)
writes to a temp file and then renames, then it breaks the hard link
with COW semantics, which is what you want --- and in practice,
everyone using (or was using) bk, git, and mercurial use hard-linked
directories and it works just fine.

But yes, using O_TRUNC works just fine with and without Badari's
patch, because allocating new data blocks to a old file that is
truncated is exactly the same as appending new data blocks to a new
file.

> Or does O_TRUNC mean that the old data blocks might be released and
> assigned to other files, before this file's metadata is committed,
> opening a security hole where reading this file after a restart might
> read blocks belonging to another, unrelated file?

No, not in journal=ordered mode, since the data blocks are forced to
disk before the metadata is committed. Opening with O_TRUNC is
metadata operation, and allocating new blocks after O_TRUNC is also a
metadata operation, and in data=journaled mode, blocks are written out
before the metadata is forced out.

> It's this: you edit a source file with your favourite editor, and save
> it. 3 seconds later, there's a power cut. The next day, power comes
> back and you've forgotten that you edited this file.

Again, *my* favorite editor saves the file to a newly created file,
#filename, and once it is done writing the new file, renames filename
to filename~, and finally renames #filename to filename. This means
that we don't have to worry about your power cut scenario, and it also
means that hard-link farms have the proper COW semantics.

> However, all of the above examples really depend on what happens with
> O_TRUNC, because in practice all editors etc. that are likely to be
> used, use that if they don't do write-then-rename.

O_TRUNC is a bad idea, because it means if the editor crashes, or
computer crashes, or the fileserver crashes, the original file is
*G*O*N*E*. So only silly/stupidly written editors use O_TRUNC; if
yours does, abandon it in favor of another editor, or one day you'll
be really sorry. It's much, much, safer to do write-then-rename

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/