Re: ext3_ordered_writepage() questions
From: Jamie Lokier
Date: Fri Mar 17 2006 - 11:11:36 EST
Theodore Ts'o wrote:
> > >Yup. Ordered-mode JBD commit needs to write and wait upon all dirty
> > >file-data buffers prior to journalling the metadata. If we didn't run
> > >journal_dirty_data_fn() against those buffers then they'd still be under
> > >I/O after commit had completed.
> > >
> > In the non-block-allocation case, what metadata are we journaling in
> > writepage()? Block allocation happened in prepare_write() and
> > commit_write() journaled the transaction. All the metadata
> > updates should be done there. What JBD commit are you referring to
> > here?
> Basically, this boils down to what is our definition of ordered-mode?
> If the goal is to make sure we avoid the security exposure of
> allocating a block and then crashing before we write the data block,
> potentially exposing previously written data that might belong to
> another user, then what Badari is suggesting would avoid this
> particular problem.
> However, if what we are doing is overwriting our own data with an
> updated, more recent version of the data block, do we guarantee that
> any ordering semantics apply? For example, what if we write a data
> block, and then follow it up with some kind of metadata update (say we
> touch atime, or add an extended attribute). Do we guarantee that if
> the metadata update is committed, that the data block will have made
> it to disk as well? Today that is the way things work, but is that
> guarantee part of the contract of ordered-mode?
That's the wrong way around for uses which check mtimes to revalidate
information about a file's contents.

Local search engines like Beagle, and also anything where "make" is
involved, and "rsync" come to mind.

For those uses, the mtime update should be committed _before_ the data
begins to be written, not after, so that after a crash, programs will
know those files may not contain the expected data.

A notable example is "rsync". After a power cycle, you may want to
synchronise some files from another machine.
Ideally, running "rsync" to copy from the other machine would do the trick.

But if data is committed to files on the power cycled machine, and
mtime updates for those writes did not get committed, when "rsync" is
later run it will assume those files are already correct and not
update them. The result is that the data is not copied properly in
the way the user expects.

With "rsync" this problem can be avoided using the --checksum option,
but that's not something a person is likely to think of needing when
they assume ext3 provides reasonable consistency guarantees, so that
it's safe to pull the plug.
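
To make the failure mode concrete, here is a rough sketch in Python. It
is only an illustration, not rsync's actual code: quick_check_same()
and checksum_same() are hypothetical helpers that imitate a
size-and-mtime comparison and a full content comparison (SHA-256 is
used here purely as a stand-in digest). The lost mtime update is
simulated by rewriting the file and then putting the old timestamp back
with os.utime().

```python
import hashlib
import os
import tempfile

OLD_MTIME = 1_000_000_000  # arbitrary fixed timestamp used for the demo


def quick_check_same(src, dst):
    """Hypothetical stand-in for a size-and-mtime comparison."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def checksum_same(src, dst):
    """Hypothetical stand-in for a full content comparison."""
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    return digest(src) == digest(dst)


def simulate_lost_mtime_update():
    """Return (quick_check_result, checksum_result) for the crash scenario."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "remote_copy")
        dst = os.path.join(d, "local_copy")

        # Both copies start out identical, with the same mtime.
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"good data\n")
            os.utime(path, (OLD_MTIME, OLD_MTIME))

        # Local data is overwritten with different bytes of the same length.
        with open(dst, "wb") as f:
            f.write(b"bad  data\n")

        # Simulate the crash: the data reached disk, the mtime update did not.
        os.utime(dst, (OLD_MTIME, OLD_MTIME))

        return quick_check_same(src, dst), checksum_same(src, dst)


if __name__ == "__main__":
    print(simulate_lost_mtime_update())
```

In this sketch the size-and-mtime comparison reports the two files as
identical while the content comparison reports a difference, which is
the situation in which a default "rsync" run would skip the file.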