Re: scary ext2 filesystem question

Alexander Viro (viro@math.psu.edu)
Tue, 29 Dec 1998 01:13:22 -0500 (EST)


On Sun, 27 Dec 1998, Gerard Roudier wrote:

> The order of operations that may guarantee consistency at any time
> obviously depends on the File System design and on the file operation
> that is performed. As you mentioned partially, the right order for some
> data added to a file could be something like the following for a
> UNIXish File System:
>
> 0 - Update the allocation map (bitmap or whatever reflects allocation).
Yes, indeed.
> 1 - Write the data.
> 2 - Write the indirect block (if needed)
> 3 - Write the inode.
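
Just to make that ordering concrete, a toy userspace sketch - a plain
fd stands in for the device, the offsets and buffers are made up, error
handling is omitted, and fsync() between the steps plays the role of
the ordered synchronous metadata writes (which is not how a kernel
would actually do it):

#include <sys/types.h>
#include <unistd.h>

static void ordered_append(int fd, size_t bsize,
                           const void *bitmap, off_t bitmap_off,
                           const void *data,   off_t data_off,
                           const void *ind,    off_t ind_off,
                           const void *inode,  off_t inode_off)
{
        pwrite(fd, bitmap, bsize, bitmap_off);  /* 0: bitmap first - a crash
                                                   here merely leaks a block */
        fsync(fd);
        pwrite(fd, data, bsize, data_off);      /* 1: data before anything
                                                   on disk points to it */
        fsync(fd);
        pwrite(fd, ind, bsize, ind_off);        /* 2: indirect block, if any */
        fsync(fd);
        pwrite(fd, inode, bsize, inode_off);    /* 3: inode last - it is the
                                                   root of the references */
        fsync(fd);
}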
>
> This seems pretty simple, but in my opinion you are still missing an
> order of magnitude of the real complexity needed to make things
> perfect:
>
> 1 - The ordering of operations depends on the file operation (already
> mentioned).

Sure, it does. Read the posting upthread.

> 2 - There may exist situations where any ordering may lead to
> inconsistency if a crash occurs (due to FS misdesign).

Not on ext2/ffs/ufs.

> 3 - Several files that share some meta-data blocks may be accessed
> concurrently.

You are missing the point. We shouldn't trace dependencies at the
block level. Yes, there is such a thing as false sharing. And there is
a regular way to deal with it. Create a record per change, keep
dependencies between the changes, and for each change that depends on
other ones keep the old data in the record. Now, when we are trying to
flush some dirty pages to the disk, do the following: (a) prefer the
ones without pending dependencies; (b) when flushing a page *with*
pending dependencies - create a copy, unroll all changes with pending
deps., and flush said copy to disk (or just make the original page
unavailable for a while and unroll the changes without copying for the
duration of the IO). When the IO operation completes it calls a
function supplied by the initiator of the request. Use that to destroy
the records for changes once they are known to be committed to disk
(and update dependencies, indeed). That's it.
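
What that might look like in code - a toy sketch only, nothing to do
with the real buffer cache or with any existing implementation; every
structure and helper name below is made up, a change gets a single
dependency pointer instead of a list, and locking is ignored:

#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

struct change {
        struct change *next;       /* next change recorded on this page */
        struct change *waits_for;  /* change that must hit the disk first,
                                      NULL if there is no dependency */
        size_t off, len;           /* which bytes of the page it touched */
        unsigned char old[64];     /* saved pre-change bytes, so that the
                                      change can be unrolled */
        int in_flight;             /* included in the write in progress */
        int committed;             /* known to be on the disk */
};

struct dirty_page {
        unsigned char data[PAGE_SIZE];  /* in-core page, always current */
        struct change *changes;
};

/* Build the buffer that actually goes to the disk: start from the
 * in-core page and unroll every change that still waits for another
 * one to be committed. */
static void *prepare_for_write(struct dirty_page *pg)
{
        unsigned char *copy = malloc(PAGE_SIZE);
        struct change *c;

        memcpy(copy, pg->data, PAGE_SIZE);
        for (c = pg->changes; c; c = c->next) {
                if (c->waits_for && !c->waits_for->committed) {
                        /* pending dependency: unroll it in the copy only */
                        memcpy(copy + c->off, c->old, c->len);
                        c->in_flight = 0;
                } else {
                        c->in_flight = 1;
                }
        }
        return copy;
}

/* The function supplied with the IO request, called on completion:
 * whatever made it into that write is now on the disk, so those
 * records can be marked committed (and later destroyed), which in
 * turn frees the changes that were waiting for them. */
static void write_done(struct dirty_page *pg, void *copy)
{
        struct change *c;

        for (c = pg->changes; c; c = c->next)
                if (c->in_flight)
                        c->committed = 1;
        free(copy);
}

Note that the in-core page itself is never rolled back - only the copy
that goes out to the controller is - so nobody has to block on dirty
metadata, and the on-disk image never sees a change whose prerequisites
aren't already down there.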

> (3) makes me think that you also must ensure some ordering for the whole
> file system. If you want to lock the access to any dirty block of
> meta-data then you will probably not be significantly faster than
> synchronous meta-data writes.

We don't need it. Changes in core are done immediately. Changes on
disk will lag on any system. The question is: how to keep the fs on
disk consistent all the time. That doesn't require blocking access to
dirty metadata.

Think of it in terms of patch integration ;-) Suppose you have two
source trees. Suppose you are actively hacking on one of them, while
the other is, erm, the stable one. Suppose all your changes eventually
make their way into the stable tree (dreams ;-). You have to submit
your changes in a way that keeps the stable tree working, even if
somebody shot you and rm -rf'ed your active tree when you had
submitted only half of your stuff. Well, this analogy sucks, indeed,
but...

> (2) should apply to most existing FS, IMO. I am not sure of that since I
> am far from being knowledgeable on this topic.

;-< Could you spell 'NFS'?

> (1) I invite you to detail all possible situations for EXT2 if you have
> time for it. :-)

?Parser fault.
