Re: fsync on large files

Linus Torvalds (torvalds@transmeta.com)
Wed, 17 Feb 1999 23:40:38 -0800 (PST)


On Thu, 18 Feb 1999, Alexander Viro wrote:
> ^^^^^^^^^^^^^^
> mkdir a
> mkdir b
> mkdir a/c
> ln a/c b/c
> mkdir a/c/d
> mv b a/c/d

Ayee. Good spotting. Nasty. I was wrong, it's not all that easy at all.

I would have just walked it at ln time, and wouldn't ever have noticed
that case.

It's still a consistent filesystem, but "find" would certainly get
conniptions on seeing the above ;)

> > We can't do them right as is, but getting rid of ".." in the on-disk
> > directory structure would be one step, and I think I can handle the dentry
> > aliasing issue too.
>
> Could you elaborate? I am trying to figure out the way to do that
> and for the case of multiple links *from the same directory* I have a
> kinda-sorta solution. For generic thing... I would really like to hear
> your variant.

It's basically the same as for hardlinked files: index by <inode:filename>
rather than by <dentry:filename>, and move the child list into the inode.
We already have the inode, and we already disallow having children of
negative dentries, so we have all the rules in place.

You also have to do the dentry locking in the inode - but guess what? We
already do so anyway by re-using inode->i_sem for that (which is
conceptually wrong the way it is laid out right now, but it's the right
thing once you use the inode for child management).

This was why inodes were done in the first place - you can do anything in
computer science by adding a layer of indirection (djikstra?). If you
don't want to allow aliases, you'd design your filesystem with the inode
inside the directory structure itself, instead of having the extra level
of indirection.

So we have all the support stuff for it already as far as I can tell.

> > Imagine, for example, a directory tree with a shared component. Wouldn't
> > it be nice to just link it into the tree at multiple points? Imagine a
> > chroot() environment, for a moment - symlinks don't work to the outside,
> > but hardlinking does.
>
> nullfs. It was invented for such things and we can do it. With
> *very* small overhead - dcache helps big way here.

We can't do it without handling the dcache alias issue. Otherwise:

mkdir a
mkdir b
mount -t nullfs a b
touch a/c
rm b/c

and now you have a really confused dcache as it is now, because you still
have the a/c dentry chain live even though the file was removed through
the b/c dentry.

Doing the index by inode, a/c goes away when you rm b/c, and everything is
nice.

So yes, _then_ we can do a really nice low-overhead loopback filesystem,
and it's going to kick butt because it really is very low overhead indeed
(essentially no overhead at all, because the dentry cache handles
everything for us quite automatically).

I'd still like to allow hard links too, but my mind isn't quite as twisted
as yours is, judging by your nasty example ;)

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/