Re: Linux's future: //posix/ipc, //root and so on ?

From: Andi Kleen (ak@suse.de)
Date: Wed Mar 01 2000 - 17:12:00 EST


On Wed, Mar 01, 2000 at 03:28:17PM -0500, Alexander Viro wrote:
> >superior VFS design -- except when you need to write an NFS server.
> >Not being able to do open_by_inode() is sometimes _really_ annoying.
> >Just look at the mess in fs/nfsd/nfsfh.c and tell me how this is superior
> >over a simple iget() ?
>
> RTFSource. BSD one, that is. They _DO_ _NOT_ _USE_ _iget()_ in nfsd.
> What they do (and we ought to do) is
> /sys/nfs/nfs_subs.c:1943: error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
> It's _not_ an iget(). It's whatever the friggin' filesystem prefers to
> use and they also have an operation opposite to this one.

It basically does the same as iget, just that they split the downcall
to the fs and the vnode get (iget). Linux nfsfh.c is nothing at all like iget!

>
> I have a patch that moves nfsfh code (and simplifies it really big way, along
> with removing a bunch of races) into filesystems (so far only ext2). It's
> not nearly as nasty as the current variant. It will go somewhere about this
> weekend - we'll have to change VFS locking a bit before that (and solve the
> damned locking issues with target of rename() in bargain).

I have a similar patch (funny isn't it?), because it is required for reiserfs.
reiserfs needs at least 64bits to find a file again. It was cleaner to
move it out.

Unfortunately it still stinks, NFS deals very poorly with path dependent
file handles.

With the current requirement to always require a path for VFS operations
I see no way to ever build an as robust Linux based NFSv2,NFSv3 server
as BSD/Solaris have. Sad but true.

[NFSv4 has volatile file handles that require the client to store the
path and relookup when needed, but of course the
end result will be even farther away from POSIX semantics as current NFS
is already]

>
> Getting the decent replacement for iget() (i.e. one that would be fs-specific
> and would return proper dentry) is not hard. Current code tries to be too
> generic - it wants to be implemented in terms of iget() and <barf> readdir().
> Which is _really_ wrong way to go - it doesn't make sense for a lot of
> filesystems _and_ it sucks in terms of locking for the rest. If we make
> foo_lookup() aware of the situation (== aware of the fact that there may be
> a subtree trying to join the main) we can toss most of this shit away.

For traditional unix file systems like NFS it is very hard to rebuild a
path using a key (it is always very costly and usually breaks when the
directory tree changes inbetween)

>
> And forget about open_by_inode() - you will not get it. It simply doesn't make
> sense for most of filesystems. Now, an interface that would give you fhandle
> by name/opened file and opened file by fhandle actually makes sense. But st_ino
> from stat() is _not_ a suitable interface for that.

The inode is just a key. To make the key usable for stateless fs like nfs
you need to make the key a cookie that is valid over reboots. Now that "key"
is traditionally the inode number in unix file systems. In reiserfs it
is a aggregate of object id, directory id and possible offset. The only
reason to stay with a 32bit "key" for ino_t is user space compatibility,
but in kernel it should be really an arbitary length byte buffer (in
practice the key is limited by the NFS spec and it makes sense to keep
a static limit to avoid dynamic memory management)

I guess you agree with me on that ?

If we can meet at a open_by_key() function everything would be ok.

[
In 2.1 space I had a proposal (+patches) for "virtual hardlinks" that
generated dentries from inums by just creating new hidden names for
them that only existed in the dcache. Unfortunately it was rejected :/
]

For the reiserfs work I added the following extensions to VFS.
To super_operations:

        struct dentry *(*build_dentry)(struct super_block *,__u8 *key,int keylen

To inode operations:

        int (*getkey) (struct dentry *, __u8 *key, int keylen);

I guess your approach is similar ? It probably would make sense to merge
our proposals early. If yours handles the reiserfs case cleanly (64bit
keys) I would be happy to switch.

The problem the reiserfs implementation has is that it still deals very poorly
with file system tree changes (it inherited that from the nfsfh algorithm).
It is basically not a really robust solution, more a very elaborate hack :/

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Mar 07 2000 - 21:00:10 EST