Re: [some sanity for a change] possible design issues for hybrids

From: viro
Date: Thu Aug 26 2004 - 20:17:34 EST


On Thu, Aug 26, 2004 at 04:57:01PM -0700, Linus Torvalds wrote:
> Right. We obviously need to mark the dentry somehow, and only do the
> "create vfsmount" special case in this special case. If we didn't do that,
> then it wouldn't be a special case, now would it?
>
> So clearly lookup_mnt() needs to check the dentry in the failure case. The
> marking could be in any of three places
> - mark the dentry itself by just using a dentry flag ("DCACHE_AUTOVFSMNT")
> or by having a dentry operation for this.
> - mark the inode itself (same logic as dentry)
> - look up the first vfsmount (on the inode list), and look if that one is
> of the automatic type.
>
> Clearly we should not _always_ create a vfsmount, that would just break
> the existign logic.

Hmm... IOW, you are suggesting to use normal trigger for lookup_mnt() and
then have extra check in lookup_mnt() failure exit. That will probably
work (and I'd prefer to mark dentry here).

Freeing these guys will not be fun. The final reference could be given
up under very different locking conditions - it's certainly a blocking
operation (it can trigger final umount, among other things), but having
it trigger removal from mount tree(s) can get ugly. We might be able to
tweak that, but it will probably need require major surgery in several
places.

Right now we assume that vfsmount lock *and* per-namespace semaphore are
held when detaching from the tree (attaching what could be the first at
given dentry --- same + i_sem on its ->i_sem). We also assume that vfsmount is
detached before refcount reaches zero. The thing is, that final refcount
can be dropped while holding namespace semaphore. It's not fatal (we'll
need to hold onto these guys until dropping that semaphore or add an analog
of mntput() for places under namespace sem), but it can get messy.

Situation with unlink(): we need to (a) check that all vfsmounts over
given point are "hybrid" ones (locking is not a problem at that point)
and (b) after successful operation (unlink() and rename() alike, AFAICS)
we want to go and kill everything under the victim dentry. *That*
can get very ugly, since fs could be mounted in a lot of places and/or
in a lot of namespaces.

Note that we also need to do in VFS what NFS tries to do in revalidation -
if dentry is invalidated, everything under it should be detached and
dropped. So it's not something new - we do need something that would
be called outside of any superblock/namespace/vfsmount locks (both
unlink() and revalidation qualify) and would kill the vfsmounts that
become unreachable.

Locking rules for attaching might need adjustment (the logics with ->i_sem
is that we want !d_mountpoint() stay true around call of ->unlink(),
->rmdir() and ->rename(); so cloning or replacing doesn't need that
protection. Now it might become interesting - we'll need to be careful
in pivot_root(2) and possibly some other places).

I'll see what can be done (it'll definitely take careful looking at the
code), but there is a chance of things getting very ugly.

BTW, shared subtrees *are* compatible with that puppy - there's a lot
of corner cases to look at, but AFAICS they are all doable. Lifetime
rules can get ugly in some of them, but that's on par with the situation
without sharing.

One thing that looks like a bad interface: we get forcible "use same
dentry for file and directory" with that design. That can lead to
a big can of worms - without a large audit I wouldn't bet a dime on
ability to get that right (and same applies to reiser4 code as-is,
for the same reasons). No idea how bad it will turn out to be - right
now all I can say is that we might run into trouble and that we almost
certainly will get extra "be careful not to do <list of things>" rules
out of that.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/