Re: [RFC] atomic create+open

From: Trond Myklebust
Date: Fri Oct 07 2005 - 09:48:35 EST


On Fri, 07.10.2005 at 15:59 (+0200), Miklos Szeredi wrote:
> So, you are saying OPEN has to do the lookup too. That's OK, but that
> _does not_ mean that you have to do the OPEN operation from the
> ->lookup() or ->d_revalidate() methods. In fact you cannot do the
> latter without getting into trouble over mounts.

You cannot do anything else without getting into trouble over dentry
races. Given that choice, I prefer to take my chances with the mounts as
those are rare.

We can add locking in order to exclude the mount races, or we can just
ignore them. AFAICS there are 2 cases:

1) umount races
refcounting of any in-use dentry is supposed to prevent trouble here.

2) mount races. Either
   a) just ignore them if the filesystem has already opened the file,
or
   b) add locking to prevent the races.

> > If you end up opening a different file to the one you looked up, things
> > can get very interesting.
>
> You can replace the inode in ->create_open() if you want to. Or let
> the VFS redo the lookup (as if d_revalidate() returned 0).

...but I cannot do that once I get to dentry_open(). You are ignoring
the case of generic file open without creation.

> > > I know you are thinking of the non-exclusive create case when between
> > > the lookup and the open the file is removed or transmuted on the
> > > server..
> >
> > > Yes, it's tricky to solve, but by no means impossible without atomic
> > > lookup+open. E.g. consider this pseudo-code (only the atomic
> > > open+create case) in open_namei():
> >
> > Firstly, that pseudo-code doesn't deal at all with the race you describe
> > above. It only deals with lookup + file creation.
>
> It does deal with the race:
>
> __lookup_hash() returns a positive dentry
>
> file is removed on server
>
> "else if (!(flag & O_EXCL) && may_create(dir))" condition is met
>
> __follow_mount() returns false
>
> vfs_create_open() calls ->create_open()
>
> NFS does OPEN (O_CREAT), file is opened, dentry replaced (this is not
> elaborated in the pseudocode).

Which again only deals with the case of open(O_CREAT). My point is that
the race exists for the case of generic open().

If you first look up the dentry in open_namei(), then end up opening a
completely different file in dentry_open(), then you are fscked.
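
As a rough userspace analogy only (this is not kernel code and not part
of any proposed patch), the sketch below shows the shape of the problem:
a lookup by name, a window in which the name gets rebound, then a
separate open by name that lands on a different file. The helper names
lookup_then_open() and demo() are made up for illustration, and a local
rename stands in for "the file changes on the server".

```python
import os
import tempfile


def lookup_then_open(path, interfere=None):
    """Look up `path`, optionally let something interfere, then open it.

    Returns ((dev, ino) seen at lookup, (dev, ino) of the file opened).
    """
    looked_up = os.stat(path)            # step 1: the "lookup"
    if interfere is not None:
        interfere()                      # window where the name can be rebound
    fd = os.open(path, os.O_RDONLY)      # step 2: a separate open by name
    try:
        opened = os.fstat(fd)            # identity of what was actually opened
    finally:
        os.close(fd)
    return ((looked_up.st_dev, looked_up.st_ino),
            (opened.st_dev, opened.st_ino))


def demo():
    """Return (same_file_without_race, same_file_with_race)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "file")
        other = os.path.join(d, "other")
        with open(target, "w") as f:
            f.write("old")
        with open(other, "w") as f:
            f.write("new")

        quiet = lookup_then_open(target)
        # Rebind the name between lookup and open (stand-in for a
        # server-side remove/replace).
        raced = lookup_then_open(target, lambda: os.replace(other, target))
        return quiet[0] == quiet[1], raced[0] == raced[1]
```

Without interference the two identities match; with the rename in the
window they do not, even though no create is involved at any point.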

Cheers,
Trond
