Re: [RFC PATCH v5 1/2] vfs: add O_CREAT|O_DIRECTORY to open*(2)

From: NeilBrown

Date: Wed Jun 03 2026 - 18:57:49 EST

On Wed, 03 Jun 2026, Jori Koolstra wrote:
> > Op 02-06-2026 17:44 CEST schreef Christian Brauner <brauner@xxxxxxxxxx>:
> >
> > Yes, I agree. This would change error codes but I don't think it
> > matters:
> >
> > * O_WRONLY | O_DIRECTORY on non-directory -> ENOTDIR
> > * O_WRONLY | O_DIRECTORY on directory -> EISDIR
> >
> > I don't think that really matters and we should be able to collapse this
> > to ENOTDIR.
>
> I will pick this up in the next version of O_CREAT|O_DIRECTORY. I think that
> makes most sense.
>
> I have an outstanding patch for changing the EACCES/EPERM to EOPNOTSUPP;
> Jeff and Jan were skeptical, but I want to know your opinion as well.
> I feel the the scenario where userspace has no fall-through but does
> handle every single -E listed in the man-page quite unlikely, so I say
> lets change them and we'll hear from them if somehow someone relied on
> this weird way of error handling.

Please cc linux-api@xxxxxxxxxxxxxxx on code and discussions that involve
API changes. I have cc:ed them on this reply.

Thanks,
NeilBrown

>
> > > >
> > > > There is another point, I maybe should have mentioned in the cover letter: I have not attempted
> > > > to handle dangling symlinks for O_MKDIR. Not because I think they are a great idea (as Aleksa
> > > > has mentioned, but I am not very familiar with the dragons it entails), but I wanted to discuss
> > > > what behavior we want in this case. Do we say that we never do a mkdir after following a lookup
> > > > last symlink? I don't think that state is even recorded right now.
> > >
> > > I think the state might be recorded in nd->depth. But you probably
> > > don't want to use that directly. Maybe forcing LOOKUP_FOLLOW to be
> > > cleared if O_CREAT|O_DIRECTORY is set would be good. But what would
> > > stop you opening an existing directory through a symlink....
> > >
> > > Probably we need a clear statement of intended semantics which we can
> > > review, agree on, then implement. Have you looked at preparing a patch
> > > for man-pages to document the change in behaviour for openat etc?
> >
> > Ugh, dangling symlinks. Actually, scratch that: Ugh, symlinks. So
> > O_CREAT without O_NOFOLLOW allows you to create the target of a dangling
> > symlink iirc. I always forget that. I think this is a very subtle bug
> > and maybe - with both eyes closed - a feature at times.
> >
> > We should straighten the behavior for O_DIRECTORY | O_CREAT and we
> > agreed on that during LSFMM. It would be nice if we could get away with
> > simply implying O_NOFOLLOW but I think you're right, Neil, that this
> > prevents a valid O_CREAT | O_DIRECTORY on an existing directory which we
> > can't do. Makes this kind of a pointless excercise.
> >
> > But this shouldn't be all that crazy to do right. Using the O_CREAT as
> > an _example_ for what we'd need:
> >
> > fs: refuse O_CREAT through a dangling symlink
> >
> > open(O_CREAT) without O_EXCL follows a trailing symlink and, when the
> > symlink target does not exist, creates it. Refuse to create through a
> > dangling symlink instead.
> >
> > In lookup_open() a negative target reached with nd->depth > 0 was
> > arrived at by following a trailing symlink; since the dentry is negative
> > the symlink is dangling. Set create_error to -ELOOP in that case.
> > Reusing the existing create_error path strips O_CREAT for both the
> > generic and ->atomic_open create paths and only reports the error when
> > the target is actually negative, so opening an existing target through a
> > symlink, interior symlinks, and O_EXCL (which never follows the trailing
> > link) are all unaffected.
> >
> > Hastily-Cobbled-Together-by: Christian Brauner (Amutable) <brauner@xxxxxxxxxx>
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index c7fac83c9a85..d20bbcc7e8d3 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -4468,6 +4468,9 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
> > dentry, mode);
> > else
> > create_error = -EROFS;
> > + /* refuse to create through a dangling (trailing) symlink */
> > + if (unlikely(nd->depth) && !create_error)
> > + create_error = -ELOOP;
> > }
> > if (create_error)
> > open_flag &= ~O_CREAT;
> >
> > It can't be that easy...
>
> This is what I suggested above, correct, in terms of behavior?
>
> In terms of the patch, I think this will work, but struct nameidata could really
> use some commentary for its fields. I spent the last two hours verifying that
> nd->depth really does what I thought it did, and I am still not 100% positive.
> AFAIS, nd->depth indeed tracks the current symlink depth, which outside of
> link_path_walk() reduces to the number of trailing links followed.
>
> But if Neil's rework of lookup_open() is merged we lose access here to nd.
> @Neil, have you thought about what would be a good way to resolve that?
>