Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner
Date: Fri May 15 2026 - 09:58:56 EST
On Fri, May 15, 2026 at 12:55:23PM +0200, Jori Koolstra wrote:
> Sorry for the double email, this keyboard is so finicky, I really need to fix it.
>
> > On 11-05-2026 14:00 CEST, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > mkdirat2() is objectively the worse api. It forces userspace to use a
> > separate system call without any reason whatsoever. If you can do
> > O_CREAT you should also be able to do O_DIRECTORY in the same system
> > call. If we support O_DIRECTORY | O_CREAT we get all the lookup
> > restriction niceties RESOLVE_* for free. Plus, it is supportable both in
> > openat() and openat2() because I made that combo return an errno.
>
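For reference, the two-call pattern that userspace is stuck with today looks
roughly like the sketch below. mkdir_and_open() is a hypothetical helper name
used only for illustration; it is not part of the patch series. It only uses
mkdirat() and openat(), and it has a window between the two calls that a
single create-and-open call would not have.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): create a directory relative to
 * dirfd and return an O_DIRECTORY fd for it using the two system calls
 * available today. Returns -1 with errno set on failure.
 */
static int mkdir_and_open(int dirfd, const char *name, mode_t mode)
{
	if (mkdirat(dirfd, name, mode) < 0)
		return -1;

	/*
	 * Window: between mkdirat() and openat() the name may have been
	 * replaced by someone else. O_NOFOLLOW only guards against the
	 * final component having become a symlink.
	 */
	return openat(dirfd, name,
		      O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
}
```

A caller that needs the RESOLVE_* restrictions has to apply them separately
to the second step, which is part of what makes the split awkward.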
> I don't disagree. I know that some of the UAPI feature requests are not fully
> fleshed out, but at least it gives a basis to get the discussion going.
>
> In fact I already have a O_DIRECTORY | O_CREAT patch that at least passes
> the initial tests. However, I need to sit on it a little bit to think whether
> I am not leaving something out. Also, I understand why vfs_create() wasn't used
> in the O_CREAT path, for instance because you cannot just make use of may_create_dentry()
> there. But now that we are going to string another path through lookup_open() it
> would be great if we could reuse some of the logic from vfs_create() and vfs_mkdir().
>
> Perhaps we could move may_create_dentry() out of the vfs_* calls and let the caller
> take care of that. Then again, this is the pattern for all those calls. You could also
> just accept some redundancies with may_o_create(), or have something like
> static vfs_mkdir/create_common() functions.
>
> There are also some minor things. If i_op->mkdir is missing this is an EPERM, but with
> i_op->create it is EACCES (with a comment suggesting ENOSYS). Should this not be a consistent error
ENOSYS is worse because it would indicate to userspace that the whole
system call they're using isn't available. So we can't ever use that.
This is probably just historical baggage. EACCES is especially wrong
because userspace would expect that particular error code to stem from
an LSM.
In both cases the correct error code would be EOPNOTSUPP and we should
aim to be very consistent in such cases and only reserve this for
missing support for a specific functionality.
I would be very surprised if userspace depended on EACCES being returned
though. So I would say let's try and see whether we can correct this and
return EOPNOTSUPP. The number of filesystems that have no ->create
should be dwarfed by those that do.
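To make the distinction concrete, here is a rough sketch of how a userspace
caller could interpret these error codes if the kernel were consistent about
them. The enum and classify_create_errno() are hypothetical names made up for
this example, and the mapping just restates the reasoning above rather than
any documented contract.

```c
#include <errno.h>

/* Hypothetical classification of a failed create/mkdir (illustration only). */
enum create_failure {
	CREATE_SYSCALL_MISSING,	/* ENOSYS: the system call itself is absent */
	CREATE_OP_UNSUPPORTED,	/* EOPNOTSUPP: this filesystem lacks the op */
	CREATE_DENIED,		/* EACCES/EPERM: permission or LSM denial */
	CREATE_OTHER,		/* anything else */
};

static enum create_failure classify_create_errno(int err)
{
	switch (err) {
	case ENOSYS:
		/* Caller would fall back to an older system call. */
		return CREATE_SYSCALL_MISSING;
	case EOPNOTSUPP:
		/* The syscall exists, the filesystem cannot do this. */
		return CREATE_OP_UNSUPPORTED;
	case EACCES:
	case EPERM:
		/* Caller would treat this as a policy decision. */
		return CREATE_DENIED;
	default:
		return CREATE_OTHER;
	}
}
```

With EACCES returned for a missing ->create, a caller written like this would
blame permissions or an LSM for what is really missing filesystem support.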
> code? I also wonder whether there is a nicer way to handle error being returned from
> vfs_mkdir et al. If I am reading
>
> 	if (!error) {
> 		dentry = vfs_mkdir(mnt_idmap(path.mnt), path.dentry->d_inode,
> 				   dentry, mode, &delegated_inode);
> 		if (IS_ERR(dentry))
> 			error = PTR_ERR(dentry);
> 	}
> 	end_creating_path(&path, dentry);
>
> it feels like there is a missing return inside the if (IS_ERR(dentry)) block, and I
> have to go several functions deep to see that end_creating_path correctly deals with
> error values being passed instead of a dentry. Then again, probably not worth the
> churn...
Not without it being a major clarity win at least.
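For anyone following along, the pattern in question can be modelled in
userspace roughly as below. This is only a sketch: ERR_PTR(), PTR_ERR() and
IS_ERR() are re-implemented here to mirror the kernel idiom, and struct obj,
make_obj(), end_obj() and create_thing() are hypothetical stand-ins, not the
actual vfs_mkdir() or end_creating_path() code. The point is that no early
return is needed as long as the cleanup helper tolerates an error pointer.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace re-implementation of the kernel's error-pointer idiom. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct obj { int unused; };	/* hypothetical stand-in for a dentry */
static int live_objects;	/* counts objects not yet released */

/* Returns a valid object, or an error pointer when asked to fail. */
static struct obj *make_obj(int fail)
{
	struct obj *o;

	if (fail)
		return ERR_PTR(-ENOSPC);
	o = calloc(1, sizeof(*o));
	if (!o)
		return ERR_PTR(-ENOMEM);
	live_objects++;
	return o;
}

/* Cleanup helper that accepts either a valid object or an error pointer. */
static void end_obj(struct obj *o)
{
	if (IS_ERR(o))
		return;
	live_objects--;
	free(o);
}

static int create_thing(int fail)
{
	int error = 0;
	struct obj *o = make_obj(fail);

	if (IS_ERR(o))
		error = (int)PTR_ERR(o);
	end_obj(o);	/* reached on both the success and the error path */
	return error;
}
```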
> > UAPI design often is a nasty mix of performance (context switches),
> > separation of concerns and privileges, tastefulness, and compromises you
> > never thought or wanted to make.
> >
>
> Yes, thanks for suggesting this back at FOSDEM. It is quite fun, and lots
> to learn :)
Thanks for working on this. It's helpful.