Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
From: Aleksa Sarai
Date: Thu Apr 09 2026 - 03:49:01 EST
On 2026-04-07, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> On Thu, Apr 2, 2026 at 4:52 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> >
> > On 2026-04-01, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> > > Trying to handle this in open() is a no-go. openat2 is rather
> > > problematic.
> >
> > I'm interested in what makes you say that. It would be very nice to be able
> > to do mkdir + RESOLVE_IN_ROOT and get an fd back all in one syscall. :D
> >
>
> Not handling this in either of open or openat2 does not preclude mkdir
> + RESOLVE_IN_ROOT + getting a fd in one go from existing.
Well, that would also require passing RESOLVE_* flags to mkdirat2(2)
which kind of begs the question why not just integrate it into
openat2(2) -- otherwise there will always be more features available to
O_CREAT than mkdirat2(2) which seems unfortunate.
> Creating a directory was always a different syscall than creating a
> file. I don't see any benefit to squeezing it into open. I do see a
> downside because of an extra branchfest to differentiate the cases.
Ah, so it's just an issue of taste, not a technical problem (as the mail
I replied to made it sound)?
> > > The routine would have to start with validating the passed O_ flags, for
> > > now only allowing O_CLOEXEC and EINVAL-ing otherwise.
> >
> > Please do not use O_* flags! O_CLOEXEC takes up 3 flag bits on different
> > architectures which makes adding new flags a nightmare.
> >
>
> With my proposal there are no new flags added so I don't think that's relevant.
I'm confused, was "the new routine would have to start with validating
the passed O_ flags" talking about a hypothetical API you oppose? It
read like a suggestion on my first pass-through, hence the reply.
If you're saying that your proposal doesn't add any new O_* (or
MKDIRAT_*) flags that really isn't the issue -- any syscall that takes a
flag argument will grow new flags eventually and using the literal value
of O_CLOEXEC for some other syscall's flags just leads to burning three
flag bits needlessly.
This is arguably the most painful thing about open_tree(2)'s flags --
most other syscalls define their own flag that is equivalent to
O_CLOEXEC but not literally equal to it (this is even recommended in
Documentation/process/adding-syscalls.rst!).
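
To make the "three flag bits" point concrete, here is a rough sketch. The
O_CLOEXEC values below are written from memory of the per-architecture
uapi fcntl.h headers, so treat them as an assumption and double-check them
against the tree; the macro and helper names are made up for illustration
and are not part of any proposal.

```c
#include <assert.h>

/*
 * ASSUMPTION: O_CLOEXEC values as recalled from the uapi headers.
 * Verify against include/uapi/asm-generic/fcntl.h and the arch overrides.
 */
#define EX_O_CLOEXEC_GENERIC      02000000u   /* asm-generic default */
#define EX_O_CLOEXEC_ALPHA_PARISC 010000000u  /* alpha and parisc */
#define EX_O_CLOEXEC_SPARC        0x400000u   /* sparc */

/* Hypothetical helper: every bit a syscall must reserve if its flag
 * argument accepts the literal O_CLOEXEC value on all architectures. */
static unsigned int cloexec_burned_bits(void)
{
	return EX_O_CLOEXEC_GENERIC | EX_O_CLOEXEC_ALPHA_PARISC |
	       EX_O_CLOEXEC_SPARC;
}

/* Count the set bits in a mask (portable, no compiler builtins). */
static int count_bits(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}
```

With those values, count_bits(cloexec_burned_bits()) comes out as 3, whereas
a syscall-specific close-on-exec flag would only need a single bit.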
> > I think this should take AT_* flags and (like most newer syscalls)
> > O_CLOEXEC should be automatically set. Userspace can unset it with
> > fcntl(F_SETFD) in the relatively rare case where they don't want
> > O_CLOEXEC. Alternatively, we could just bite the bullet and make
> > AT_NO_CLOEXEC a thing...
> >
>
> I would say that's a pretty weird discrepancy vs what normally happens
> with other syscalls, but perhaps it would be fine.
Quite a few of the newer uAPIs do this -- all of the pidfd APIs do it,
as well as newer ioctls that return fds (like the NS_GET_* ioctls for
nsfs).
Clearing O_CLOEXEC safely is trivial, but setting it safely after the fact
is not really possible in multi-threaded programs: another thread can
fork() and execve() between the fd being created and the fcntl(F_SETFD)
call (see "man 2 openat"). So it makes more sense for newer APIs to just
default to O_CLOEXEC and let userspace unset it (and that is what newer
APIs already do).
We should probably update Documentation/process/adding-syscalls.rst to
mention this...
--
Aleksa Sarai
https://www.cyphar.com/