Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC

From: Christian Brauner

Date: Fri Jan 16 2026 - 05:00:13 EST


On Thu, Jan 15, 2026 at 09:55:10AM +0100, Florian Weimer wrote:
> * Christian Brauner:
>
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> >> In <linux/mount.h>, we have this:
> >>
> >> #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
> >>
> >> This causes a few pain points for us to on the glibc side when we mirror
> >> this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
> >> which is one of the headers that's completely incompatible with the UAPI
> >> headers.
> >>
> >> The reason why this is painful is because O_CLOEXEC has at least three
> >> different values across architectures: 0x80000, 0x200000, 0x400000
> >>
> >> Even for the UAPI this isn't ideal because it effectively burns three
> >> open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API define as
> > O_CLOEXEC and now a lot of APIs have done the same.
>
> Yes, it looks like inotify is in the same boat.

It's unfortunately nost just inotify...:

include/linux/net.h:#define SOCK_CLOEXEC O_CLOEXEC
include/uapi/drm/drm.h:#define DRM_CLOEXEC O_CLOEXEC
include/uapi/linux/eventfd.h:#define EFD_CLOEXEC O_CLOEXEC
include/uapi/linux/eventpoll.h:#define EPOLL_CLOEXEC O_CLOEXEC
include/uapi/linux/inotify.h:#define IN_CLOEXEC O_CLOEXEC
include/uapi/linux/signalfd.h:#define SFD_CLOEXEC O_CLOEXEC
include/uapi/linux/timerfd.h:#define TFD_CLOEXEC O_CLOEXEC

>
> > I'm pretty sure we can't change that now but we can document that this
> > shouldn't be ifdefed and instead be a separate per-syscall bit. But I
> > think that's the best we can do right now.
>
> Maybe add something like this as a safety measure, to ensure that the
> flags don't overlap?
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c58674a20cad..5bbfd379ec44 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
> bool detached = flags & OPEN_TREE_CLONE;
>
> BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
> + BUILD_BUG_IN(!(O_CLOEXEC & OPEN_TREE_CLONE));
> + BUILD_BUG_ON(!((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
> + (O_CLOEXEC | OPEN_TREE_CLONE)));

Yeah, we can do something like that!