Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC

From: Florian Weimer

Date: Thu Jan 15 2026 - 03:55:19 EST


* Christian Brauner:

> On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
>> In <linux/mount.h>, we have this:
>>
>> #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
>>
>> This causes a few pain points for us to on the glibc side when we mirror
>> this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
>> which is one of the headers that's completely incompatible with the UAPI
>> headers.
>>
>> The reason why this is painful is because O_CLOEXEC has at least three
>> different values across architectures: 0x80000, 0x200000, 0x400000
>>
>> Even for the UAPI this isn't ideal because it effectively burns three
>> open_tree flags, unless the flags are made architecture-specific, too.
>
> I think that just got cargo-culted... A long time ago some API define as
> O_CLOEXEC and now a lot of APIs have done the same.

Yes, it looks like inotify is in the same boat.

> I'm pretty sure we can't change that now but we can document that this
> shouldn't be ifdefed and instead be a separate per-syscall bit. But I
> think that's the best we can do right now.

Maybe add something like this as a safety measure, to ensure that the
flags don't overlap?

diff --git a/fs/namespace.c b/fs/namespace.c
index c58674a20cad..5bbfd379ec44 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
bool detached = flags & OPEN_TREE_CLONE;

BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
+ BUILD_BUG_IN(!(O_CLOEXEC & OPEN_TREE_CLONE));
+ BUILD_BUG_ON(!((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
+ (O_CLOEXEC | OPEN_TREE_CLONE)));

if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
@@ -3100,7 +3103,7 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned

SYSCALL_DEFINE3(open_tree, int, dfd, const char __user *, filename, unsigned, flags)
{
- return FD_ADD(flags, vfs_open_tree(dfd, filename, flags));
+ return FD_ADD(flags & O_CLOEXEC, vfs_open_tree(dfd, filename, flags));
}

/*

(Completely untested.)

Passing the mix of flags to FD_ADD isn't really future-proof if FD_ADD
ever recognizes more than just O_CLOEXEC.

Thanks,
Florian