[PATCH v2] fs: sort out stale commentary about races between fd alloc and dup2()

From: Mateusz Guzik
Date: Thu Mar 20 2025 - 06:49:45 EST


Userspace may be trying to dup2() over a fd which is allocated but not
yet populated.

Commentary about it is split in 2 parts (and both warrant changes):

1. in dup2()

It claims the issue is only relevant for shared descriptor tables which
is of no concern for POSIX (but then is POSIX of concern to anyone
today?), which I presume predates standarized threading.

The comment also mentions the following systems:
- OpenBSD installing a larval file -- they moved away from it, file is
installed late and EBUSY is returned on conflict
- FreeBSD returning EBADF -- reworked to install the file early like
OpenBSD used to do
- NetBSD "deadlocks in amusing ways" -- their solution looks
Solaris-inspired (not a compliment) and I would not be particularly
surprised if it indeed deadlocked, in amusing ways or otherwise

I don't believe mentioning any of these adds anything and the statement
about the issue not being POSIX-relevant is outdated.

dup2 description in POSIX still does not mention the problem.

2. above fd_install()

<quote>
> We need to detect this and fput() the struct file we are about to
> overwrite in this case.
>
> It should never happen - if we allow dup2() do it, _really_ bad things
> will follow.
</quote>

I have difficulty parsing it. The first sentence would suggest
fd_install() tries to detect and recover from the race (it does not),
the next one claims the race needs to be dealt with (it is, by dup2()).

Given that fd_install() does not suffer the burden, this patch removes
the above and instead expands on the race in dup2() commentary.

Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
---

This contains the new commentary from:
https://lore.kernel.org/linux-fsdevel/20250320102637.1924183-1-mjguzik@xxxxxxxxx/T/#u

and obsoletes this guy hanging out in -next:
ommit ec052fae814d467d6aa7e591b4b24531b87e65ec
Author: Mateusz Guzik <mjguzik@xxxxxxxxx>
Date: Thu Dec 5 16:47:43 2024 +0100

fs: sort out a stale comment about races between fd alloc and dup2

as in it needs to be dropped.

apologies for the churn :)

I think it will be best long term if this is one commit.

fs/file.c | 52 ++++++++++++++++++++++++++++------------------------
1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index d0ecc3e59f2f..dc3f7e120e3e 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -626,22 +626,14 @@ void put_unused_fd(unsigned int fd)

EXPORT_SYMBOL(put_unused_fd);

-/*
- * Install a file pointer in the fd array.
- *
- * The VFS is full of places where we drop the files lock between
- * setting the open_fds bitmap and installing the file in the file
- * array. At any such point, we are vulnerable to a dup2() race
- * installing a file in the array before us. We need to detect this and
- * fput() the struct file we are about to overwrite in this case.
- *
- * It should never happen - if we allow dup2() do it, _really_ bad things
- * will follow.
+/**
+ * fd_install - install a file pointer in the fd array
+ * @fd: file descriptor to install the file in
+ * @file: the file to install
*
* This consumes the "file" refcount, so callers should treat it
* as if they had called fput(file).
*/
-
void fd_install(unsigned int fd, struct file *file)
{
struct files_struct *files = current->files;
@@ -1259,18 +1251,30 @@ __releases(&files->file_lock)
struct fdtable *fdt;

/*
- * We need to detect attempts to do dup2() over allocated but still
- * not finished descriptor. NB: OpenBSD avoids that at the price of
- * extra work in their equivalent of fget() - they insert struct
- * file immediately after grabbing descriptor, mark it larval if
- * more work (e.g. actual opening) is needed and make sure that
- * fget() treats larval files as absent. Potentially interesting,
- * but while extra work in fget() is trivial, locking implications
- * and amount of surgery on open()-related paths in VFS are not.
- * FreeBSD fails with -EBADF in the same situation, NetBSD "solution"
- * deadlocks in rather amusing ways, AFAICS. All of that is out of
- * scope of POSIX or SUS, since neither considers shared descriptor
- * tables and this condition does not arise without those.
+ * dup2() is expected to close the file installed in the target fd slot
+ * (if any). However, userspace hand-picking a fd may be racing against
+ * its own threads which happened to allocate it in open() et al but did
+ * not populate it yet.
+ *
+ * Broadly speaking we may be racing against the following:
+ * fd = get_unused_fd_flags(); // fd slot reserved, ->fd[fd] == NULL
+ * file = hard_work_goes_here();
+ * fd_install(fd, file); // only now ->fd[fd] == file
+ *
+ * It is an invariant that a successfully allocated fd has a NULL entry
+ * in the array until the matching fd_install().
+ *
+ * If we fit the window, we have the fd to populate, yet no target file
+ * to close. Trying to ignore it and install our new file would violate
+ * the invariant and make fd_install() overwrite our file.
+ *
+ * Things can be done(tm) to handle this. However, the issue does not
+ * concern legitimate programs and we only need to make sure the kernel
+ * does not trip over it.
+ *
+ * The simplest way out is to return an error if we find ourselves here.
+ *
+ * POSIX is silent on the issue, we return -EBUSY.
*/
fdt = files_fdtable(files);
fd = array_index_nospec(fd, fdt->max_fds);
--
2.43.0