Re: [PATCH] fs: consistently deref the files table with rcu_access_pointer()

From: Mateusz Guzik
Date: Thu Mar 13 2025 - 09:43:44 EST


On Thu, Mar 13, 2025 at 1:32 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> ... except when the table is known to be only used by one thread.
>
> A file pointer can get installed at any moment despite the ->file_lock
> being held since the following:
> 8a81252b774b53e6 ("fs/file.c: don't acquire files->file_lock in fd_install()")
>
> Accesses subject to such a race can in principle suffer load tearing.
>
> While here redo the comment in dup_fd() as it only covered a race against
> files showing up, still assuming fd_install() takes the lock.
>
> Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
> ---
>
> I confirmed the possibility of the problem with this:
> https://lwn.net/Articles/793253/#Load%20Tearing
>
> Granted, the article being 6 years old might mean some magic was added
> by now to prevent this particular problem.
>
> While technically this classifies as a bugfix, given that nothing blew
> up and this is more of a "just in case" change, I don't think this
> warrants any backports. Thus I'm not adding a Fixes: tag to prevent this
> from being picked by autosel.
>
> fs/file.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index 6c159ede55f1..52010ecb27b8 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -423,17 +423,25 @@ struct files_struct *dup_fd(struct files_struct *oldf, struct fd_range *punch_ho
> old_fds = old_fdt->fd;
> new_fds = new_fdt->fd;
>
> + /*
> + * We may be racing against fd allocation from other threads using this
> + * files_struct, despite holding ->file_lock.
> + *
> + * alloc_fd() might have already claimed a slot, while fd_install()
> + * did not populate it yet. Note the latter operates locklessly, so
> + * the file can show up as we are walking the array below.
> + *
> + * At the same time we know no files will disappear as all other
> + * operations take the lock.
> + *
> + * Instead of trying to placate userspace racing with itself, we
> + * ref the file if we see it and mark the fd slot as unused otherwise.
> + */
> for (i = open_files; i != 0; i--) {
> - struct file *f = *old_fds++;
> + struct file *f = rcu_access_pointer(*old_fds++);

sigh, that happens to work but is technically bogus -- I thought I did
rcu_dereference, but instead had rcu_access_pointer in my fingers from
the assert thing. Thanks to Mathieu for noticing.

That is to say the patch has to s/rcu_access_pointer/rcu_dereference.

However, willy suggested also adding the lockdep check. So perhaps this
can instead use the _check variant with lockdep_is_held(&files->file_lock)
as the argument (the lock lives in files_struct, not in the fdtable).

I don't have an opinion on this bit -- the accesses are next to the
lock acquire, so perhaps this only serves as an uglifier.

That said, if you want the assert, I'll post a v2. Otherwise please
run the sed :->
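
For illustration only, here is a rough userspace model of what the
_check flavour would amount to in the dup_fd() loop. Nothing below is
kernel code: struct files_model, slot_deref_check() and dup_slots() are
made-up names, the lock_held flag merely stands in for the lockdep
assert, and the C11 acquire load is a conservative stand-in for the
marked load that rcu_dereference() performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel definitions. */
struct file {
	int refs;			/* models the file refcount */
};

struct files_model {
	bool lock_held;			/* models "->file_lock is held" */
	_Atomic(struct file *) fd[4];	/* models the fd array */
};

/*
 * Models the _check variant: assert the lock is held, then do one
 * marked load of the slot so the compiler cannot tear or repeat it.
 */
static struct file *slot_deref_check(struct files_model *files, size_t i)
{
	assert(files->lock_held);
	return atomic_load_explicit(&files->fd[i], memory_order_acquire);
}

/*
 * Models the dup_fd() walk: take a reference on every file seen and
 * copy the slot, NULL included. Returns the number of files referenced.
 */
static size_t dup_slots(struct files_model *oldf, struct file **new_fds,
			size_t n)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		struct file *f = slot_deref_check(oldf, i);

		if (f) {
			f->refs++;	/* models get_file(f) */
			copied++;
		}
		new_fds[i] = f;
	}
	return copied;
}
```

The point of the model is only that each slot is read exactly once
through a marked load, with the lock assert living next to it; whether
that assert is worth the extra noise in fs/file.c is the open question
above.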

> if (f) {
> get_file(f);
> } else {
> - /*
> - * The fd may be claimed in the fd bitmap but not yet
> - * instantiated in the files array if a sibling thread
> - * is partway through open(). So make sure that this
> - * fd is available to the new process.
> - */
> __clear_open_fd(open_files - i, new_fdt);
> }
> rcu_assign_pointer(*new_fds++, f);
> @@ -684,7 +692,7 @@ struct file *file_close_fd_locked(struct files_struct *files, unsigned fd)
> return NULL;
>
> fd = array_index_nospec(fd, fdt->max_fds);
> - file = fdt->fd[fd];
> + file = rcu_access_pointer(fdt->fd[fd]);
> if (file) {
> rcu_assign_pointer(fdt->fd[fd], NULL);
> __put_unused_fd(files, fd);
> @@ -1252,7 +1260,7 @@ __releases(&files->file_lock)
> */
> fdt = files_fdtable(files);
> fd = array_index_nospec(fd, fdt->max_fds);
> - tofree = fdt->fd[fd];
> + tofree = rcu_access_pointer(fdt->fd[fd]);
> if (!tofree && fd_is_open(fd, fdt))
> goto Ebusy;
> get_file(file);
> --
> 2.43.0
>


--
Mateusz Guzik <mjguzik gmail.com>