Re: [PATCH v2] vfs: shave work on failed file open

From: Al Viro
Date: Mon Oct 09 2023 - 23:06:26 EST


On Sat, Sep 30, 2023 at 11:04:20AM +0200, Christian Brauner wrote:
> +On newer kernels rcu based file lookup has been switched to rely on
> +SLAB_TYPESAFE_BY_RCU instead of call_rcu(). It isn't sufficient anymore to just
> +acquire a reference to the file in question under rcu using
> +atomic_long_inc_not_zero() since the file might have already been recycled and
> +someone else might have bumped the reference. In other words, the caller might
> see reference count bumps from newer users. For this reason it is necessary
> +to verify that the pointer is the same before and after the reference count
> +increment. This pattern can be seen in get_file_rcu() and __files_get_rcu().
> +
> +In addition, it isn't possible to access or check fields in struct file without
> first acquiring a reference on it. Not doing that was always very dodgy and it
> +was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU
> +it is necessary that callers first acquire a reference under rcu or they must
> hold the files_lock of the fdtable. Failing to do either one of these is a bug.
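
For readers following along, the "verify that the pointer is the same before
and after the increment" rule from the quoted text can be modelled in
userspace roughly as below. This is only a sketch with C11 atomics, not the
kernel code: struct obj, struct slot, obj_get_not_zero(), get_if_unchanged(),
lookup_get() and the demo function are all made-up names, and the race is
replayed single-threaded rather than with real concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file. With a typesafe slab the memory
 * may be reused for another object of the same type at any time. */
struct obj {
	atomic_long count;	/* 0 means no references are left */
};

/* Hypothetical stand-in for one descriptor table entry. */
struct slot {
	_Atomic(struct obj *) ptr;
};

/* Userspace model of atomic_long_inc_not_zero(). */
static bool obj_get_not_zero(struct obj *o)
{
	long old = atomic_load(&o->count);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->count, 1);
}

/* 'seen' was loaded from the slot earlier. Keep the new reference only
 * if the slot still points at the same object after the increment. */
static bool get_if_unchanged(struct slot *s, struct obj *seen)
{
	if (!obj_get_not_zero(seen))
		return false;
	if (atomic_load(&s->ptr) == seen)
		return true;
	obj_put(seen);	/* we bumped somebody else's object; undo it */
	return false;
}

static struct obj *lookup_get(struct slot *s)
{
	for (;;) {
		struct obj *o = atomic_load(&s->ptr);

		if (!o)
			return NULL;
		if (get_if_unchanged(s, o))
			return o;
		/* lost a race: reload the slot and try again */
	}
}

/* Single-threaded replay of the race; returns the final count of 'a'. */
long demo_recycled_pointer(void)
{
	struct obj a, b;
	struct slot s;
	struct obj *stale, *found;
	bool kept;

	atomic_init(&a.count, 1);
	atomic_init(&b.count, 1);
	atomic_init(&s.ptr, &a);

	stale = atomic_load(&s.ptr);	/* reader loads the pointer ... */
	atomic_store(&s.ptr, &b);	/* ... and the slot changes under it */

	kept = get_if_unchanged(&s, stale);
	assert(!kept);			/* the stray bump was detected and undone */
	found = lookup_get(&s);
	assert(found == &b);		/* a fresh lookup sees the new entry */
	return atomic_load(&a.count);
}
```

The increment alone succeeds on the stale pointer in this replay, since the
count is nonzero; it is only the second load of the slot that tells the
reader the reference it took was not on the object it was looking for.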

Trivial correction: the last paragraph applies only to rcu lookups - something
like
        spin_lock(&files->file_lock);
        fdt = files_fdtable(files);
        if (close->fd >= fdt->max_fds) {
                spin_unlock(&files->file_lock);
                goto err;
        }
        file = rcu_dereference_protected(fdt->fd[close->fd],
                        lockdep_is_held(&files->file_lock));
        if (!file || io_is_uring_fops(file)) {
                     ^^^^^^^^^^^^^^^^^^^^^ fetches file->f_op
                spin_unlock(&files->file_lock);
                goto err;
        }
        ...

should still be valid. As written, the reference to "rcu based file lookup"
is buried in the previous paragraph and it's not obvious that it applies to
the last one as well. Incidentally, I would probably turn that fragment
(in io_uring/openclose.c:io_close()) into
        spin_lock(&files->file_lock);
        file = files_lookup_fd_locked(files, close->fd);
        if (!file || io_is_uring_fops(file)) {
                spin_unlock(&files->file_lock);
                goto err;
        }
        ...
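
To spell out why the locked variant needs no reference, here is a rough
userspace model of the same shape. Everything in it is an assumption made
for illustration: struct mfile, struct mtable, lookup_locked(),
close_check_passes() and the demo are invented names, the spinlock is a
plain C11 atomic_flag standing in for files->file_lock, and the is_uring
flag replaces the f_op comparison.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; 'is_uring' replaces the file->f_op comparison. */
struct mfile {
	bool is_uring;
};

struct mtable {
	atomic_flag lock;	/* models files->file_lock */
	unsigned int max_fds;
	struct mfile **fd;
};

static void table_lock(struct mtable *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
}

static void table_unlock(struct mtable *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Caller must hold t->lock. Range check and fetch live in one place. */
static struct mfile *lookup_locked(struct mtable *t, unsigned int fd)
{
	if (fd >= t->max_fds)
		return NULL;
	return t->fd[fd];
}

/* Same shape as the check in the fragment: true if it would not bail out. */
bool close_check_passes(struct mtable *t, unsigned int fd)
{
	struct mfile *f;
	bool ok;

	table_lock(t);
	f = lookup_locked(t, fd);
	/* No reference is taken: the entry cannot change while we hold the lock. */
	ok = f && !f->is_uring;
	table_unlock(t);
	return ok;
}

/* Small self-check; returns a bitmask of the descriptors that passed. */
unsigned int demo_close_checks(void)
{
	struct mfile plain = { .is_uring = false };
	struct mfile uring = { .is_uring = true };
	struct mfile *slots[3] = { &plain, &uring, NULL };
	struct mtable t = { .max_fds = 3, .fd = slots };
	unsigned int mask = 0;

	atomic_flag_clear(&t.lock);
	for (unsigned int fd = 0; fd < 5; fd++)
		if (close_check_passes(&t, fd))
			mask |= 1u << fd;
	return mask;
}
```

In the model only descriptor 0 passes; the uring-like entry, the empty slot
and the two out-of-range descriptors are all rejected by the single helper
call, which is the simplification the conversion to a locked lookup helper
is after.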

> diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c
> index 1a587618015c..5e157f48995e 100644
> --- a/arch/powerpc/platforms/cell/spufs/coredump.c
> +++ b/arch/powerpc/platforms/cell/spufs/coredump.c
> @@ -74,10 +74,13 @@ static struct spu_context *coredump_next_context(int *fd)
>          *fd = n - 1;
>
>          rcu_read_lock();
> -        file = lookup_fd_rcu(*fd);
> -        ctx = SPUFS_I(file_inode(file))->i_ctx;
> -        get_spu_context(ctx);
> +        file = lookup_fdget_rcu(*fd);
>          rcu_read_unlock();
> +        if (file) {
> +                ctx = SPUFS_I(file_inode(file))->i_ctx;
> +                get_spu_context(ctx);
> +                fput(file);
> +        }

Well... Here we should have descriptor table unshared, and we really
do rely upon that - we expect the file we'd found to have been a spufs
one *and* to have stayed that way. So if anyone could change the
descriptor table behind our back, we'd be FUBAR.