Re: [PATCH] vfs: don't unnecessarily clone write access for writable fds
From: Eric Biggers
Date: Wed Jun 10 2020 - 22:30:05 EST
On Wed, Jun 10, 2020 at 06:49:45PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@xxxxxxxxxx>
>
> There's no need for mnt_want_write_file() to clone a write reference to
> the mount when the file is already open for writing, provided that
> mnt_drop_write_file() is changed to conditionally drop the reference.
>
> We seem to have ended up in the current situation because
> mnt_want_write_file() used to be paired with mnt_drop_write(), due to
> mnt_drop_write_file() not having been added yet. So originally
> mnt_want_write_file() did have to always take a reference.
>
> But later mnt_drop_write_file() was added, and all callers of
> mnt_want_write_file() were paired with it. This makes the compatibility
> between mnt_want_write_file() and mnt_drop_write() no longer necessary.
>
> Therefore, make __mnt_want_write_file() and __mnt_drop_write_file() be
> no-ops on files already open for writing. This removes the only caller
> of mnt_clone_write(), so remove that too.
>
> Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
> ---
> fs/namespace.c | 43 ++++++++++---------------------------------
> include/linux/mount.h | 1 -
> 2 files changed, 10 insertions(+), 34 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7cd64240916573..7e78c7ae4ab34d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -359,51 +359,27 @@ int mnt_want_write(struct vfsmount *m)
> }
> EXPORT_SYMBOL_GPL(mnt_want_write);
>
> -/**
> - * mnt_clone_write - get write access to a mount
> - * @mnt: the mount on which to take a write
> - *
> - * This is effectively like mnt_want_write, except
> - * it must only be used to take an extra write reference
> - * on a mountpoint that we already know has a write reference
> - * on it. This allows some optimisation.
> - *
> - * After finished, mnt_drop_write must be called as usual to
> - * drop the reference.
> - */
> -int mnt_clone_write(struct vfsmount *mnt)
> -{
> - /* superblock may be r/o */
> - if (__mnt_is_readonly(mnt))
> - return -EROFS;
> - preempt_disable();
> - mnt_inc_writers(real_mount(mnt));
> - preempt_enable();
> - return 0;
> -}
> -EXPORT_SYMBOL_GPL(mnt_clone_write);
Sorry, I think I missed something -- the __mnt_is_readonly() check should be
kept because there are cases where SB_RDONLY can be set when there are writable
file descriptors. For example, ext4 with errors=remount-ro.
Interestingly though, sys_write() skips that check because it uses
file_start_write() which only does the freeze protection.
- Eric