Re: [PATCH v9 6/6] mm,thp: avoid writes to file with THP in pagecache

From: Johannes Weiner
Date: Wed Jul 10 2019 - 15:11:21 EST


On Mon, Jun 24, 2019 at 05:12:46PM -0700, Song Liu wrote:
> In previous patch, an application could put part of its text section in
> THP via madvise(). These THPs will be protected from writes when the
> application is still running (TXTBSY). However, after the application
> exits, the file is available for writes.
>
> This patch avoids writes to file THP by dropping page cache for the file
> when the file is open for write. A new counter nr_thps is added to struct
> address_space. In do_last(), if the file is open for write and nr_thps
> is non-zero, we drop page cache for the whole file.
>
> Reported-by: kbuild test robot <lkp@xxxxxxxxx>
> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> ---
> fs/inode.c | 3 +++
> fs/namei.c | 23 ++++++++++++++++++++++-
> include/linux/fs.h | 32 ++++++++++++++++++++++++++++++++
> mm/filemap.c | 1 +
> mm/khugepaged.c | 4 +++-
> 5 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index df6542ec3b88..518113a4e219 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -181,6 +181,9 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	mapping->flags = 0;
>  	mapping->wb_err = 0;
>  	atomic_set(&mapping->i_mmap_writable, 0);
> +#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> +	atomic_set(&mapping->nr_thps, 0);
> +#endif
>  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
>  	mapping->private_data = NULL;
>  	mapping->writeback_index = 0;
> diff --git a/fs/namei.c b/fs/namei.c
> index 20831c2fbb34..3d95e94029cc 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3249,6 +3249,23 @@ static int lookup_open(struct nameidata *nd, struct path *path,
>  	return error;
>  }
>
> +/*
> + * The file is open for write, so it is not mmapped with VM_DENYWRITE. If
> + * it still has THP in page cache, drop the whole file from pagecache
> + * before processing writes. This helps us avoid handling write back of
> + * THP for now.
> + */
> +static inline void release_file_thp(struct file *file)
> +{
> +	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) {
> +		struct inode *inode = file_inode(file);
> +
> +		if (inode_is_open_for_write(inode) &&
> +		    filemap_nr_thps(inode->i_mapping))
> +			truncate_pagecache(inode, 0);
> +	}
> +}
> +
> /*
> * Handle the last step of open()
> */
> @@ -3418,7 +3435,11 @@ static int do_last(struct nameidata *nd,
>  		goto out;
>  opened:
>  	error = ima_file_check(file, op->acc_mode);
> -	if (!error && will_truncate)
> +	if (error)
> +		goto out;
> +
> +	release_file_thp(file);
> +	if (will_truncate)
>  		error = handle_truncate(file);

This would seem better placed in do_dentry_open(), where we're done
with the namespace operation and actually work against the inode.

Something roughly like this?

diff --git a/fs/open.c b/fs/open.c
index b5b80469b93d..cae893edbab6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -799,6 +799,11 @@ static int do_dentry_open(struct file *f,
 		if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO)
 			return -EINVAL;
 	}
+
+	/* XXX: Huge page cache doesn't support writing yet */
+	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
+		truncate_pagecache(inode, 0);
+
 	return 0;

cleanup_all:
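
For anyone who wants to poke at the intended semantics outside the kernel,
here is a rough userspace model of the check above. It is only a sketch:
every model_* name and the MODEL_FMODE_* values are made up for illustration
and are not kernel API. The model keeps a per-mapping THP counter, and an
open for write drops the whole cache when that counter is non-zero, while a
read-only open leaves it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the open-mode bits; not the kernel's values. */
#define MODEL_FMODE_READ  0x1u
#define MODEL_FMODE_WRITE 0x2u

/* Toy replacement for struct address_space: only the counters we need. */
struct model_mapping {
	int nr_thps;      /* huge pages currently in the page cache */
	size_t nr_pages;  /* all cached pages, huge or not */
};

/* Toy replacement for struct file. */
struct model_file {
	unsigned int f_mode;
	struct model_mapping *mapping;
};

/* Models a collapse into one huge page: the THP counter goes up. */
static void model_collapse(struct model_mapping *m)
{
	m->nr_thps++;
	m->nr_pages++;
}

/* Models truncate_pagecache(inode, 0): everything cached is dropped. */
static void model_drop_cache(struct model_mapping *m)
{
	m->nr_thps = 0;
	m->nr_pages = 0;
}

/*
 * Models the check at the tail of the open path: a writer must not see
 * huge pages in the cache, so drop the cache if any are present.
 * Returns true when the cache was dropped.
 */
static bool model_open(struct model_file *f, struct model_mapping *m,
		       unsigned int mode)
{
	f->f_mode = mode;
	f->mapping = m;

	if ((f->f_mode & MODEL_FMODE_WRITE) && m->nr_thps > 0) {
		model_drop_cache(m);
		return true;
	}
	return false;
}

/* Small walk-through of the three interesting cases. */
int model_selfcheck(void)
{
	struct model_mapping m = { 0, 0 };
	struct model_file f;

	/* No THPs cached: opening for write drops nothing. */
	assert(!model_open(&f, &m, MODEL_FMODE_WRITE));

	/* THP cached, read-only open: the cache stays. */
	model_collapse(&m);
	assert(!model_open(&f, &m, MODEL_FMODE_READ));
	assert(m.nr_thps == 1);

	/* THP cached, open for write: the whole cache goes. */
	assert(model_open(&f, &m, MODEL_FMODE_WRITE));
	assert(m.nr_thps == 0 && m.nr_pages == 0);
	return 0;
}
```

Note that the model keys off the mode of the file being opened, as in the
do_dentry_open() variant, rather than off a per-inode writer count.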