[GIT PULL] overlayfs update for 4.18

From: Miklos Szeredi
Date: Fri Jun 08 2018 - 08:13:43 EST


Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git tags/ovl-update-4.18

This contains two new features:

1) Stack file operations: this allows removal of several hacks from the
VFS, proper interaction of read-only open files with copy-up, the
possibility of implementing fs-modifying ioctls properly, and more.

2) Metadata only copy-up: when a file is on a lower layer and only metadata
is modified (except size), then only copy up the metadata and continue to
use the data from the lower file.
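
To make the rule in 2) a bit more concrete, here is a rough user-space
model of the decision. This is not code from the series: the enum names and
choose_copy_up() are made up for illustration, and the actual patches handle
more cases than this. The only inputs taken from the series are the
metacopy=on/off mount option and the exclusions for size changes (truncate)
and attribute-modifying ioctls.

```c
#include <stdbool.h>

/* Hypothetical classification of a modification; not kernel identifiers. */
enum change_kind {
	CHANGE_OWNER,       /* e.g. chown */
	CHANGE_MODE,        /* e.g. chmod */
	CHANGE_TIMES,       /* e.g. utimes */
	CHANGE_SIZE,        /* e.g. truncate */
	CHANGE_DATA,        /* e.g. write */
	CHANGE_ATTR_IOCTL,  /* e.g. chattr via ioctl */
};

enum copy_up_kind {
	COPY_UP_META_ONLY,  /* copy metadata, keep using lower data */
	COPY_UP_FULL,       /* copy metadata and data */
};

/* Decide how much of a lower file has to be copied up. */
static enum copy_up_kind choose_copy_up(bool metacopy_enabled,
					enum change_kind change)
{
	/* With metacopy=off every copy-up is a full one. */
	if (!metacopy_enabled)
		return COPY_UP_FULL;

	switch (change) {
	case CHANGE_OWNER:
	case CHANGE_MODE:
	case CHANGE_TIMES:
		return COPY_UP_META_ONLY;
	default:
		/* Size, data and attr ioctl changes need the data as well. */
		return COPY_UP_FULL;
	}
}
```

With this model, choose_copy_up(true, CHANGE_MODE) yields COPY_UP_META_ONLY,
while choose_copy_up(true, CHANGE_SIZE) and any call with metacopy disabled
yield COPY_UP_FULL.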

The series starts with a cleanup of the internal dedupe API. There's some
late discussion on details (should vfs limit the size of a dedupe request,
and if yes, how much). I've ignored it for this pull request; it can
easily be fixed later.

Another pain point: overlay doesn't want to double account open files (due to
stacking) for fear of breaking existing setups. So I added infrastructure
that allows skipping the accounting of an open file in nr_files. I don't much
like this, but can't see any other way of keeping backward compatibility.
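
As a rough illustration of the accounting issue, the idea can be modelled in
user space like this. All names here (toy_file, toy_nr_files, toy_open(),
toy_release()) are hypothetical and do not come from the series; the sketch
only assumes that each overlay open is backed by a second open of the real
file.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an open file; not the kernel's struct file. */
struct toy_file {
	bool accounted;  /* did this open bump the global counter? */
};

/* Models the global count of open files (nr_files). */
static long toy_nr_files;

/* Open a file, optionally skipping the global accounting. */
static void toy_open(struct toy_file *f, bool account)
{
	f->accounted = account;
	if (account)
		toy_nr_files++;
}

/* Release must mirror open, otherwise the counter would drift. */
static void toy_release(struct toy_file *f)
{
	if (f->accounted)
		toy_nr_files--;
	f->accounted = false;
}
```

Opening the overlay file with account=true and the underlying real file with
account=false leaves toy_nr_files at 1, which is what a setup without
stacked file operations would have seen.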

There are two conflicts when merging; my resolution is attached below.

Thanks,
Miklos

---
Miklos Szeredi (37):
vfs: dedupe: return loff_t
vfs: dedupe: rationalize args
vfs: dedupe: extract helper for a single dedup
vfs: add path_open()
vfs: optionally don't account file in nr_files
vfs: export vfs_ioctl() to modules
vfs: export vfs_dedupe_file_range_one() to modules
ovl: copy up times
ovl: copy up inode flags
Revert "Revert "ovl: get_write_access() in truncate""
ovl: copy up file size as well
ovl: deal with overlay files in ovl_d_real()
ovl: stack file ops
ovl: add helper to return real file
ovl: add ovl_read_iter()
ovl: add ovl_write_iter()
ovl: add ovl_fsync()
ovl: add ovl_mmap()
ovl: add ovl_fallocate()
ovl: add lsattr/chattr support
ovl: add ovl_fiemap()
ovl: add O_DIRECT support
ovl: add reflink/copyfile/dedup support
vfs: don't open real
ovl: obsolete "check_copy_up" module option
ovl: fix documentation of non-standard behavior
vfs: simplify dentry_open()
Revert "ovl: fix may_write_real() for overlayfs directories"
Revert "ovl: don't allow writing ioctl on lower layer"
vfs: fix freeze protection in mnt_want_write_file() for overlayfs
Revert "ovl: fix relatime for directories"
Revert "vfs: update ovl inode before relatime check"
Revert "vfs: add flags to d_real()"
Revert "vfs: do get_write_access() on upper layer of overlayfs"
Partially revert "locks: fix file locking on overlayfs"
Revert "fsnotify: support overlayfs"
vfs: remove open_flags from d_real()

Vivek Goyal (28):
ovl: Initialize ovl_inode->redirect in ovl_get_inode()
ovl: Move the copy up helpers to copy_up.c
ovl: Provide a mount option metacopy=on/off for metadata copyup
ovl: During copy up, first copy up metadata and then data
ovl: Copy up only metadata during copy up where it makes sense
ovl: Add helper ovl_already_copied_up()
ovl: A new xattr OVL_XATTR_METACOPY for file on upper
ovl: Use out_err instead of out_nomem
ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
ovl: Copy up meta inode data from lowest data inode
ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
ovl: Fix ovl_getattr() to get number of blocks from lower
ovl: Store lower data inode in ovl_inode
ovl: Add helper ovl_inode_realdata()
ovl: Open file with data except for the case of fsync
ovl: Do not expose metacopy only dentry from d_real()
ovl: Move some dir related ovl_lookup_single() code in else block
ovl: Check redirects for metacopy files
ovl: Treat metacopy dentries as type OVL_PATH_MERGE
ovl: Add an inode flag OVL_CONST_INO
ovl: Do not set dentry type ORIGIN for broken hardlinks
ovl: Set redirect on metacopy files upon rename
ovl: Set redirect on upper inode when it is linked
ovl: Check redirect on index as well
ovl: add helper to force data copy-up
ovl: Do not do metadata only copy-up for truncate operation
ovl: Do not do metacopy only for ioctl modifying file attr
ovl: Enable metadata only feature

---
Documentation/filesystems/Locking | 3 +-
Documentation/filesystems/overlayfs.txt | 90 ++++--
Documentation/filesystems/vfs.txt | 16 +-
fs/btrfs/ctree.h | 5 +-
fs/btrfs/ioctl.c | 7 +-
fs/file_table.c | 13 +-
fs/inode.c | 46 +--
fs/internal.h | 17 +-
fs/ioctl.c | 1 +
fs/locks.c | 20 +-
fs/namei.c | 2 +-
fs/namespace.c | 69 +----
fs/ocfs2/file.c | 10 +-
fs/open.c | 87 +++---
fs/overlayfs/Kconfig | 19 ++
fs/overlayfs/Makefile | 4 +-
fs/overlayfs/copy_up.c | 190 ++++++++----
fs/overlayfs/dir.c | 105 +++++--
fs/overlayfs/export.c | 3 +
fs/overlayfs/file.c | 508 ++++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 175 +++++++----
fs/overlayfs/namei.c | 195 +++++++-----
fs/overlayfs/overlayfs.h | 47 ++-
fs/overlayfs/ovl_entry.h | 6 +-
fs/overlayfs/super.c | 103 ++++---
fs/overlayfs/util.c | 252 +++++++++++++++-
fs/read_write.c | 91 +++---
fs/xattr.c | 9 +-
fs/xfs/xfs_file.c | 8 +-
include/linux/dcache.h | 15 +-
include/linux/fs.h | 31 +-
include/linux/fsnotify.h | 14 +-
include/uapi/linux/fs.h | 1 -
33 files changed, 1590 insertions(+), 572 deletions(-)
create mode 100644 fs/overlayfs/file.c
diff --cc fs/btrfs/ioctl.c
index d29992f7dc63,70eac76804df..000000000000
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@@ -3596,14 -3192,20 +3596,15 @@@ out_free
return ret;
}

- ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
- struct file *dst_file, u64 dst_loff)
-#define BTRFS_MAX_DEDUPE_LEN SZ_16M
-
+ loff_t btrfs_dedupe_file_range(struct file *src_file, loff_t loff,
+ struct file *dst_file, loff_t dst_loff,
+ loff_t olen)
{
struct inode *src = file_inode(src_file);
struct inode *dst = file_inode(dst_file);
u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
- ssize_t res;
+ int res;

- if (olen > BTRFS_MAX_DEDUPE_LEN)
- olen = BTRFS_MAX_DEDUPE_LEN;
-
if (WARN_ON_ONCE(bs < PAGE_SIZE)) {
/*
* Btrfs does not support blocksize < page_size. As a
diff --cc fs/read_write.c
index e83bd9744b5d,1ff18ea56584..000000000000
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@@ -2021,46 -2055,21 +2055,21 @@@ int vfs_dedupe_file_range(struct file *

if (info->reserved) {
info->status = -EINVAL;
- } else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE))) {
- info->status = -EINVAL;
- } else if (file->f_path.mnt != dst_file->f_path.mnt) {
- info->status = -EXDEV;
- } else if (S_ISDIR(dst->i_mode)) {
- info->status = -EISDIR;
- } else if (dst_file->f_op->dedupe_file_range == NULL) {
- info->status = -EINVAL;
- } else {
- deduped = dst_file->f_op->dedupe_file_range(file, off,
- len, dst_file,
- info->dest_offset);
- if (deduped == -EBADE)
- info->status = FILE_DEDUPE_RANGE_DIFFERS;
- else if (deduped < 0)
- info->status = deduped;
- else
- info->bytes_deduped += deduped;
- goto next_loop;
++ goto next_fdput;
}

- next_file:
- mnt_drop_write_file(dst_file);
+ deduped = vfs_dedupe_file_range_one(file, off, dst_file,
+ info->dest_offset, len);
+ if (deduped == -EBADE)
+ info->status = FILE_DEDUPE_RANGE_DIFFERS;
+ else if (deduped < 0)
+ info->status = deduped;
+ else
+ info->bytes_deduped += deduped;
+
-next_loop:
+next_fdput:
fdput(dst_fd);
-
+next_loop:
if (fatal_signal_pending(current))
goto out;
}