[RFC] ->get_link(), ->put_link() and cookies

From: Al Viro
Date: Fri Jan 01 2016 - 01:37:18 EST


In cases when we need to pin the symlink body in some manner, we
need to undo whatever we'd done once the caller is done with the body.
That went through several variants, the latest (in -next right now) being
"have non-NULL ->put_link() and leave an argument for it in void *cookie,
address of which is passed to ->get_link()".

Linus suggested getting rid of the cookie thing by deriving it
from the address of symlink body. That breaks in some cases, though -
procfs and lustre, to name a couple. Symlink body may be not enough.
Moreover, inode argument of ->put_link() is never used.

AFAICS, the following ends up a lot cleaner:

1) struct delayed_call {
void (*fn)(void *);
void *arg;
} is defined. Primitives:
DEFINE_DELAYED_CALL(name) - similar to DEFINE_MUTEX, et.al.
set_delayed_call(p, fn, arg) - sets fn and arg
do_delayed_call(p) - if fn is set, calls fn(arg)
clear_delayed_call(p) - clears fn

2) in struct saved replace void *cookie; with struct delayed_call done;
pass its address to ->get_link() and when/if it needs something to release
the body, let it do set_delayed_call(done, fn, arg).

3) remove ->put_link entirely

4) get rid of allocations in overlayfs ->get_link() - just let the ->get_link()
of underlying fs arrange the call it needs done and that's it; we don't need
to keep a reference to inode of underlying symlink around, which removes the
reason to allocate anything in there.

5) compensation of stack footprint changes: struct saved ->inode had been used
for two things - storing the inode between the pick_link() and get_link() and
keeping it until we decide to call its ->put_link(). Since the latter is gone
now, we don't need to store that thing in nd->stack[...] - we can move it to
nd->link_inode and be done with that. That compensates the increase of
struct saved size, so we are only left with one extra word per
struct nameidata instance.

The reasons why I like it more than the variant proposed by Linus:
* it allows to avoid convolutions in things like procfs and lustre
->get_link() instances; sure, we could stash a reference to the structure
we'll need somewhere near the symlink body, use container_of(), etc., but
it's a lot more convoluted
* it doesn't need to keep an inode reference around, only for the
purpose of locating ->put_link(). Seeing that ->put_link() instances do not
give a damn about the inode, that's rather pointless (note, BTW, that all
we are promised in RCU mode is that memory of this struct inode hadn't been
reused yet).

I'd done that on top of vfs.git#work.symlinks; it's pretty much the same
as incremental I'd posted three weeks ago, except for the switch of three
instances from get_free_page/free_page_link to kmalloc/kfree_link.

It's in vfs.git#proposed.symlinks. It survives the local beating. Please,
review and comment.

Shortlog:
Al Viro (13):
switch befs long symlinks to page_symlink_operations
logfs: don't duplicate page_symlink_inode_operations
udf: don't duplicate page_symlink_inode_operations
ufs: get rid of ->setattr() for symlinks
namei: page_getlink() and page_follow_link_light() are the same thing
don't put symlink bodies in pagecache into highmem
replace ->follow_link() with new method that could stay in RCU mode
teach page_get_link() to work in RCU mode
teach shmem_get_link() to work in RCU mode
teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
teach nfs_get_link() to work in RCU mode
kill free_page_put_link()
switch ->get_link() to delayed_call, kill ->put_link()

Diffstat:
Documentation/filesystems/Locking | 6 +-
Documentation/filesystems/porting | 17 ++++
Documentation/filesystems/vfs.txt | 21 ++---
drivers/staging/lustre/lustre/llite/symlink.c | 24 ++---
fs/9p/vfs_inode.c | 24 +++--
fs/9p/vfs_inode_dotl.c | 21 +++--
fs/affs/inode.c | 1 +
fs/affs/namei.c | 1 +
fs/affs/symlink.c | 9 +-
fs/afs/inode.c | 1 +
fs/autofs4/symlink.c | 14 ++-
fs/befs/linuxvfs.c | 40 ++++----
fs/btrfs/inode.c | 5 +-
fs/ceph/inode.c | 2 +-
fs/cifs/cifsfs.c | 3 +-
fs/cifs/cifsfs.h | 5 +-
fs/cifs/link.c | 10 +-
fs/coda/cnode.c | 5 +-
fs/coda/symlink.c | 4 +-
fs/configfs/symlink.c | 22 +++--
fs/cramfs/inode.c | 1 +
fs/dcache.c | 2 +-
fs/ecryptfs/inode.c | 17 +++-
fs/efs/inode.c | 1 +
fs/efs/symlink.c | 4 +-
fs/exofs/inode.c | 1 +
fs/exofs/namei.c | 1 +
fs/ext2/inode.c | 1 +
fs/ext2/namei.c | 1 +
fs/ext2/symlink.c | 5 +-
fs/ext4/inode.c | 1 +
fs/ext4/namei.c | 1 +
fs/ext4/symlink.c | 29 +++---
fs/f2fs/inode.c | 1 +
fs/f2fs/namei.c | 31 ++++---
fs/freevxfs/vxfs_inode.c | 1 +
fs/fuse/dir.c | 17 ++--
fs/gfs2/inode.c | 19 ++--
fs/hfsplus/inode.c | 2 +
fs/hostfs/hostfs_kern.c | 22 ++---
fs/hpfs/inode.c | 1 +
fs/hpfs/namei.c | 5 +-
fs/hugetlbfs/inode.c | 1 +
fs/inode.c | 6 ++
fs/isofs/inode.c | 1 +
fs/isofs/rock.c | 4 +-
fs/jffs2/symlink.c | 2 +-
fs/jfs/inode.c | 1 +
fs/jfs/namei.c | 1 +
fs/jfs/symlink.c | 5 +-
fs/kernfs/symlink.c | 24 +++--
fs/libfs.c | 22 ++---
fs/logfs/dir.c | 9 +-
fs/logfs/inode.c | 3 +-
fs/logfs/logfs.h | 1 -
fs/minix/inode.c | 4 +-
fs/namei.c | 126 +++++++++++++-------------
fs/ncpfs/inode.c | 4 +-
fs/nfs/inode.c | 26 +++++-
fs/nfs/symlink.c | 39 +++++---
fs/nilfs2/inode.c | 1 +
fs/nilfs2/namei.c | 4 +-
fs/ocfs2/inode.c | 1 +
fs/ocfs2/namei.c | 1 +
fs/ocfs2/symlink.c | 3 +-
fs/overlayfs/inode.c | 53 ++---------
fs/proc/base.c | 24 +++--
fs/proc/inode.c | 21 +++--
fs/proc/namespaces.c | 10 +-
fs/proc/self.c | 18 ++--
fs/proc/thread_self.c | 19 ++--
fs/qnx4/inode.c | 1 +
fs/qnx6/inode.c | 1 +
fs/ramfs/inode.c | 1 +
fs/reiserfs/inode.c | 1 +
fs/reiserfs/namei.c | 4 +-
fs/romfs/super.c | 1 +
fs/squashfs/inode.c | 2 +
fs/squashfs/symlink.c | 3 +-
fs/sysv/inode.c | 4 +-
fs/ubifs/file.c | 2 +-
fs/udf/inode.c | 3 +-
fs/udf/namei.c | 8 +-
fs/udf/symlink.c | 4 +-
fs/udf/udfdecl.h | 1 -
fs/ufs/Makefile | 2 +-
fs/ufs/inode.c | 5 +-
fs/ufs/namei.c | 5 +-
fs/ufs/symlink.c | 42 ---------
fs/ufs/ufs.h | 4 -
fs/xfs/xfs_iops.c | 14 ++-
include/linux/delayed_call.h | 34 +++++++
include/linux/fs.h | 16 ++--
include/linux/nfs_fs.h | 1 +
mm/shmem.c | 48 ++++++----
95 files changed, 570 insertions(+), 470 deletions(-)
delete mode 100644 fs/ufs/symlink.c
create mode 100644 include/linux/delayed_call.h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/