Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

From: Al Viro
Date: Wed May 13 2015 - 18:25:40 EST


More on top of the current vfs.git#for-next (== the posted patchset
with a couple of fixes): more fs/namei.c reorganization and stack footprint
reduction (below 1KB now). One interesting piece of that is that we don't
touch current->fs->lock anymore - unlazy_walk() used to, but now we can
get rid of that.

FWIW, at that point I'm starting to seriously look into a primitive
that would take the usual dfd+name+flags and a (path x inode x bool -> int)
callback. Since we don't have closures, it'd have to be

int filename_apply(int dfd, struct filename *name, unsigned flags,
		   int (*act)(struct path *path,
			      struct inode *inode,
			      bool may_block,
			      void *ctx),
		   void *ctx);

It would do the lookup and, if it ends up at something positive, call
act() for it. If we reach the very end in RCU mode, act() gets called
with false as the third argument, _without_ dropping rcu_read_lock() or
grabbing references. It may return -ECHILD, in which case we'll unlazy
and call it again with may_block being true; if it does *not* return
-ECHILD, we'll check that mount_lock and d_seq are still valid. If they
are, we are done; if not, we restart the lookup from scratch in non-lazy mode.

Basically, that's your "could we get stat(2) without ever dirtying
anything shared?" thing, except that it might also be applicable to some of
getxattr(), statfs(), access(), listxattr() and readlink().
The obstacles for stat() are:
* ->d_weak_revalidate() needs to be taught about being called in
RCU mode. Not a problem: flags are already passed, and one of the two
instances is already checking for LOOKUP_RCU (what with being ->d_revalidate()
at the same time). This obstacle applies to all of them, not just stat().
* Linux S&M with its usual habit of sticking hooks into every
orifice out there (and if there hadn't been one, the hook still goes in,
of course). In this case it's not just selinux, as it was with follow_link:
apparmor, tomoyo and smack are also there. The selinux one looks like it could
be made to work if given an inode and may_block in addition to struct path;
the rest... no idea.
* telling ->getattr() that we are in RCU mode. And giving it the
inode, of course. As a first approximation, we could live with just
"if ->getattr isn't NULL, chicken out and return -ECHILD", but e.g.
ext4, btrfs and xfs have non-NULL ->getattr(). In this case I wonder if
adding a new method wouldn't be the right thing...
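The "chicken out" first approximation for ->getattr() can be sketched as a stat() act callback. Again this is a userspace model under assumptions: the model_* structures, model_fillattr(), example_getattr() and stat_act() are hypothetical, heavily trimmed stand-ins (the real struct kstat, struct inode and inode_operations carry much more, and the real ->getattr() has a different signature); the callback just takes the inode plus may_block, as in the proposed filename_apply().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-ins for kernel structures. */
struct model_kstat { unsigned int mode; long long size; };
struct model_inode;
struct model_inode_ops {
	int (*getattr)(struct model_inode *inode, struct model_kstat *st);
};
struct model_inode {
	unsigned int i_mode;
	long long i_size;
	const struct model_inode_ops *i_op;
};

/* Generic fill: only reads inode fields, so it is safe in RCU mode. */
static void model_fillattr(const struct model_inode *inode, struct model_kstat *st)
{
	st->mode = inode->i_mode;
	st->size = inode->i_size;
}

/* Example filesystem ->getattr() that computes the size itself. */
static int example_getattr(struct model_inode *inode, struct model_kstat *st)
{
	model_fillattr(inode, st);
	st->size = 4096;	/* pretend the fs has to work this out */
	return 0;
}

static const struct model_inode_ops example_ops = { .getattr = example_getattr };

/*
 * stat() act callback, first approximation: a filesystem-supplied
 * ->getattr() can't be assumed safe under rcu_read_lock(), so bail out
 * with -ECHILD and let the caller unlazy and retry with may_block.
 */
static int stat_act(struct model_inode *inode, bool may_block, void *ctx)
{
	struct model_kstat *st = ctx;

	assert(inode != NULL && st != NULL);
	if (inode->i_op && inode->i_op->getattr) {
		if (!may_block)
			return -ECHILD;
		return inode->i_op->getattr(inode, st);
	}
	model_fillattr(inode, st);
	return 0;
}
```

The catch is the one noted for ext4, btrfs and xfs: any filesystem with its own ->getattr() always ends up in the second, blocking call, so the lazy path buys nothing there without some RCU-aware method.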

Overall, it seems to be doable, and with the results of massage
already done to fs/namei.c the PITA promises to be fairly limited. How
generic do we really want it? I mean, is e.g. access(2) (faccessat(2))
worth bothering with? Or getxattr(2), for that matter... Comments?

Anyway, additional pieces of the series follow:

namei: unlazy_walk() doesn't need to mess with current->fs anymore
lustre: kill unused macro (LOOKUP_CONTINUE)
lustre: kill unused helper
get rid of assorted nameidata-related debris
[Neil's] Documentation: remove outdated information from automount-support.txt
namei: be careful with mountpoint crossings in follow_dotdot_rcu()
namei: uninline set_root{,_rcu}()
namei: pass the struct path to store the result down into path_lookupat()
namei: move putname() call into filename_lookup()
namei: shift nameidata inside filename_lookup()
namei: make filename_lookup() reject ERR_PTR() passed as name
namei: shift nameidata down into filename_parentat()
namei: saner calling conventions for filename_create()
namei: saner calling conventions for filename_parentat()
namei: fold path_cleanup() into terminate_walk()
namei: stash dfd and name into nameidata
namei: trim do_last() arguments
inline user_path_parent()
inline user_path_create()
namei: move saved_nd pointer into struct nameidata
turn user_{path_at,path,lpath,path_dir}() into static inlines