Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint

From: Al Viro
Date: Tue Apr 21 2015 - 17:20:23 EST


On Tue, Apr 21, 2015 at 04:45:04PM +0100, Al Viro wrote:
> On Tue, Apr 21, 2015 at 05:12:01PM +0200, Richard Weinberger wrote:
>
> > I'm pretty sure we can kill it. I had the plan to rip it out during this merge window
> > along with other broken UML stuff but I was too late to ask on the UML mailinglist
> > if someone is using it (which I really doubt).
> > So, let's kill it with v4.2.
>
> Let's do it. Then ->put_link() is left in an interesting situation - *all*
> instances only use the 'cookie' argument...

OK, so here's what we have:
* a lot of filesystems are using page_follow_link_light(); for RCU
mode they should simply look for page and if it's there and uptodate, that's
it - just grab a reference and be done. If it's not uptodate - oh, well,
fallback to non-RCU mode. Corresponding ->put_link() doesn't give a damn
which inode or dentry it is - it's just page_cache_release() (we need to
get rid of that kmap() crap anyway).
* a lot of fast symlinks are using only inode; no ->put_link(),
no blocking operations, etc. No problem at all.
* shmem would probably want something similar to what
page_follow_link_light() would be doing for RCU case.
* befs: should switch to page_follow_link_light(); just a matter of
giving it proper ->readpage().
* NFS: probably as in Neil's series, except that we really ought to
add a helper for what page_follow_link_light() would do in RCU case, rather
than open-coding it here (and, again, kmap/kunmap crap should go)
* /proc/self and its per-thread ilk: just do GFP_ATOMIC allocation for
RCU case (and handle failure as -ECHILD rather than usual -ENOMEM).
* proc_symlink() stuff: uses only inode, nothing blocking, no problem.
* 9p, cifs and fuse: those always query server on ->follow_link();
-ECHILD and be done with that. _IF_ they want some kind of caching, they can
do as NFS does. hostfs is that way too.
* gfs2: _probably_ want to bugger off with -ECHILD; OTOH, ocfs2
uses page_follow_link_light(), maybe correctly, maybe not, and it ought
to have similar issues...
* kernfs, configfs: -ECHILD. And git rm is _very_ tempting after
reading that code...
* lustre: hell knows, maybe always -ECHILD, maybe something like NFS.
* XFS: see above.
* hppfs: agreed to kill it off
* autofs: not sure; it would be almost the usual fast symlink, if not
for the fact that it marks an object reached from dentry as "used now".
With RCU pathwalk it's _probably_ harmless, but I'd like a confirmation from
autofs folks.
* /proc/*/ns/*: in theory, we might make it handle RCU mode, but
it's probably easier to say "just bugger off"
* /proc/*/fd/*, /proc/*/exe, /proc/*/cwd, /proc/*/root: in principle
doable, but not without serious massage.
* /proc/*/map_files/*: -ECHILD.
* overlayfs: usual "use GFP_ATOMIC in RCU mode, treat failures as
-ECHILD".
* ecryptfs: -ECHILD (and its use of ->readlink() is fishy, IMO).
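
For illustration, the "GFP_ATOMIC in RCU mode, treat failure as -ECHILD"
pattern from the list above can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: model_alloc(),
model_follow_link() and the atomic_alloc_fails knob are made-up names, not
kernel interfaces, and malloc() stands in for the real allocator.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Test knob (hypothetical): pretend non-sleeping allocations fail. */
static bool atomic_alloc_fails;

/*
 * Hypothetical stand-in for a kernel allocation.  'atomic' models
 * GFP_ATOMIC: the caller may not sleep, so the allocation can fail
 * where a sleeping one would have waited for memory.
 */
static void *model_alloc(size_t n, bool atomic)
{
	if (atomic && atomic_alloc_fails)
		return NULL;
	return malloc(n);
}

/*
 * Model of a ->follow_link() that has to allocate the link body.
 * Returns 0 and sets *body on success, a negative errno otherwise.
 * In RCU mode a failed allocation is reported as -ECHILD ("retry in
 * non-RCU mode") rather than the usual -ENOMEM.
 */
static int model_follow_link(const char *target, bool rcu, char **body)
{
	size_t len = strlen(target) + 1;
	char *p = model_alloc(len, rcu);

	if (!p)
		return rcu ? -ECHILD : -ENOMEM;
	memcpy(p, target, len);
	*body = p;
	return 0;
}
```

The point of the -ECHILD return is that the caller falls back to the
non-RCU path, where a sleeping allocation is allowed, instead of failing
the whole lookup.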

I agree that unlazy_walk() attempted when walking a symlink ought to fail
with -ECHILD; we can't legitimize the symlink itself, so once we are out
of RCU mode, there's nothing to hold the inode of symlink (and its body)
from getting freed. The solution is wrong, though; for example, when a
nested symlink occurs in the middle of a trailing one, we should *not*
remove the flag upon leaving the nested symlink.
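
To make the nesting problem concrete, here is a small userspace model.
It is a sketch, not the actual patch: both structs and all function names
are hypothetical, and the depth counter is just one possible way to avoid
the problem, not necessarily what the final code should do.

```c
#include <stdbool.h>

/* Flawed model: one flag, cleared whenever *any* symlink is left. */
struct flag_model {
	bool in_symlink;
};

static void flag_enter(struct flag_model *m) { m->in_symlink = true; }
static void flag_leave(struct flag_model *m) { m->in_symlink = false; }

/* Alternative model (assumption): count how many bodies are in use. */
struct depth_model {
	unsigned int depth;
};

static void depth_enter(struct depth_model *m) { m->depth++; }

static void depth_leave(struct depth_model *m)
{
	if (m->depth)
		m->depth--;
}

static bool depth_in_symlink(const struct depth_model *m)
{
	return m->depth != 0;
}
```

With the flag, entering the trailing symlink, entering a nested one and
then leaving the nested one reports "not in a symlink" although the
trailing symlink's body is still being walked; the counter still reports
that we are inside one.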

Another unpleasant thing is that ->follow_link() saying "can't do that in
RCU mode" ends up with a restart from scratch - that actually risks being
worse than mainline; there we would attempt unlazy_walk() and normally
it would've succeeded.

AFAICS, the real rule is "can't unlazy if nd->last.name points into a symlink
body and we might still need to access it"...
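
As a rough userspace illustration of that rule (a sketch only;
points_into() and model_unlazy() are hypothetical names, and the real
check would have to live in the pathwalk code and use its own state):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does p point inside the len-byte buffer starting at body? */
static bool points_into(const char *p, const char *body, size_t len)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)body;

	return a >= lo && a - lo < len;
}

/*
 * Model of the rule: refuse to leave RCU mode (-ECHILD) if the last
 * component's name lives in a symlink body that we may still need,
 * since nothing would keep that body from being freed afterwards.
 */
static int model_unlazy(const char *last_name, const char *body,
			size_t len, bool still_needed)
{
	if (body && still_needed && points_into(last_name, body, len))
		return -ECHILD;
	return 0;
}
```

Here last_name plays the role of nd->last.name, and body/len describe
the symlink body currently being walked (NULL when there is none).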