Re: [PATCH 14/20] VFS/namei: add 'inode' arg to put_link().

From: Al Viro
Date: Sat Apr 18 2015 - 04:09:28 EST


On Fri, Apr 17, 2015 at 08:09:10PM +0100, Al Viro wrote:
> On Fri, Apr 17, 2015 at 05:25:36PM +0100, Al Viro wrote:
> > On Mon, Mar 23, 2015 at 01:37:40PM +1100, NeilBrown wrote:
> > > @@ -1669,13 +1669,14 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
> > >
> > > do {
> > > struct path link = *path;
> > > + struct inode *inode = link.dentry->d_inode;
> > > void *cookie;
> > >
> > > res = follow_link(&link, nd, &cookie);
> > > if (res)
> > > break;
> > > res = walk_component(nd, path, LOOKUP_FOLLOW);
> > > - put_link(nd, &link, cookie);
> > > + put_link(nd, &link, inode, cookie);
> > > } while (res > 0);
> >
> > That's really unpleasant - it means increased stack footprint in the
> > recursion.
> >
> > Damn, maybe it's time to bite the bullet and kill the recursion completely...
> >
> > What do we really need to save across the recursive call?
> > * how far did we get in the previous pathname
> > * data needed for put_link:
> > cookie
> > link body
> > dentry of link
> > vfsmount (to pin containing fs; non-RCU) or inode (RCU)
> >
> > We are already saving link body in nameidata, so we could fatten that array.
> > It would allow flattening link_path_walk() completely - instead of
> > recursive call we would just save what needed saving and jump to the beginning
> > and on exits we'd check the depth and either return or restore the saved state
> > and jump back to just past the place where recursive call used to be.
> > It would even save quite a bit of space in the worst case. However, it would
> > blow the stack footprint in normal cases *and* blow it even worse for the
> > things that need two struct nameidata instances at once (rename(), basically).
> > 5 pointers instead of 1 pointer per level - extra 32 words on stack, i.e.
> > extra 256 bytes on 64bit. Extra 0.5Kb of stack footprint on rename() is
> > probably too much, especially since this "saved" stuff from its two nameidata
> > instances will never be used at the same time...
> >
> > Alternatively, we could just allocate about a page worth of an array when
> > the depth of nesting goes beyond 2 and put this saved stuff there - at
> > 5 pointers per level it would completely dispose of the depth of nesting
> > limit, giving us uniform "can't traverse more than 40 symlinks per pathname
> > resolution". 40 * 5 * sizeof(pointer) is what, at most 1600 bytes? So
> > even half a page would suffice for that quite comfortably...
> >
> > The question is whether we'll be able to avoid blowing the I-cache footprint
> > of link_path_walk() to hell while doing that; it feels like we should be,
> > but we'll have to see how well does that work in reality...
> >
> > I'll try to implement that (with your #3..#7 as the first steps) and see
> > how well does it work; it's obviously the next cycle fodder, but hopefully
> > in testable shape by -rc2...
>
> Hmm... Actually, right now we have 192 bytes of stack footprint per
> nesting level (amd64 allmodconfig). Which means that simply making the
> array fatter would give a clean benefit at the 3rd level of recursion (symlink
> encountered while traversing a symlink) for everything other than rename()...
> allnoconfig+64bit gives 160 bytes per level, with the same breakeven point.
>
> Interesting... It might even make sense to separate that array from
> struct nameidata and solve the rename() problem that way (current->nameidata
> would be replaced with pointer to that sucker in such variant, of course, and
> ->depth would move there). In that variant we do not get rid of nesting limit
> completely, but it would probably be simpler than the one above...

OK, right now in my tree recursion is gone, it seems to survive the
testing *and* stack footprint of link_path_walk() is 432 bytes. Less than the
mainline with two nested symlinks, and I definitely see how to trim that down
by another 64 bytes, which would put us within a spitting distance from what
the mainline gets with a single symlink encountered in the middle of a
pathname. It still needs more massage (link_path_walk() is ugly as hell
right now), but I see how to clean it up, and porting the rest of Neil's
RCU follow_link series on top of that shouldn't be hard. Obviously next cycle
fodder, but if everything works out, we'll get serious stack footprint
reduction *and* not falling out of lazy pathwalk whenever we run into a symlink.

I've dumped the current branch in vfs.git#link_path_walk; more after
I get some sleep...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/