Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint

From: Al Viro
Date: Thu Apr 23 2015 - 01:01:23 EST


On Tue, Apr 21, 2015 at 10:20:07PM +0100, Al Viro wrote:

> I agree that unlazy_walk() attempted when walking a symlink ought to fail
> with -ECHILD; we can't legitimize the symlink itself, so once we are out
> of RCU mode, there's nothing to hold the inode of symlink (and its body)
> from getting freed. Solution is wrong though; for example, when
> nested symlink occurs in the middle of a trailing one, we should *not*
> remove the flag upon leaving the nested symlink.
>
> Another unpleasant thing is that ->follow_link() saying "can't do that in
> RCU mode" ends up with restart from scratch - that actually risks to be
> worse than the mainline; there we would attempt unlazy_walk() and normally
> it would've succeed.
>
> AFAICS, the real rule is "can't unlazy if nd->last.name points into a symlink
> body and we might still need to access it"...

Actually, I'm not sure anymore. What if we have unlazy_walk() legitimize
all the symlinks we are traversing? They are visible in nd->stack, after
all... It would mean more complex unlazy_walk(), but not terribly so -
succeeding legitimize_mnt() won't block and we already deal with the
possibility of having vfsmount legitimized, only to be dropped afterwards.
The real unpleasantness here is different - it's the need to keep ->d_seq
of those dentries to tell if they can be grabbed. That's 4 more bytes per
level plus the fun with alignment. OTOH, it both avoids the fun with getting
the logic of when to bail out right *and* avoids the guaranteed restarts
when running into a symlink we can't deal with in RCU mode - we could simply
unlazy and continue in such a situation.
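
A rough userspace sketch of that idea, purely as an illustration and not
kernel code: every name below (toy_dentry, toy_saved, toy_legitimize,
toy_unlazy) is made up, and a plain counter stands in for ->d_seq. Each
level remembers the sequence number sampled in lazy mode; leaving lazy mode
tries to grab every level, and if any of them has changed, the references
already taken are dropped and the whole thing fails with -ECHILD.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: a sequence counter and a refcount. */
struct toy_dentry {
	unsigned seq;		/* bumped whenever the object changes or dies */
	int refcount;
};

/* Hypothetical per-level record: the object plus the seq sampled earlier. */
struct toy_saved {
	struct toy_dentry *dentry;
	unsigned seq;		/* the extra 4 bytes per level */
};

/* Grab a reference only if the object is unchanged since we sampled it. */
static int toy_legitimize(struct toy_saved *s)
{
	if (s->dentry->seq != s->seq)
		return 0;
	s->dentry->refcount++;
	return 1;
}

/*
 * Try to legitimize every level of the stack.  On failure, drop whatever
 * has already been grabbed and return -ECHILD; on success return 0.
 */
static int toy_unlazy(struct toy_saved *stack, size_t depth)
{
	size_t i;

	for (i = 0; i < depth; i++) {
		if (!toy_legitimize(&stack[i])) {
			while (i--)
				stack[i].dentry->refcount--;
			return -ECHILD;
		}
	}
	return 0;
}
```

This is single-threaded and ignores memory ordering and the vfsmount side
entirely; it only shows why the saved sequence number has to be kept per
level.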

Hell knows... it probably means going all the way wrt dynamic (on demand)
allocation, though. Say, keeping a couple of levels on stack and allocating
when we need more; the interesting part is in not freeing that sucker too
early. At the very least, we don't want the progression through RCU/normal/
revalidate-everything modes to trigger allocation/freeing on each step; the
nesting depth is going to be the same every time. That's not hard to do...
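
Something along these lines, again only as a userspace sketch with invented
names (toy_walk, toy_push, toy_walk_restart, toy_walk_release) and an
arbitrary choice of two embedded levels. The point is that a restart in
another mode resets the depth but keeps whatever storage has been allocated,
so the allocation happens at most once per lookup rather than once per mode.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_EMBEDDED 2	/* levels kept inline; the number is an assumption */

struct toy_level {
	const char *name;
	unsigned seq;
};

struct toy_walk {
	struct toy_level *stack;	/* points at internal[] or at heap */
	size_t capacity;
	size_t depth;
	struct toy_level internal[TOY_EMBEDDED];
};

static void toy_walk_init(struct toy_walk *w)
{
	w->stack = w->internal;
	w->capacity = TOY_EMBEDDED;
	w->depth = 0;
}

/* Push one level, moving to heap storage only when the inline part is full. */
static int toy_push(struct toy_walk *w, const char *name, unsigned seq)
{
	if (w->depth == w->capacity) {
		size_t ncap = w->capacity * 2;
		struct toy_level *p = malloc(ncap * sizeof(*p));

		if (!p)
			return -ENOMEM;
		memcpy(p, w->stack, w->depth * sizeof(*p));
		if (w->stack != w->internal)
			free(w->stack);
		w->stack = p;
		w->capacity = ncap;
	}
	w->stack[w->depth].name = name;
	w->stack[w->depth].seq = seq;
	w->depth++;
	return 0;
}

/* Restart in another mode: forget the contents, keep the storage. */
static void toy_walk_restart(struct toy_walk *w)
{
	w->depth = 0;
}

/* Called once, when the whole lookup is over. */
static void toy_walk_release(struct toy_walk *w)
{
	if (w->stack != w->internal)
		free(w->stack);
	toy_walk_init(w);
}
```

Note that the structure points into itself while the inline levels are in
use, so it can't be copied by value in that state.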

I'm about to fall asleep right now, so all of the above might very well be
complete hogwash; I'll look into it when I wake up. If anyone has any
comments (including "Al, you are nuts", but something more specific would
be more interesting), please reply.