Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint

From: Al Viro
Date: Wed Apr 22 2015 - 17:06:03 EST


On Wed, Apr 22, 2015 at 09:12:38PM +0100, Al Viro wrote:
> On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> > And one more: may_follow_link() is now potentially oopsable. Look: suppose
> > we've reached the link in RCU mode, just as it got unlinked. link->dentry
> > has become negative and may_follow_link() steps into
> > 	/* Allowed if owner and follower match. */
> > 	inode = link->dentry->d_inode;
> > 	if (uid_eq(current_cred()->fsuid, inode->i_uid))
> > 		return 0;
> > Oops... Incidentally, I suspect that your __read_seqcount_retry() in
> > follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> > needed?
> >
> > FWIW, I would rather fetch ->d_inode *and* check ->seq prior to calling
> > get_link(), and pass inode to it as an explicit argument. And pass it
> > to may_follow_link() as well...
>
> Hrm... You know, something really weird is going on here. Where are
> you setting nd->seq? I don't see anything in follow_link() doing that.
> And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
> at that point it matches the dentry of symlink, not that of nd->path
> (== parent directory). Neil, could you tell me which kernel you'd been
> testing (ideally - commit ID in a public git tree), what config and what
> tests had those been?

FWIW, there's a wart that had been annoying me for quite a while, and it
might be related to dealing with that properly. Namely, walk_component()
calling conventions. We have walk_component(nd, &path, follow), which can
	* return -E..., and leave us with pathwalk terminated; path contains
	  junk, and so does nd->path.
	* return 0, update nd->path, nd->inode and nd->seq. The contents
	  of path are in undefined state - they might be unchanged, they might
	  be equal to nd->path (and not pinned down, RCU mode or not). In any
	  case, callers do not touch it afterwards. That's the normal case.
	* return 1, update nd->seq, leave nd->path and nd->inode unchanged and
	  set path pointing to our symlink. nd->seq matches path, not nd->path.

In all cases the original contents of path are ignored - it's purely an 'out'
parameter, but the compiler can't establish that on its own; it _might_ be
left untouched. In all cases when its contents survive we don't look at
them afterwards, but proving that requires a non-trivial analysis.
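
To make the three cases above concrete, here is a small user-space sketch of
what a caller may rely on after each return value. Everything in it
(walk_result, classify_walk(), path_is_valid()) is a hypothetical name made up
for illustration; none of it is in the tree.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, for illustration only; nothing here is kernel code. */
enum walk_result {
	WALK_ERROR,	/* -E...: walk terminated, path and nd->path are junk */
	WALK_NEXT,	/* 0: nd->path, nd->inode, nd->seq updated; path undefined */
	WALK_SYMLINK,	/* 1: path points to the symlink, nd->seq matches path */
};

/* Map the raw return value onto the three documented cases. */
static enum walk_result classify_walk(int ret)
{
	if (ret < 0)
		return WALK_ERROR;
	return ret ? WALK_SYMLINK : WALK_NEXT;
}

/* May the caller look at the 'path' out-parameter after this return value? */
static int path_is_valid(int ret)
{
	return classify_walk(ret) == WALK_SYMLINK;
}
```

The only case in which path carries anything meaningful is the symlink one,
which is what makes it a candidate for moving into nameidata.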

And in case when we return 1 (== symlink to be followed), we bugger nd->seq.
It's left as we need it for unlazy_walk() (and after unlazy_walk() we don't
care about it at all), so currently everything works, but if we want to
stay in RCU mode for symlink traversal, we _will_ need ->d_seq of parent
directory.

I wonder if the right way to solve that would be to drop the path argument
entirely and store the bugger in nameidata. As in
	union {
		struct qstr last;
		struct path link;
	};
	...
	union {
		int last_type;
		unsigned link_seq;
	};
in struct nameidata. We never need both at the same time; after
walk_component() (or its analogue in do_last()) we don't need the component
name anymore. That way walk_component() would not trash nd->seq when
finding a symlink...
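
A rough user-space mock of that layout, only to show the overlay and that
recording a symlink no longer has to touch the parent's seq. The struct path
and struct qstr below are simplified stand-ins, and nd_sketch and
note_symlink() are hypothetical names, not proposed code.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins; the real definitions live in the kernel headers. */
struct dentry;
struct vfsmount;
struct path { struct vfsmount *mnt; struct dentry *dentry; };
struct qstr { unsigned hash; unsigned len; const unsigned char *name; };

struct nd_sketch {
	union {
		struct qstr last;	/* component name, needed before the lookup */
		struct path link;	/* symlink location, needed after it */
	};
	unsigned seq;			/* stays tied to the parent directory */
	union {
		int last_type;
		unsigned link_seq;	/* ->d_seq sampled for link.dentry */
	};
};

/* Record a symlink found by the lookup without overwriting nd->seq. */
static void note_symlink(struct nd_sketch *nd, struct path found, unsigned seq)
{
	nd->link = found;
	nd->link_seq = seq;
}
```

With this shape the symlink's sequence number lands in link_seq, and seq keeps
describing nd->path for the rest of the RCU walk.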

It would also shrink the stack footprint a bit - local struct path next
in link_path_walk() would be gone, along with the same thing in path_lookupat()
and friends. Not a lot of win (4 pointers total), but it might be enough
to excuse putting ->d_seq of root in there, along with ->link.dentry->d_inode,
to avoid rechecking its ->d_seq. As a matter of fact, we do have an
odd number of 32bit fields in there, so ->d_seq of root would fit nicely...
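
The padding argument can be checked with a toy layout. This is a sketch under
assumptions: the field names are stand-ins rather than the real struct
nameidata, root_seq is a hypothetical name, and the "free" result only holds
where pointers are 8 bytes and 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative layout only; field names are stand-ins, not the real struct. */
struct nd_before {
	void *inode;
	unsigned flags, seq, m_seq;
	int last_type;
	unsigned depth;		/* fifth 32bit field: leaves a 4-byte hole on LP64 */
	void *base;
};

struct nd_after {
	void *inode;
	unsigned flags, seq, m_seq;
	int last_type;
	unsigned depth;
	unsigned root_seq;	/* hypothetical: ->d_seq of root fills the hole */
	void *base;
};

/* Bytes added by the extra field; expected to be 0 with 8-byte pointers. */
static size_t root_seq_cost(void)
{
	return sizeof(struct nd_after) - sizeof(struct nd_before);
}
```

On a 32bit build the extra field would cost its 4 bytes; the point is only
that on 64bit it slots into padding that is already there.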

Comments?