Re: [PATCH v9 05/10] namei: O_BENEATH-style path resolution flags

From: Aleksa Sarai
Date: Sun Jul 14 2019 - 03:01:34 EST

On 2019-07-13, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Jul 12, 2019 at 04:00:26PM +0100, Al Viro wrote:
> > On Fri, Jul 12, 2019 at 02:25:53PM +0100, Al Viro wrote:
> >
> > > if (flags & LOOKUP_BENEATH) {
> > > nd->root = nd->path;
> > > if (!(flags & LOOKUP_RCU))
> > > path_get(&nd->root);
> > > else
> > > nd->root_seq = nd->seq;
> >
> > BTW, this assignment is needed for LOOKUP_RCU case. Without it
> > you are pretty much guaranteed that lazy pathwalk will fail,
> > when it comes to complete_walk().
> >
> > Speaking of which, what would happen if LOOKUP_ROOT/LOOKUP_BENEATH
> > combination would someday get passed?
> I don't understand what's going on with ->r_seq in there - your
> call of path_is_under() is after having (re-)sampled rename_lock,
> but if that was the only .. in there, who's going to recheck
> the value? For that matter, what's to guarantee that the thing
> won't get moved just as you are returning from handle_dots()?
> IOW, what does LOOKUP_IN_ROOT guarantee for caller (openat2())?

I tried to explain this in the commit message for "namei: aggressively
check for nd->root escape on ".." resolution", but I probably could've
explained it better.

The basic property being guaranteed by LOOKUP_IN_ROOT is that it will
not result in resolution of a path component which was not inside the
root of the dirfd tree at some point during resolution (and that all
absolute symlink and ".." resolution will be done relative to the
dirfd). This may smell slightly of chroot(2), because unfortunately it
is a similar concept -- the reason for this is to allow for a more
efficient way to safely resolve paths inside a rootfs than spawning a
separate process to then pass back the fd to the caller.

We don't want to do a path_is_under() check for every ".." (otherwise
lookups have a quadratic slowdown when doing many ".."s), so I instead
only do a check after a rename or a mount (which are the only operations
which could change what ".." points to). And since we do the
path_is_under() check if m_seq or r_seq need a retry, we can re-take

The main reason why I don't re-check path_is_under() after handle_dots()
is that there is no way to be sure that a racing rename didn't happen
after your last path_is_under() check. The rename could happen after the
syscall returns, after all.

So, the main purpose of the check is to make sure that a ".."s after a
rename doesn't result in an escape. If the rename happens after we've
traversed through a ".." that means that the ".." was inside the root in
the first place (a root ".." is handled by follow_dotdot). If the rename
happens after we've gone through handle_dots() and there is no
subsequent ".." then to userspace it looks identical to the rename
occurring after the syscall has returned. If there is a subsequent ".."
after a racing rename then we may have moved into a path that wasn't
path_is_under() and so we have to check it.

The only way I could see you could solve the race completely is if you
had a way for userspace to lock things from being able to be renamed (or
MS_MOVE'd). And that feels like a really bad idea to me.

[+]: You asked why don't I re-take m_seq. The reason is that I don't
fully understand all the other m_seq checks being done during
resolution, and we aren't definitely doing them all in
handle_dots(). So I assumed re-taking it could result in me
breaking RCU-walk which obviously would be bad. Since I am the only
thing using nd->r_seq, I can re-take it without issue.

Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH

Attachment: signature.asc
Description: PGP signature