Re: CLONE_PARENT after setns(CLONE_NEWPID)

From: Oleg Nesterov
Date: Wed Nov 06 2013 - 15:05:36 EST


On 11/06, Andy Lutomirski wrote:
>
> On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > Hi Serge,
> >
> > On 11/06, Serge Hallyn wrote:
> >>
> >> Hi Oleg,
> >>
> >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> >> breaks lxc-attach in 3.12. That code forks a child which does
> >> setns() and then does a clone(CLONE_PARENT). That way the
> >> grandchild can be in the right namespaces (which the child was
> >> not) and be a child of the original task, which is the monitor.
> >
> > Thanks...
> >
> > Yes, this is what 40a0d32d1ea explicitly tries to disallow.
> >
> >> Is there a real danger in allowing CLONE_PARENT
> >> when current->nsproxy->pidns_for_children is not our pidns,
> >> or was this done out of an "over-abundance of caution"?
> >
> > I am not sure... This all was based on the long discussion, and
> > it was decided that the CLONE_PARENT check should be consistent
> > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
> >
> >> Can we
> >> safely revert that new extra check?
> >
> > Well, usually we do not break user-space, but I am not sure about
> > this case...
>
> Presumably if we allow this, then we should also allow
> clone(CLONE_NEWPID | CLONE_PARENT).

Yes, agreed. but this means another change, this was forbidden even
before this commit.

> This seems a little odd, but off
> the top of my head it doesn't seem obviously dangerous.

I do not see any "strong" reason too. At least right now... But I would
say that it would be better to not allow to abuse ->real_parent, it
doesn't event know about the new child (if CLONE_PARENT).

> (Why were we worried about this in the first place? The comment says
> that we don't want signal handlers or thread groups to span
> namespaces, but CLONE_PARENT has nothing to do with that.)

it also says "or parent" ;)

> I feel like I'm rehashing something ancient, but shouldn't that code just be:
>
> if (clone_flags & CLONE_VM) {
> // check for unsharing namespaces

No, this will break vfork().

And note that CLONE_SIGHAND was disallowed "just in case" and because
do_fork() had a similar check. Sharing the signal handlers is fine afaics.

>From e79f525e:

We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it
would be safer to not do this. The current check denies CLONE_SIGHAND
implicitely and there is no reason to change this.

And I disagree with

Eric said "CLONE_SIGHAND is fine. CLONE_THREAD would be even better.
Having shared signal handling between two different pid namespaces is
the case that we are fundamentally guarding against."

added during the merging ;) Or perhaps I misunderstood the text above. But this
all is off-topic.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/