Re: + prctl-add-pr_setget_child_reaper-to-allow-simple-process-supervision.patch added to -mm tree

From: Kay Sievers
Date: Wed Aug 17 2011 - 11:45:28 EST


On Wed, Aug 17, 2011 at 15:45, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 08/17, Kay Sievers wrote:
>> On Wed, Aug 17, 2011 at 13:55, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>> >
>> > I try to never argue with the new features. But to be honest, this
>> > doesn't look very good to me.
>> >
>> > OK, a service manager M does prctl(PR_SET_CHILD_REAPER), then it forks
>> > a service X which forks another child C and exits. Then C exits and
>> > notifies M.
>> >
>> > But. How can M know that the service X should be restarted? It only
>> > knows the pid.
>>
>> Legacy services write pid files and we read them, so we know the pid
>> to watch for. Proper services never double-fork and reparent in a
>> modern init environment.
>
> OK. So, this patch can only help to handle the legacy services?

It helps with the services that need it. It is not recommended to
double-fork ever with a modern init system, but it's the historic
default and common practice, and we are not going to change that any
time soon.

> And
> the service should participate (write pid files for example). And,

This is not meant as a security feature, if that's what you're asking.
It will not prevent services from doing nasty things and escaping the
process that started them. But it's still a feature that today only
PID 1 has, and which we need for more processes.

>> > What if wait(WEXITED) succeeds because C in turn does
>> > fork + exit?
>>
>> Nothing is really doing this.
>
> OK. But this means you propose this patch to solve the very specific
> problems.

No, it's for a very common problem. But again, it's not a security feature.

> IOW, imho this doesn't look very useful "in general" to me.

It is very useful if you have an init-like daemon.

> May be we need something else instead... And iiuc you don't really
> need to change the reparenting, you only want the notification if
> the process exits.

No, we want to be the parent of the process, and we want to be the one
who reaps all the child processes, not only receive some out-of-band
notifications. The sub-init is the babysitter of all the things it has
started, and that should be reflected in the parent child relation.
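
To make the intended parent/child relation concrete, here is a minimal
userspace sketch of a supervisor M that marks itself as a reaper, starts
a double-forking service X, and then reaps both X and the orphaned
grandchild C with plain wait(). It assumes the option name
PR_SET_CHILD_SUBREAPER (value 36), under which this feature is available
in mainline kernels; the patch discussed in this thread calls it
PR_SET_CHILD_REAPER. The function name supervise_double_fork() is
made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_SET_CHILD_SUBREAPER
#define PR_SET_CHILD_SUBREAPER 36	/* assumption: mainline name/value */
#endif

/*
 * Hypothetical helper: returns the number of processes reaped,
 * or -1 if the prctl() or the first fork() fails.
 */
static int supervise_double_fork(void)
{
	int status, reaped = 0;
	pid_t x, pid;

	/* Ask the kernel to reparent orphaned descendants to us. */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0)
		return -1;

	x = fork();
	if (x < 0)
		return -1;
	if (x == 0) {
		/* Service X: classic daemonizing double-fork. */
		pid_t c = fork();

		if (c < 0)
			_exit(1);
		if (c == 0) {
			/* Grandchild C outlives X, then exits. */
			usleep(100 * 1000);
			_exit(7);
		}
		_exit(0);	/* X exits; C is reparented to M */
	}

	/* M: reap X first, then C, until no children are left. */
	for (;;) {
		pid = wait(&status);
		if (pid < 0) {
			if (errno == EINTR)
				continue;
			break;	/* ECHILD: nothing left to reap */
		}
		if (WIFEXITED(status))
			printf("reaped pid %d, exit status %d\n",
			       (int)pid, WEXITSTATUS(status));
		reaped++;
	}
	return reaped;
}
```

Calling supervise_double_fork() from main() should reap two processes.
Note that M only learns pids and exit statuses here; mapping the pid of
C back to the service is still up to the manager, for example by way of
the pid file mentioned above.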

>> >> @@ -1296,6 +1296,8 @@ struct task_struct {
>> >>                  * execve */
>> >>       unsigned in_iowait:1;
>> >>
>> >> +     /* Reparent child processes to this process instead of pid 1. */
>> >> +     unsigned child_reaper:1;
>> >
>> > First of all - this is already very wrong imho. This should be
>> > per-process, not per-thread.
>>
>> What do you mean? That would go where instead?
>
> You should mark the whole process as sub-reaper, not a single thread
> which does prctl(). The parent/child relationship is process-wide.

Ok.

> If nothing else. Suppose that application does pthread_create(), the
> new thread does prctl(REAPER) and exits.

I get the (weird) idea. :)

>> >> +     /* find the first ancestor which is marked as child_reaper */
>> >> +     for (reaper = father->parent;
>> >> +          reaper != &init_task && reaper != pid_ns->child_reaper;
>> >> +          reaper = reaper->parent)
>> >
>> > This loop can never reach init_task/child_reaper and crash the kernel.
>>
>> You mean: *if* this loop can never ...?
>
> Yes.
>
>> > For example, father->parent can point to init_task's sub-thread.
>> >
>> > OTOH you shouldn't use init_task at all.
>>
>> What would we use instead?
>
> You should check ->child_reaper only. But see above, it can be multithreaded.

The main PID 1 of the system has no ->child_reaper set as far as I
can see, hence we check for init_task.

>> > Also. You shouldn't do this if the sub-namespace init exits, this is
>> > wrong.
>>
>> If we find a sub-init, before the namespace PID1, why wouldn't we return it?
>
> Ah, I meant pid_ns->child_reaper, not task->child_reaper.
>
> If pid_ns->child_reaper exits we should never try to "reparent" its
> children, see zap_pid_ns_processes() in particular. IOW, this should
> go into the "else" branch of "if (pid_ns->child_reaper == father)"

I don't understand this. If we find a marked task->child_reaper
_before_ we find a pid_ns->child_reaper in the chain of parents, why
wouldn't we return it?
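
For reference, the ancestor walk under discussion can be modelled in
userspace roughly as below. This is only a toy sketch, not kernel code:
struct toy_task and toy_find_reaper() are invented names, the NULL
check stands in for "the loop must terminate even if the namespace init
is not on the parent chain", and returning NULL when the exiting task
is the namespace init is one possible reading of the "else branch"
suggestion quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a task; NOT the kernel's task_struct. */
struct toy_task {
	const char *name;
	struct toy_task *parent;	/* NULL only at the top of the toy tree */
	bool child_reaper;		/* marked as sub-reaper */
};

/*
 * Pick the new parent for the children of the exiting task 'father'.
 * 'ns_init' models pid_ns->child_reaper.
 */
static struct toy_task *toy_find_reaper(struct toy_task *father,
					struct toy_task *ns_init)
{
	struct toy_task *reaper;

	/* Namespace init itself exits: no reparenting is attempted. */
	if (father == ns_init)
		return NULL;

	/* Nearest marked ancestor wins, if one comes before ns_init. */
	for (reaper = father->parent;
	     reaper != NULL && reaper != ns_init;
	     reaper = reaper->parent) {
		if (reaper->child_reaper)
			return reaper;
	}

	/* No marked ancestor found: fall back to the namespace init. */
	return ns_init;
}
```

With a chain init <- manager (marked) <- service, the children of
service go to manager; without the mark they go to init as before.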

Kay