Re: init's children list is long and slows reaping children.

From: Eric W. Biederman
Date: Fri Apr 06 2007 - 14:04:11 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 04/06, Eric W. Biederman wrote:
>>
>> Thinking about it I do agree with Linus that two lists sounds like the
>> right solution because it ensures we always have O(1) time when
>> waiting for a zombie.
>
> Well. I bet this will be painful, and will uglify the code even more.
>
> do_wait() has to iterate over 2 lists, __ptrace_unlink() should check
> ->exit_state, etc...

Well I would prefer to iterate over 2 lists as opposed to N lists that
we have now.

The __ptrace_unlink issue sounds moderately valid although a ptraced
zombie is down right peculiar.

>> I'd like to place the list head for the zombie
>> list in the signal_struct and not in the task_struct so our
>> performance continues to be O(1) when we have a threaded process.
>
> Sure. It would be nice to move ->children into signal_struct at first.
> Except this change breaks (in fact fixes) ->pdeath_signal behaviour.

Actually this should be independent of the pdeath_signal issue.
As long as we record pdeath_signal per task_struct we can continue
to implement the existing semantics.

Although we really should fix pdeath_signal and push the patch to
-mm. We only didn't do that because the original patch was a
bug fix for stable kernels, and we didn't have time to verify fixing
pdeath_signal would not break anything else.

>> The big benefit of the zombie list over your proposed list reordering
>> is that waitpid can return immediately when we don't have zombies to
>> wait for, but we have lots of children.
>
> TASK_TRACED/TASK_STOPPED ?

Well it doesn't fix everything yet but it does fix the common case.
I wonder if it would make sense to have other lists as well.

Regardless of how this fixes scaling someone needs to dig in there load
up all the messy state in their head and clean up, simplify,
and optimize this mess.

We still have the stupid case where we can create unkillable
zombies from user space with a threaded init.

I think it is safe to say that this part of the code has reach the point
of fragility where it is hard to maintain. A clear sign that it is time to
refactor something.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/