Re: main thread pthread_exit/sys_exit bug!

From: Kaz Kylheku
Date: Tue Feb 03 2009 - 14:51:55 EST


On Tue, Feb 3, 2009 at 5:33 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 02/02, Kaz Kylheku wrote:
>>
>> On Mon, Feb 2, 2009 at 12:39 PM, Kaz Kylheku <kkylheku@xxxxxxxxx> wrote:
>> > On Mon, Feb 2, 2009 at 12:17 PM, Ulrich Drepper <drepper@xxxxxxxxxx> wrote:
>> >> The userlevel context of the
>> >> thread is not usable anymore. It will have run all kinds of
>> >> destructors. The current behavior is AFAIK that the main thread won't
>> >> react to any signal anymore. That is absolutely required.
>> >
>> > Hey Ulrich,
>> >
>> > Thanks for articulating that requirement. I think it can be met by
>> > extending the patch a little bit.
>>
>> I've now done that.
>>
>> The exiting thread leader, if there are still other
>> threads alive, gets its own private signal handler array in which
>> every action is set to SIG_IGN, using the ignore_signals
>> function.
>>
>> I experimented with blocking signals, but that approach
>> breaks the test case of being able to attach GDB to the
>> exiting thread.
>>
>> As part of the patch, I found it convenient to extend the
>> incomplete sys_unshare functionality w.r.t. signal handlers,
>> rather than reinvent the wheel.
>
> This is wrong, we can not and must not unshare ->sighand.

You are right; it breaks important invariant conditions which
connect the thread group together, like the one about the
lock, et cetera. The patch goes too far: rather than simply
delaying the finalization (relatively safe), it's messing with
the shared state (risky).

Well, it doesn't bother me that that has to be thrown out.
In fact, I do not agree with the requirement that the thread
which calls pthread_exit must not respond to signals;
the original patch works for me.

I.e. in my embedded GNU/Linux distro, that requirement
doesn't exist. And since I can't find it in the Single
Unix Specification, so much for that!

Nothing in the spec says that once pthread_exit is called,
signals are stopped. This function invokes cleanup handling,
and thread-specific-storage destruction. During any of those
tasks, signals can still be happening. Any of those
tasks can easily enter into an indefinite wait. What if
a cleanup handler performs a blocking RPC to a remote
server? Well, there you are, stuck in pthread_exit,
handling signals, and not cleaning up your robust list, etc.

I also don't require robust locks to be cleaned up
instantly if they are owned by a main thread that has
called pthread_exit.

My organization is a heavy user of robust mutexes;
they protect the integrity of a large, ``real time'' database
stored in shared memory. I don't think that this would
affect us in any way. The principal concern is that
a process dies, for whatever reason, while holding locks.
It's more to recover from catastrophic failures, not from
mutex locking mistakes. If a thread locks a mutex and
doesn't release it due to bad program logic, that is a
problem whether or not that thread dies. It's not
particularly useful that we can resolve that situation
with EOWNERDEAD in the kernel when that thread
happens to die, because that's just one case where
we are lucky, so to speak.

> Yes, gdb refuses to attach to the dead thread (I didn't check
> this myself, but I think you are right). But there is nothing
> wrong here, because we can't ptrace this thread. But, gdb
> _can_ ptrace the process, and it can see it have other threads.

Face it, allowing the thread leader to exit is as wrong as doing
other stupid things to the leader, like unsharing the signal
handler.

Either way, you are breaking some little stick which is holding
up the illusion that there is a process which has threads.

> OK, if nothing else. Let's suppose your patch is correct.

Obviously, the second part isn't. I'm very happy to have
any excuse to throw this out, thanks!

The first part isn't Ulrich-compliant. That is signficant,
and duly noted; but it's not the law where I'm sitting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/