Re: main thread pthread_exit/sys_exit bug!
From: Kaz Kylheku
Date: Mon Feb 02 2009 - 15:11:16 EST
On Mon, Feb 2, 2009 at 8:56 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> Well, perhaps something like your patch makes sense.
>
> +/*
> + * A single thread is exiting, and it is the leader of the group.
> + * This coresponds to the main thread calling pthread_exit.
> + * In this case, the persists until all the other
> + * threads call pthread_exit, or someone calls exit or _exit.
> + * To implement this case, we park the thread leader in
> + * a loop which waits until the thread list becomes empty,
> + * or it receives a fatal signal.
> ^^^^^^^^^^^^^^
> The comment is not exactly right, we return on every signal, not only fatal.
Yes; that's stale text, referring to some earlier code that had been changed.
> the above is just the open-coded
> wait_event_interruptible(¤t->signal->wait_chldexit,
> thread_group_empty(current));
Terrific! I will make the change locally.
Btw, it's ``hand coded''. ``Open coding'' is what both programmers and macros
do. Hand conding is the dumb thing I did instead of using the macro to open
code it. :)
> I am worried this patch can confuse the user-space. Because, when
> the main thread does sys_exit(), the user-space has all rights
> to assume it exits ;)
Only glibc knows about sys_exit. (Or are there run-times for other language
implementations that have their own binding to sys_exit, bypassing pthreads?)
The POSIX interface used by applications is pthread_exit, and there is no
assumption there about it being an exit-like system call. It has a number of
standard-defined user-space chores to do, in fact.
> But with this patch the main thread will
> continue to handle the signals until the while group exits, I'm
> afraid libpthread.so won't be happy.
> And what if the signal handler does siglongjmp() and aborts sys_exit() ?
pthread_exit peforms cleanup unwinding (required by Unix and POSIX) and
destruction of thread-specific storage.
Glibc does this and then performs its own longjmp to a handler in the startup
code above main. At that point it's no longer correct to longjmp to any of
the frames that have been aborted by that action. It's still executing user
code; sys_exit has not been called yet, and the signal handler can go
off.
Other than that, there is actually nothing wrong with aborting sys_exit. It
hasn't done any cleanup yet through do_exit, so it can be nicely
reentered later.
I'd say that any programs that are broken by this patch are
probably ``fork in the toaster'' category anyway.
Note that programs can also abort the exit function with signals, too,
and there is nothing that can be done in the kernel about it.
>> It's a bad idea to allow the main thread to terminate. It should stick
>> around because it serves as a facade for the process as a whole. If
>> the main thread is allowed to bail all the way through do_exit, who
>> knows what kind of problems may show up because of that.
>>
>> What if one of my developers is working on a server which has called
>> pthread_exit in the main thread, and wants to attach gdb to it? Will
>> that work if the main thread (a.k.a group leader) is a defunct
>> process?
>
> And I think gdb is wrong. It can see this process has other live threads,
> and attach to them.
But if this problem is patched in this very simple way in the kernel, then gdb
installations ``Just Work'' without even being recompiled.
Probably, the only way the gdb maintainers would accept another Linux-specific
hack would be if they didn't know that a kernel patch will fix it. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/