core dumps from multi-threaded (kernel cloned) processes

Philip Gladstone (philip@raptor.com)
Fri, 14 Nov 1997 12:31:25 -0500


The current behaviour of Linux 2.0.x (and I think 2.1.x) when
a glibc multithreaded process crashes is not useful. [The
multithreading is implemented by means of cloning]. Linux
dumps (essentially) the initial thread rather than the crashing
thread. Why?

Well, the pthreads implementation catches the SIGCHLD from
the dying thread/process, and discovers the cause (say, SIGILL).
It then sends a SIGILL to all the other threads/processes in the
'process', and they all die. Which one is core dumped? When the first one
fails, Linux detects that the memory map is in use by more than
one process, and so it skips generating the core file. I suspect
that this is in order to prevent screwups from happening if some
other thread/process modifies the memory map during the core
dump. Thus the very last process to be killed gets the dump (as
by then the memory map use count is down to 1).
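
To make the sequence concrete, here is a toy user-space simulation of the
behaviour described above. It is not kernel code: MemoryMap, users and
kill_in_order are names made up for illustration, and the "dump only when
the use count is 1" rule is taken from my description rather than from
the kernel source.

```python
from dataclasses import dataclass


@dataclass
class MemoryMap:
    users: int  # number of cloned processes sharing this memory map


def kill_in_order(death_order):
    """Return the name of the process that writes the core file, or None."""
    mm = MemoryMap(users=len(death_order))
    dumper = None
    for name in death_order:
        # Assumed rule: a dying process dumps core only if nobody else
        # still shares its memory map.
        if mm.users == 1:
            dumper = name
        # The dying process drops its reference to the map.
        mm.users -= 1
    return dumper


# The crashing thread dies first; the others die when the SIGILL is
# forwarded to them. Only the last one to die sees a use count of 1.
order = ["crasher", "worker", "manager", "initial"]
print(kill_in_order(order))  # prints "initial"
```

Whichever process happens to die last gets the dump, so the order in
which the forwarded signals take effect decides what ends up in the core
file.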

I'm not sure what the right approach to fix this is. My inclination
would be to stop all the other processes (those which share the mmap),
and generate a threaded core dump. Then mark all these processes as
not dumpable, and then continue them. This is somewhat broken,
but I don't know what the right approach is.
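
The steps I have in mind can be sketched as the following toy model. It
is only an outline of the ordering (stop, dump, mark, continue), not an
implementation: Proc, stopped, dumpable and threaded_dump are
hypothetical names, and a real core file would of course hold registers
and memory rather than a list of names.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    stopped: bool = False
    dumpable: bool = True


def threaded_dump(crasher, group):
    """Model the proposed sequence; return the thread names in the dump."""
    others = [p for p in group if p is not crasher]

    # 1. Stop every other process sharing the memory map, so that the
    #    map cannot change while the dump is being written.
    for p in others:
        p.stopped = True

    # 2. Write one dump covering all threads, crashing thread first.
    core = [crasher.name] + [p.name for p in others]

    # 3. Mark the whole group as not dumpable, so that the forwarded
    #    signals do not produce a second core file.
    for p in group:
        p.dumpable = False

    # 4. Let the other processes continue (and be killed as usual).
    for p in others:
        p.stopped = False

    return core


group = [Proc("initial"), Proc("manager"), Proc("crasher")]
print(threaded_dump(group[2], group))  # prints ['crasher', 'initial', 'manager']
```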

Any thoughts?

Philip

-- 
Philip Gladstone                           +1 617 487 7700
Raptor Systems, Waltham, MA         http://www.raptor.com/
Our new daughter: http://www.mwmc.com/Extweb/Cybernursery/17423662.htm