Re: pthread_mutex_lock hangs on unlocked mutex

From: Michael Dressel
Date: Thu Nov 13 2008 - 03:04:48 EST


On Thu, 13 Nov 2008, Ian Kent wrote:

On Fri, 7 Nov 2008, Michael Dressel wrote:

Hi,

(I'm not subscribed to the list, please CC me.)

in our software three processes are using several pthread mutexes.
Sometimes a process hangs inside pthread_mutex_lock even though the mutex is
not locked. I can tell it's not locked because another process is still
running and locking and unlocking the mutex.

pthreads is implemented in glibc.
If you really think there is a bug in the ptheads implementation then
the glibc maintainers will require you to produce a simple example program
which demonstrates the bug before it's accepted as a bug.

Yes.
I have not found any report related exactly to my problem in the
mailing lists or bug reports. But to be sure I didn't overlook something
I posted my problem. It looks like it's unique to me.

I failed to produce a simple example demonstrating the problem. In our
code we use timers and real time signals and we change process masks
with sigprocmask. If there is a bug at all (I don't think so) a
program to demonstrate that bug would potentially have to do all of
these things and would therefore not be simple.

Following Bart Van Assche's suggestion.
I did use valgrind (the default tool and helgrind) but I did not find
anything obviously related to my problem.


When you say processes you mean threads, right?


No. We don't use threads. The mutexes are used between processes. I used
them because they feature recursion.

If you can't produce such an example program and you can you prove (to
yourself) there are no use after free or execution order issues with your
code then your only option is to develop a workaround.


I found a workaround. We use normal semaphores now. This is possible
because we don't use multiple threads. In order to provide recursion I
had to implement a per process counter. This would not work if the
semaphore was required during signal handler execution. But this dose not
happen in our application.

You code wouldn't happen to be doing thread synchronization with
pthread_cond_wait()/pthread_cond_signal() would it?

No since we don't use multiple threads.

The reason why I wondered it is an issue (maybe configuration) of the
kernel was that sending a STOP CONT signal sequence to the hanging
process got it going again. So at least it is not a classical dead lock.

Cheers,
Michael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/