Re: Futex hang/lockup problem in 2.6.30+ on AMD64
From: Andrew Athan
Date: Thu Jan 28 2010 - 12:47:01 EST
Darren Hart wrote:
Andrew Athan wrote:
Andrew Athan wrote:
Américo Wang wrote:
On Tue, Jan 12, 2010 at 10:55 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
On Tue, 2010-01-12 at 22:52 +0800, Américo Wang wrote:
$ uname -a
Linux UK22 2.6.30-2-amd64 #1 SMP Fri Sep 25 22:16:56 UTC 2009 x86_64 GNU/Linux
Does a recent kernel work?
Ah, I just wanted to ask the same question. Adding the original
reporter, Gong Cheng, to Cc...
Gong, could you reproduce it on the latest kernel? And what is your
.config?
Thanks!
--
To unsubscribe from this list: send the line "unsubscribe
linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Due to the remote location of the hardware, I haven't been able to
test a more recent (or older) kernel. Remote hands put a KVM
on the box an hour ago, so I hope to have some information for
you in a day or two.
A.
I wanted to report that, although I have had no luck (so far) running
anything more recent than 2.6.30, I was able to revert to 2.6.26.
Unfortunately, the application hang still occurs. I also saw a
similar hang of the application on a 32-bit Intel box, also
under 2.6.26. So far, the hang *always* involves threads stuck on
the internal lock of the condition variable passed to
pthread_cond_broadcast(), while other threads are waiting on the outer
"public" lock.
Are you using real-time scheduling policy or priority inheritance
(PTHREAD_PRIO_INHERIT)? It is possible to suffer an unbounded priority
inversion on the internal condvar data lock in the current distro
implementations of glibc.
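For anyone checking Darren's question against their own code, a minimal sketch of how an application opts one of its own mutexes into priority inheritance. The helper name init_pi_mutex is hypothetical; pthread_mutexattr_setprotocol() and PTHREAD_PRIO_INHERIT are the standard POSIX names. This only affects mutexes the application creates itself: per Darren's note, the condvar's internal data lock in current distro glibc can still suffer the inversion.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as a priority-inheritance mutex.
 * Returns 0 on success, or the error number from the first failing
 * pthread_mutexattr_* / pthread_mutex_init call. */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for PTHREAD_PRIO_INHERIT: a low-priority owner is boosted to the
     * priority of the highest-priority thread blocked on this mutex. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

The return code is propagated rather than ignored because these calls can fail (for example with ENOTSUP) where PI support is unavailable.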
These other threads are *not* yet in (nor about to enter)
pthread_cond_wait(). I saw a message from Darren Hart (subject "Re:
Problems with futex") replying to someone who apparently was
having futex problems in 2.6.27, so I'm still operating under the
assumption that this is not an application bug.
Those all turned out to be application issues with one exception which
had already been fixed upstream.
Over the next couple of days, I will be running a version of the
application in which I replaced the pthread_cond calls with simpler
locks, in the hope that it won't hang (I'm hoping the underlying
pthreads implementation of those locks uses a different set of futex
opcodes).
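A rough sketch of the kind of replacement described above, assuming the waiter only needs to learn that a flag became true. The struct poll_flag type and its functions are hypothetical, not the actual application code: the waiter takes only a plain mutex and polls with nanosleep(), so no pthread_cond_* call is involved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical "ready" flag protected only by a plain mutex. */
struct poll_flag {
    pthread_mutex_t lock;
    bool            ready;
};

static void poll_flag_set(struct poll_flag *f)
{
    pthread_mutex_lock(&f->lock);
    f->ready = true;
    pthread_mutex_unlock(&f->lock);
}

/* Poll until the flag is set, sleeping interval_ns between checks
 * (must be < 1000000000). The mutex is never held while sleeping. */
static void poll_flag_wait(struct poll_flag *f, long interval_ns)
{
    const struct timespec delay = { 0, interval_ns };
    for (;;) {
        pthread_mutex_lock(&f->lock);
        bool ready = f->ready;
        pthread_mutex_unlock(&f->lock);
        if (ready)
            return;
        nanosleep(&delay, NULL);
    }
}

/* Thread entry point for the usage example: set the flag after ~20 ms. */
static void *poll_flag_setter_thread(void *arg)
{
    const struct timespec d = { 0, 20000000L };
    nanosleep(&d, NULL);
    poll_flag_set(arg);
    return NULL;
}
```

The trade-off is latency: a waiter may notice the flag up to interval_ns late, and it wakes periodically even when nothing has changed. nanosleep() rejects a tv_nsec of 1000000000 or more with EINVAL, hence the bound on interval_ns.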
Andrew Athan
I wanted to report that this application hang is certainly related to
the pthread_cond_* calls. With them in place, it consistently hangs;
without them, it consistently does not. Whether pthread_cond_* is
misbehaving because of memory corruption or another application bug
is, I suppose, an open question.
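Since the hang tracks the pthread_cond_* calls, one thing worth auditing is that every use follows the canonical pattern. A minimal sketch with a hypothetical struct event: waiters re-check a predicate in a loop while holding the mutex, and the broadcaster changes the predicate under the same mutex before calling pthread_cond_broadcast(). A condvar or mutex that is destroyed, freed, or overwritten while threads are still using it is one plausible way memory corruption could surface as threads stuck on internal locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event: a predicate guarded by a mutex, plus a
 * condition variable used only to wake threads waiting on that predicate. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            fired;   /* the predicate; only touched under lock */
};

/* Block until the event has fired. */
static void event_wait(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    /* Re-check in a loop: pthread_cond_wait() may wake spuriously, and the
     * event may already have fired before this thread got here. */
    while (!ev->fired)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Fire the event and wake every current waiter. */
static void event_fire(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->fired = true;                  /* change the predicate under the mutex */
    pthread_cond_broadcast(&ev->cond); /* then wake all waiters */
    pthread_mutex_unlock(&ev->lock);
}

/* Thread entry point used in the usage example. */
static void *event_waiter_thread(void *arg)
{
    event_wait(arg);
    return NULL;
}
```

The struct must stay valid (not destroyed, freed, or reused) until every thread has returned from event_wait() and event_fire().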
We have now experienced several lockups where even a kill -9 does not
get rid of the application. Does this say anything about the
nature of the hang?
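One way to narrow this down is to look at the scheduler state of the stuck process. A sketch with a hypothetical helper, proc_state(), that reads the state letter from /proc/<pid>/stat. A task that survives SIGKILL is usually either in uninterruptible sleep ('D', blocked inside the kernel) or already dead and waiting to be reaped ('Z'); an ordinary futex wait normally shows as 'S' and should die on kill -9.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state of a task from
 * /proc/<pid>/stat ('R' running, 'S' sleeping, 'D' uninterruptible,
 * 'Z' zombie, ...), or '?' if the file cannot be read or parsed. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *fp = fopen(path, "r");
    if (!fp)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    /* Field 2 is the command name in parentheses, and it may itself contain
     * spaces or ')', so locate the LAST ')' and read the field after it. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

For per-thread detail, the same parsing works on /proc/<pid>/task/<tid>/stat, and /proc/<pid>/task/<tid>/wchan typically names the kernel function a sleeping task is blocked in. With SysRq enabled, echo t > /proc/sysrq-trigger dumps every task's state and stack to the kernel log.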
By the way, majordomo stopped sending me emails as of 1/17, so I have not
seen any updates to this thread sent after that date. I'm not sure why this
happened, as I never asked to be unsubscribed. I've resubscribed, but I'm
not sure I will receive anything. Please make sure I am cc:ed directly on
any responses.
carlinux138:~# uname -a
Linux carlinux138.thinktradellc.com 2.6.26-2-686 #1 SMP Sun Jun 21 04:57:38 UTC 2009 i686 GNU/Linux
(I need to look up the best way to provide a system configuration
snapshot, e.g., the versions of all major libraries.)
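As a starting point for such a snapshot, a small sketch (system_snapshot() is a hypothetical helper) that records the kernel release, build string and architecture via uname(2) and the glibc version via gnu_get_libc_version(). On glibc systems the pthread implementation (NPTL) ships as part of glibc, so the glibc version also identifies the pthread_cond_* code in use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>
#include <gnu/libc-version.h>

/* Hypothetical helper: write a one-line summary of kernel and glibc
 * versions into out. Returns 0 on success, -1 on error or truncation. */
static int system_snapshot(char *out, size_t len)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;

    int n = snprintf(out, len, "kernel %s %s %s; glibc %s (%s)",
                     u.release, u.version, u.machine,
                     gnu_get_libc_version(), gnu_get_libc_release());
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The kernel .config requested earlier in the thread is a separate item; Debian kernels install it as /boot/config-<release>.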
Thanks,
Andrew Athan