Re: PROBLEM:a bug about pi-futex maybe let the program going to hang

From: Darren Hart
Date: Mon Mar 28 2011 - 18:13:43 EST




On 03/28/2011 01:26 AM, Peter Zijlstra wrote:
On Mon, 2011-03-28 at 15:25 +0800, xby wrote:
Hi, all.

Works better if you also CC people who actually work on that code.

There may be a bug in the pi-futex code that can cause a user-space program to hang.

We have a board with a two-core PowerPC 8572 CPU. After running for one month, the state of a pi-futex in user space went bad: mutex->__data.__lock is 0x8000023e, mutex->__data.__count is 0, mutex->__data.__owner is 0.
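
For readers decoding the reported lock word: a minimal sketch, assuming the standard PI futex word layout from the kernel's futex ABI (bit 31 is the waiters flag, bit 30 the owner-died flag, the low 30 bits the owner TID). The constants are defined locally to mirror the values in <linux/futex.h>; the helper names are hypothetical and used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the futex word bit layout (values as in <linux/futex.h>). */
#define FUTEX_WAITERS    0x80000000u  /* kernel has waiters queued on this futex */
#define FUTEX_OWNER_DIED 0x40000000u  /* previous owner exited while holding it */
#define FUTEX_TID_MASK   0x3fffffffu  /* TID of the owner, 0 if unowned */

/* Hypothetical helpers, for illustration only. */
static uint32_t futex_owner_tid(uint32_t word)
{
    return word & FUTEX_TID_MASK;
}

static bool futex_has_waiters(uint32_t word)
{
    return (word & FUTEX_WAITERS) != 0;
}

static bool futex_owner_died(uint32_t word)
{
    return (word & FUTEX_OWNER_DIED) != 0;
}

/* Print the three fields of a PI futex word. */
static void describe_futex_word(uint32_t word)
{
    printf("word=0x%08x tid=%u waiters=%d owner_died=%d\n",
           (unsigned)word, (unsigned)futex_owner_tid(word),
           futex_has_waiters(word), futex_owner_died(word));
}
```

Calling describe_futex_word(0x8000023e) reports TID 574 (0x23e) with the waiters bit set and the owner-died bit clear. Under that reading, the futex word still names a thread as owner while glibc's __owner field is 0, which is the mismatch described in the report.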

I then reviewed the file "kernel/futex.c" (Linux version 2.6.38) and found the following case:

Suppose there are three threads, named threadA, threadB, and threadC. threadA holds mutexM, and threadB and threadC are waiting on mutexM. They run through the following steps:

1. threadB and threadC sleep at line 1984.
2. threadB receives a signal and is woken up.
3. threadA unlocks mutexM and gives mutexM to threadB.
4. threadB calls fixup_owner and tries to give the mutex to threadC.
5. At line 1580, threadB triggers an address fault and jumps to handle_fault.
6. At line 1617, threadB releases the spinlock and then handles the fault.
7. threadC gets the spinlock, calls fixup_owner, and gets mutexM.
8. threadC gives mutexM to threadB.
9. threadB re-acquires the spinlock, finds "pi_state->owner == oldowner", and retries the fixup.
10. threadB gives mutexM to threadC, which is wrong.

We have written a program that demonstrates all of the above.

It would have been ever so much more useful if you'd have included that.


Please reply with the testcase and your glibc version. If this is a custom kernel, please include your .config as well.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel