Re: futex: race in lock and unlock&exit for robust futex with PI?

From: Michal Hocko
Date: Mon Jun 28 2010 - 10:43:06 EST


Hi Darren,

On Fri 25-06-10 16:35:14, Darren Hart wrote:
[...]
> # trace-cmd record -p nop ./runSimple.sh
> <snip>
>
> # ps -eLo pid,comm,wchan | grep "simple "
> 20636 simple pause
> 20876 simple pause
>
> # trace-cmd report
> version = 6
> CPU 0 is empty
> cpus=4
> field->offset = 24 size=8
> <...>-20636 [003] 1778.965860: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
> <...>-20636 [003] 1778.965865: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-20636 [003] 1778.965866: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> >>---> <...>-20636 [003] 1778.965867: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-20876 [001] 1780.199394: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
> <...>-20876 [001] 1780.199400: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
> <...>-20876 [001] 1780.199401: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-20876 [001] 1780.199402: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> >>---> <...>-20876 [001] 1780.199403: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-21316 [002] 1782.300695: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
> <...>-21316 [002] 1782.300698: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
>
[...]

I have updated the test case slightly (reduced the number of lock/unlock
cycles to 1).

Then I applied an additional patch (see below) on top of the one you
posted, and here is the log I am getting:

version = 6
cpus=2
field->offset = 16 size=4
<...>-13232 [001] 226.693880: bprint: do_futex : futex_lock_pi start
<...>-13232 [001] 226.693886: bprint: do_futex : futex_lock_pi done ret=0
<...>-13235 [001] 226.700204: bprint: do_futex : futex_lock_pi start
<...>-13235 [001] 226.700210: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
<...>-13235 [001] 226.700211: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-13235 [001] 226.700211: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
<...>-13235 [001] 226.700212: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-13235 [001] 226.700212: bprint: do_futex : futex_lock_pi done ret=-3
<...>-13240 [000] 226.705574: bprint: do_futex : futex_lock_pi start
<...>-13240 [000] 226.705580: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
<...>-13240 [000] 226.705581: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-13240 [000] 226.705582: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
<...>-13240 [000] 226.705582: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-13240 [000] 226.705583: bprint: do_futex : futex_lock_pi done ret=-3
<...>-13231 [000] 226.708095: bprint: do_futex : futex_lock_pi start
<...>-13231 [000] 226.708101: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
<...>-13231 [000] 226.708102: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-13231 [000] 226.708102: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
<...>-13231 [000] 226.708103: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-13231 [000] 226.708103: bprint: do_futex : futex_lock_pi done ret=-3
<...>-13242 [001] 226.709246: bprint: do_futex : futex_unlock_pi start
<...>-13242 [001] 226.709249: bprint: do_futex : futex_unlock_pi: TID->0 transition 2147496890
<...>-13242 [001] 226.709250: bprint: do_futex : futex_unlock_pi: no waiters, unlock the futex ret=0 uval=-2147470406
<...>-13242 [001] 226.709250: bprint: do_futex : futex_unlock_pi done ret=0

As you can see, lookup_pi_state fails for pid 13242, which is the very
task that shows up at the bottom of the log unlocking the futex, roughly
9 ms after the first failure. This smells fishy to me. I see this
pattern consistently in all failures. Maybe I am doing something wrong,
or the timestamps are not precise enough, but from what I can see this
looks like a bug in lookup_pi_state, which fails to find an existing
PID.
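
For anyone who wants to repeat the check on their own trace, here is a
rough sketch of how the log can be cross-checked. It is not part of the
test case or the patch. The helper names (decode_futex_word,
owners_seen_after_esrch) are made up for illustration, the flag values
are the ones from linux/futex.h, and the regular expressions assume the
trace-cmd report layout and the debug message format shown in the log
above.

```python
import re

# Flag layout of the user-space futex word, as defined in linux/futex.h.
FUTEX_WAITERS = 0x80000000
FUTEX_OWNER_DIED = 0x40000000
FUTEX_TID_MASK = 0x3FFFFFFF


def decode_futex_word(value):
    """Split a futex word (printed signed or unsigned) into its fields."""
    uval = value & 0xFFFFFFFF  # normalise a negative print to unsigned 32-bit
    return {
        "tid": uval & FUTEX_TID_MASK,
        "waiters": bool(uval & FUTEX_WAITERS),
        "owner_died": bool(uval & FUTEX_OWNER_DIED),
    }


# Task pid and timestamp at the start of a trace-cmd report line
# (assumed layout: "<...>-PID [CPU] SECONDS.USECS: ...").
LINE_RE = re.compile(r"-(\d+)\s+\[\d+\]\s+(\d+\.\d+):")
# Debug print that names the pid lookup_pi_state failed to find
# (assumed message format, taken from the log above).
ESRCH_RE = re.compile(r"lookup_pi_state: -ESRCH for pid=(\d+)")


def owners_seen_after_esrch(lines):
    """Return pids that failed the lookup but still emit trace lines later."""
    failed = {}    # pid -> timestamp of the first -ESRCH reported for it
    alive = set()  # pids seen running after their first -ESRCH
    for line in lines:
        head = LINE_RE.search(line)
        if not head:
            continue
        task, stamp = int(head.group(1)), float(head.group(2))
        miss = ESRCH_RE.search(line)
        if miss:
            failed.setdefault(int(miss.group(1)), stamp)
        if task in failed and stamp > failed[task]:
            alive.add(task)
    return sorted(alive)


if __name__ == "__main__":
    sample = [
        "<...>-13235 [001] 226.700210: bprint: futex_lock_pi_atomic : "
        "lookup_pi_state: -ESRCH for pid=13242",
        "<...>-13242 [001] 226.709246: bprint: do_futex : futex_unlock_pi start",
    ]
    print(owners_seen_after_esrch(sample))  # [13242]
    print(decode_futex_word(2147496890))
    # {'tid': 13242, 'waiters': True, 'owner_died': False}
```

Both values printed by futex_unlock_pi (2147496890 and -2147470406) are
the same 32-bit word, 0x800033ba: FUTEX_WAITERS set, FUTEX_OWNER_DIED
clear, TID 13242. So the futex word still names 13242 as the owner when
it unlocks, which matches the ordering seen in the log.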

--