When 3495 finally get's to run and complete it's futex_wake() call, the task
still needs to be woken, so we wake it - but now it's enqueued with a different
futex_q, which now has a valid lock_ptr, so upon wake-up we expect a signal!
OK, I believe this establishes root cause. Now to come up with a fix...
Wow, good work Darren! You definitely have more experience than I do at
tracking down these in-kernel races, and I'm glad we have you looking at
this. I'm snarfing up useful techniques from your progress emails.
So if I understand correctly, there is a race between wake_futex and a
timeout (or a signal, but AFAIK when my python example is running
steady-state there are no signals). The problem occurs when wake_futex
gets preempted half-way through, after it has NULLed lock_ptr, but
before it has woken up the waiter. If a timeout/signal wakes the waiter,
and the waiter runs and starts waiting again before the waker resumes,
then the waker will end up waking the waiter a second time, without the
lock_ptr for the second wait being NULLified. This causes the waiter to
mis-interpret what woke it and leads to the fault we observed.
Now I am wondering if the workaround described here
http://markmail.org/message/at2s337yz3wclaif
for what seems like this same problem isn't actually a legitimate fix.
It ends up looking something like this: (lines 3 and 4 are new)
ret = -ERESTARTSYS;
if (!abs_time) {
if (!signal_pending(current))
set_tsk_thread_flag(current,TIF_SIGPENDING);
goto out_put_key;
}