Re: ERESTARTSYS escaping from sem_wait with RTLinux patch

From: Darren Hart
Date: Tue Oct 13 2009 - 11:16:19 EST

Next message: Bjorn Helgaas: "Re: [PATCH] pci: increase alignment to make more space for hidden code"
Previous message: David Teigland: "Re: linux-next: unable to fetch dlm tree"
In reply to: Blaise Gassend: "Re: ERESTARTSYS escaping from sem_wait with RTLinux patch"
Next in thread: Thomas Gleixner: "Re: ERESTARTSYS escaping from sem_wait with RTLinux patch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Blaise Gassend wrote:

When 3495 finally get's to run and complete it's futex_wake() call, the task
still needs to be woken, so we wake it - but now it's enqueued with a different
futex_q, which now has a valid lock_ptr, so upon wake-up we expect a signal!

OK, I believe this establishes root cause. Now to come up with a fix...

Wow, good work Darren! You definitely have more experience than I do at
tracking down these in-kernel races, and I'm glad we have you looking at
this. I'm snarfing up useful techniques from your progress emails.

Great, I learn a lot from reading other people's status-type email as well. Glad I can be on the contributing end once and a while :)

So if I understand correctly, there is a race between wake_futex and a
timeout (or a signal, but AFAIK when my python example is running
steady-state there are no signals). The problem occurs when wake_futex
gets preempted half-way through, after it has NULLed lock_ptr, but
before it has woken up the waiter. If a timeout/signal wakes the waiter,
and the waiter runs and starts waiting again before the waker resumes,
then the waker will end up waking the waiter a second time, without the
lock_ptr for the second wait being NULLified. This causes the waiter to
mis-interpret what woke it and leads to the fault we observed.

Now I am wondering if the workaround described here
http://markmail.org/message/at2s337yz3wclaif
for what seems like this same problem isn't actually a legitimate fix.
It ends up looking something like this: (lines 3 and 4 are new)

ret = -ERESTARTSYS;
if (!abs_time) {
if (!signal_pending(current))
set_tsk_thread_flag(current,TIF_SIGPENDING);
goto out_put_key;
}

The trouble with this is it is a bandaid to a fundamentally broken wake-up path. I tried flagging the waiters on the wake-list as already woken and then skipping them in the wake_futex_list(), but this got ugly really fast.

Talking with Thomas a bit more we're not sure the patch that introduced this lockless waking actually does any good, as the normal wakeup path doesn't take the hb->lock anyway, it's more likely the contention was due to an app like this that wakes a task and almost immediately puts it back to sleep on a futex before the waker has a chance to drop the hb->lock.

The futex wake-up path is complicated enough as it is, in my personal opinion, we are better off dropping the "lockless wake-up" patch and removing the race and simplifying the wake-up path at the same time.

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Bjorn Helgaas: "Re: [PATCH] pci: increase alignment to make more space for hidden code"
Previous message: David Teigland: "Re: linux-next: unable to fetch dlm tree"
In reply to: Blaise Gassend: "Re: ERESTARTSYS escaping from sem_wait with RTLinux patch"
Next in thread: Thomas Gleixner: "Re: ERESTARTSYS escaping from sem_wait with RTLinux patch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]