Re: [patch -rt] Fix infinite loop with 2.6.31.4-rt14

From: Darren Hart
Date: Fri Oct 23 2009 - 12:21:47 EST


Hey Dino,

Dinakar Guniguntala wrote:

Not sure if this the best way to go here, but the patch below seems to resolve
the problem for me

If this is fine, I'll send a separate patch for mainline. Currently mainline
seems to be missing the earlier patch referenced above as well

So... something else sets ret to -EAGAIN which should be returned to userspace, rather than used to retry.


/**
- * handle_early_requeue_pi_wakeup() - Detect early wakeup on the initial futex
- * @hb: the hash_bucket futex_q was original enqueued on
- * @q: the futex_q woken while waiting to be requeued
- * @key2: the futex_key of the requeue target futex
- * @timeout: the timeout associated with the wait (NULL if none)
- *
- * Detect if the task was woken on the initial futex as opposed to the requeue
- * target futex. If so, determine if it was a timeout or a signal that caused
- * the wakeup and return the appropriate error code to the caller. Must be
- * called with the hb lock held.
- *
- * Returns
- * 0 - no early wakeup detected
- * <0 - -ETIMEDOUT or -ERESTARTNOINTR
- */
-static inline
-int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
- struct futex_q *q, union futex_key *key2,
- struct hrtimer_sleeper *timeout)
-{

Cramming this entire function into futex_wait_requeue_pi() doesn't appear relevant to the problem. Please don't make futex_wait_requeue_pi() any longer than it already is. My fault that, but let's not make it worse!

spin_unlock(&hb->lock);
+ if (ret == -EAGAIN) {
+ /* Retry on spurious wakeup */
+ put_futex_key(fshared, &q.key);
+ put_futex_key(fshared, &key2);
+ goto retry;
+ }

The above is the core of the change yes?

if (ret)
goto out_put_keys;

@@ -2264,9 +2247,6 @@ out_put_keys:
out_key2:
put_futex_key(fshared, &key2);

- /* Spurious wakeup ? */
- if (ret == -EAGAIN)
- goto retry;

I did a little digging trying to see what else might be returning EAGAIN after the handle_early..() call. I only see fixup_pi_state_owner() and rt_mutex_finish_proxy_lock(). Neither of those has an obvious return of EAGAIN. I'll keep looking.

If this turns out to be the proper fix, please reduce it down to simply check for EAGAIN after handle_early..() and return from there, with a comment, and avoid moving the logic into this overly large function.

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/