Re: Futex queue_me/get_user ordering

From: Joe Seigh
Date: Sun Nov 28 2004 - 13:11:45 EST




Jamie Lokier wrote:
>
> I've looked at the problem of lost-wakeups problem with NPTL condition
> variables and 2.6 futex, with the help of Jakub's finely presented
> pseudo-code. Unless I've made a mistake, it is fixable in userspace.
>
> [ It might be more efficient to fix it in kernel space - on the other
> hand, doing so might also make kernel futexes slower. In general, I
> prefer if the kernel futex semantics can be as "loose" as possible
> to minimise the locking they are absolutely required to do. Who
> knows, we might come up with an algorithm that uses even less
> cross-CPU traffic in the kernel, if the semantics permit it.
> However, I appreciate that a more "atomic" kernel semantic is easier
> to understand, and it is possible to implement that if it is really
> worth doing. I would like to see benchmarks proving it doesn't slow
> down normal futex stress tests though. It might not be slower at all. ]

[...]
> 5. Like 4, but in the kernel. We change the kernel to _always_
> retransmit a wakeup if it's received by the unqueue_me() in the
> word-didn't-match branch.
>
> Effect: In the "Drowsy" state, a waiter may accept a WAKE token
> but then it will offer it again so they are never lost from
> "Sleeping" states.
>
> NOTE: This is NOT equivalent to changing the kernel to do
> test-and-queue atomically. With this change, a FUTEX_WAKE
> operation can return to userspace _before_ the final
> destination of the WAKE token decides to begin FUTEX_WAIT.
>
> This will result in spurious extra wakeups, erring too far the
> other way, because of the difference from atomicity described
> in the preceding paragraph.
>
> Therefore, I don't like this. It would fix the NPTL condition
> variables, but introduces two new problems:
>
> - It violates conservation of WAKE tokens (like energy and
> momentum), which some other futex-using code may depend
> on - unless the return value from FUTEX_WAIT is changed
> to report 1 when it receives a token or 2 when it
> forwards it successfully.
>
> - Some spurious wakeups at times when a wakeup is not
> required.
>
> - No logical benefit over doing it in userspace, but
> would take away flexibility if kernel always did it.
>

I think this is similar to a solution that I proposed elsewhere. You wake up
some other thread, if any, waiting on the futex. This breaks what you call
WAKE tokens but wait morphing with FUTEX_CMP_REQUEUE does that already as far
as I can tell. A FUTEX_WAIT that has been requeued onto another futex could
return EINTR instead of zero (one of the reasons you can't loop on EINTR's in
the cond wait code).

I did an alternate lock-free implementation of pthread condition variables with
a work around of sorts for that futex wake preemption problem I mentioned earlier.
I get a 3x to 200x performance improvement depending on what you are doing. So
naturally I would be interested in a solution that doesn't require a userspace
bottleneck.

Joe Seigh

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/