Re: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op

From: Darren Hart
Date: Thu Jun 23 2016 - 00:48:36 EST


On Mon, Jun 20, 2016 at 04:26:52PM +0200, Matthieu CASTET wrote:
> Hi,
>
> the commit 337f13046ff03717a9e99675284a817527440a49 is saying that it
> change to syscall to an equivalent to FUTEX_WAIT_BITSET |
> FUTEX_CLOCK_REALTIME with a bitset of FUTEX_BITSET_MATCH_ANY.
>
> It seems wrong to me, because in case of FUTEX_WAIT, in
> "SYSCALL_DEFINE6(futex", we convert relative timeout to absolute
> timeout [1].
>
> So FUTEX_CLOCK_REALTIME | FUTEX_WAIT is expecting a relative timeout
> when FUTEX_WAIT_BITSET take an absolute timeout.
>
> To make it work you have to use something like the (untested) attached
> patch.

+Eric Dumazet

Thanks for reporting Matthieu,

FUTEX_WAIT traditionally used a relative timeout with CLOCK_MONOTONIC while
FUTEX_WAIT_BITSET could use either ??? based on the FUTEX_CLOCK_ flag used. The
man page is not particularly clear on this:

http://man7.org/linux/man-pages/man2/futex.2.html

"
The FUTEX_WAIT_BITSET operation also interprets the timeout argument
differently from FUTEX_WAIT. See the discussion of FUTEX_CLOCK_REALTIME,
above.
"

Matthew Kerrisk:
I think this language could be removed now that we support the
FUTEX_CLOCK_REALTIME flag for both futex ops.

As for the intended behavior of the FUTEX_CLOCK_REALTIME flag:


FUTEX_CLOCK_REALTIME (since Linux 2.6.28)
This option bit can be employed only with the FUTEX_WAIT_BITSET,
FUTEX_WAIT_REQUEUE_PI, and FUTEX_WAIT (since Linux 4.5) operations.

(NOTE: FUTEX_WAIT was recently added after the patch in question here)

If this option is set, the kernel treats timeout as an absolute time based
on CLOCK_REALTIME.

If this option is not set, the kernel treats timeout as a relative time,
measured against the CLOCK_MONOTONIC clock.


This supports your argument Matthieu. The assumption of a relative timeout for
FUTEX_WAIT in SYSCALL_DEFINE6 needs to be updated to account for FUTEX_WAIT now
honoring the FUTEX_CLOCK_REALTIME flag, which treats the timeout as absolute.

However, I don't think the patch below is correct. The existing logic
determines the type of timeout based on the futex_op when it should instead
determine the type of timeout based on the FUTEX_CLOCK_REALTIME flag.

My reading of the man page is that FUTEX_WAIT_BITSET abides by the timeout
interpretation defined by the FUTEX_CLOCK_REALTIME attribute, so
SYSCALL_DEFINE6 was misbehaving for FUTEX_WAIT|FUTEX_CLOCK_REALTIME (where the
timeout should have been treated as absolute) as well as for
FUTEX_WAIT_BITSET|FUTEX_CLOCK_MONOTONIC (where the timeout should have been
treated as relative).

Consider the following:

diff --git a/kernel/futex.c b/kernel/futex.c
index 33664f7..fa2af29 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3230,7 +3230,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
return -EINVAL;

t = timespec_to_ktime(ts);
- if (cmd == FUTEX_WAIT)
+ if (!(cmd & FUTEX_CLOCK_REALTIME))
t = ktime_add_safe(ktime_get(), t);
tp = &t;
}

The concern for me is whether the code is incorrect, or if the man page is
incorrect. Does existing userspace code expect the FUTEX_WAIT_BITSET op to
always use an absolute timeout, regardless of the CLOCK used?





>
> Matthieu
>
> [1]
> if (cmd == FUTEX_WAIT)
> t = ktime_add_safe(ktime_get(), t);

> diff --git a/kernel/futex.c b/kernel/futex.c
> index 33664f7..4bee915 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -3230,7 +3230,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
> return -EINVAL;
>
> t = timespec_to_ktime(ts);
> - if (cmd == FUTEX_WAIT)
> + if (cmd == FUTEX_WAIT && !(op & FUTEX_CLOCK_REALTIME))
> t = ktime_add_safe(ktime_get(), t);
> tp = &t;
> }


--
Darren Hart
Intel Open Source Technology Center