[futex] Regression in 2.6.38 regarding FLAGS_HAS_TIMEOUT

From: Torsten Hilbrich
Date: Wed Apr 13 2011 - 03:04:50 EST


Hello,

I noticed that the behaviour of FUTEX_WAIT changed between 2.6.37 and
2.6.38. The error was initially found in a Java program where a
Thread.sleep call never returned after resuming from suspend to RAM.
Thread.sleep is implemented using pthread_cond_timedwait, which in turn
uses the futex syscall with the op FUTEX_WAIT.

The error can also be triggered with a simple test program (attached as
test-futex.c) which calls FUTEX_WAIT with a timeout of 200ms in a loop.
While running the test program the machine is suspended using "echo mem
> /sys/power/state".

After resume the futex syscall never returns. It can be made to return
by sending the process SIGSTOP followed by SIGCONT.
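
The SIGSTOP/SIGCONT workaround can be sketched as a small helper. The name
stop_and_continue and the 100ms pause between the two signals are
assumptions made for illustration; the same effect can be had with kill(1)
from a shell.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stop and then resume the process `pid`.
 * Returns 0 on success, -1 if either kill() call fails (errno is set). */
int stop_and_continue(pid_t pid)
{
	if (kill(pid, SIGSTOP) != 0)
		return -1;
	usleep(100 * 1000);	/* short pause; the exact delay is arbitrary */
	return kill(pid, SIGCONT);
}
```

Here pid would be the PID of the stuck test-futex process.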

The bug didn't occur in 2.6.37.

I found this bug report

https://bugzilla.kernel.org/show_bug.cgi?id=32922

which describes a related problem and includes a patch. This patch
(setting FLAGS_HAS_TIMEOUT in the restart_block in futex_wait) fixes
the problem for both my original Java program and the test program.
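
To make the effect of the patch easier to picture, here is a toy user-space
model. It is not kernel code: the struct, the function names, and the flag
value are all made up for illustration. It also assumes that the restart
path re-arms the timeout only when the flag is present in the saved state,
which is a reading of the patch description rather than something verified
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value; the kernel's own definition may differ. */
#define MODEL_FLAG_HAS_TIMEOUT 0x04u

/* Toy stand-in for the state saved when an interrupted wait is restarted. */
struct model_restart {
	unsigned int flags;
	uint64_t deadline_ns;
};

/* Save the restart state. With apply_fix == false the timeout flag is not
 * recorded (modelling the reported behaviour); with apply_fix == true it is
 * (modelling the patch). */
void model_save_restart(struct model_restart *r, unsigned int flags,
			uint64_t deadline_ns, bool apply_fix)
{
	r->deadline_ns = deadline_ns;
	r->flags = apply_fix ? (flags | MODEL_FLAG_HAS_TIMEOUT) : flags;
}

/* On restart: true if the wait would be re-armed with the saved deadline,
 * false if it would wait with no timeout at all. */
bool model_restart_has_deadline(const struct model_restart *r)
{
	return (r->flags & MODEL_FLAG_HAS_TIMEOUT) != 0;
}
```

In this model, a restart saved without the flag has no deadline, which
would match the observed "never returns" behaviour after resume.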

I found the following pull request which probably introduced the
problem: https://lkml.org/lkml/2011/1/6/62

Thanks,

Torsten


#include <errno.h>
#include <linux/futex.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static inline int futex(int *uaddr, int op, int val,
			const struct timespec *timeout,
			int *uaddr2, int val3)
{
	return syscall(SYS_futex, uaddr, op, val,
		       timeout, uaddr2, val3);
}

static void futex_sleep(int ms)
{
	static int round;
	struct timespec ts;
	int condition = 0;
	int rc;

	fprintf(stderr, "Before sleep %d\n", ++round);
	/* Relative timeout; split so tv_nsec stays below one second. */
	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000L * 1000L;
	/* Nobody wakes this futex, so the call should end with ETIMEDOUT. */
	rc = futex(&condition, FUTEX_WAIT, condition, &ts, NULL, 0);
	fprintf(stderr, "After sleep (error: %s)\n",
		(rc < 0 ? strerror(errno) : "none"));
}

int main(void)
{
	while (1) {
		futex_sleep(200);
	}
	return 0;
}