a competition when some threads acquire futex

From: chengjian (D)
Date: Mon Sep 04 2017 - 21:51:03 EST


A contention problem occurs when several threads use a pthread_mutex (a futex in the kernel). I wrote a demo to show it: two threads take a lock, sleep for a while, and unlock after they wake up.

```cpp
pthread_mutex_lock(&mutex);

//printf("tid = %lu, count = %d\n", pthread_self( ), i);
printf("tid = %d, count = %d\n", gettid( ), i);
i++;

//sleep(6);
usleep(6000000);
pthread_mutex_unlock(&mutex);
```

What we expect is that the threads run fairly and take turns acquiring the lock.

What actually happens, however, is that the thread which gets the lock first (call it A) reacquires the lock immediately after releasing it. As a result, the other thread B cannot get the lock for a long time, especially when the two threads run on the same CPU.

code follows :

```c
// mutex_bug.c
// Build: gcc -o mutex_bug mutex_bug.c -pthread

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Wrapper around the gettid system call (returns the kernel thread ID).
pid_t gettid(void)
{
    return syscall(SYS_gettid);
}

pthread_mutex_t mutex;

void *print_msg(void *arg)
{
    int i = 0;
    cpu_set_t mask;

    printf("tid = %d(%lu) START\n", gettid(), (unsigned long)pthread_self());

    // Pin the calling thread to CPU 0 so that both threads compete on one CPU.
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);

    // pthread_setaffinity_np() returns 0 on success and a positive error number on failure.
    if (pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask) != 0)
    {
        fprintf(stderr, "set thread affinity failed\n");
    }
    else
    {
        printf("tid = %d affinity to CPU%d\n", gettid(), 0);
    }

    while (1)
    {
        pthread_mutex_lock(&mutex);

        //printf("tid = %lu, count = %d\n", pthread_self( ), i);
        printf("tid = %d, count = %d\n", gettid(), i);
        i++;

        //sleep(6);
        usleep(6000000);
        pthread_mutex_unlock(&mutex);
    }

    return NULL; // not reached
}

int main(int argc, char** argv)
{
    pthread_t id1;
    pthread_t id2;

    printf("main pid = %d\n", getpid());

    pthread_mutex_init(&mutex, NULL);
    pthread_create(&id1, NULL, print_msg, NULL);
    pthread_create(&id2, NULL, print_msg, NULL);

    pthread_join(id1, NULL);
    pthread_join(id2, NULL);

    pthread_mutex_destroy(&mutex);

    return EXIT_SUCCESS;
}
```

result :

```text
./mutex_bug
main pid = 17326
tid = 17327(140113713104640) START
tid = 17328(140113704711936) START
tid = 17327 affinity to CPU0
tid = 17327, count = 0
tid = 17328 affinity to CPU0
tid = 17327, count = 1
tid = 17327, count = 2
tid = 17327, count = 3
tid = 17327, count = 4
tid = 17327, count = 5
tid = 17327, count = 6
tid = 17327, count = 7
tid = 17327, count = 8
tid = 17327, count = 9
tid = 17327, count = 10

......

tid = 17327, count = 838
^C
```

We used perf ftrace to show the graph of function calls. We found that process 17327 reacquires the lock quickly after calling futex_wake(), so process 17328 stays in futex_wait() all the time.

We can solve this problem by rescheduling once after releasing the lock. But what I don't understand is this: when the process returns to user space from the kernel, the scheduler should get a chance to select a new process to run, yet that does not seem to happen. What is going on?
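
For comparison, the same "reschedule once after releasing the lock" idea can be tried purely in user space by calling sched_yield() right after pthread_mutex_unlock(). The following is only a minimal sketch of that idea, not the kernel change in this patch. The names run_yield_demo, worker, demo_mutex and rounds_left are hypothetical and exist only for this illustration, and the hold time is shortened to 1 ms. POSIX mutexes make no fairness promise, so the yield is a hint and does not guarantee that the other thread wins the lock.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <unistd.h>

#define NTHREADS 2

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static long acquisitions[NTHREADS]; /* per-thread lock count, protected by demo_mutex */
static int rounds_left;             /* remaining acquisitions, protected by demo_mutex */

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;

    for (;;) {
        pthread_mutex_lock(&demo_mutex);
        if (rounds_left == 0) {
            pthread_mutex_unlock(&demo_mutex);
            break;
        }
        rounds_left--;
        acquisitions[id]++;
        usleep(1000); /* hold the lock briefly, as the demo does with 6 s */
        pthread_mutex_unlock(&demo_mutex);

        /* Assumption: yielding here gives a freshly woken waiter a chance
         * to run and take the lock before this thread locks it again. */
        sched_yield();
    }
    return NULL;
}

/* Performs `rounds` lock acquisitions shared between two threads and
 * stores how many each thread obtained in counts[]. */
static void run_yield_demo(int rounds, long counts[NTHREADS])
{
    pthread_t tid[NTHREADS];
    int i;

    rounds_left = rounds;
    for (i = 0; i < NTHREADS; i++)
        acquisitions[i] = 0;

    for (i = 0; i < NTHREADS; i++) {
        int err = pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
        assert(err == 0);
    }
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    for (i = 0; i < NTHREADS; i++)
        counts[i] = acquisitions[i];
}
```

Calling run_yield_demo(20, counts) and comparing counts[0] with counts[1] shows how the lock was shared on a given machine. The two counts always add up to the requested number of rounds, but how evenly they split depends on the scheduler and on CPU placement.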

Signed-off-by: Cheng Jian <cj.chengjian@xxxxxxxxxx>
Signed-off-by: Li Bin <huawei.libin@xxxxxxxxxx>
---
kernel/futex.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 3d38eaf..0b2d17a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1545,6 +1545,7 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_
spin_unlock(&hb->lock);
wake_up_q(&wake_q);
+ _cond_resched( );
out_put_key:
put_futex_key(&key);
out:
--
1.8.3.1

