Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede

From: Thomas Gleixner
Date: Wed Jan 30 2019 - 18:14:02 EST


On Wed, 30 Jan 2019, Sebastian Sewior wrote:

> On 2019-01-30 18:56:54 [+0100], Thomas Gleixner wrote:
> > TBH, no clue. Below are some more traceprintks which hopefully shed some
> > light on that mystery. See kernel/futex.c line 30 ...
>
> The robust list it somehow buggy. In the last trace we had the
> handle_futex_death() of uaddr 3ff9e880140 as the last action. That means
> it was an entry in 56496's ->list_op_pending entry. This makes sense
> because it tried to acquire the lock, failed, got killed.

The robust list of the failing task seems to be correct.

> According to uaddr pid 56956 is the owner. So 56956 invoked one of
> pthread_mutex_lock() / pthread_mutex_timedlock() /
> pthread_mutex_trylock() and should have obtained the lock in userland.
> Depending on where it got killed, that mutex should be either recorded in
> ->list_op_pending or the robust_list (or both if it didn't clear
> ->list_op_pending yet). But it is not.
> Similar for pthread_mutex_unlock().

> We don't have a trace_point if we abort processing the list.

The only reason why it would abort is due a page fault because that cannot
be handled in the exit code anymore.

> On the other hand, it didn't trigger on x86 for hours. Could the atomic

s/hours/days/ ....

> ops be the culprit?

The glibc code does:

THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending,
(void *) (((uintptr_t) &mutex->__data.__list.__next)
| 1));

....
lock in user space

or

lock in kernel space

ENQUEUE_MUTEX_PI (mutex);
THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);

ENQUEUE_MUTEX_PI() resolves to a THREAD_GETMEM() which reads the
list head from TLS, some list manipulation operations and the final
THREAD_SETMEM() which stores the new list head

Now on x86 THREAD_GETMEM() and THREAD_SETMEM() are resolving to

asm volatile ("movX .....")

on s390 they are

descr->member

based operations.

Now the important part of the robust list is the store sequence, i.e. the
list head and final update to the TLS visible part need to come _before_
list_op_pending is cleared.

I might be missing something, but there is no compiler barrier in that code
which would prevent the compiler from reordering the stores. It can
rightfully do so because there is no compiler visible dependency of these
two operations.

On x8664 the asm volatile might prevent it by chance, but it does not have
a 'memory' specified which would guarantee a compiler barrier.

On s390 there is certainly nothing.

So assumed that clearing list_op_pending comes before the list head update,
then the robust exit code in the kernel will fail to see either of
them. FAIL.

I might be wrong as usual, but this would definitely explain the fail very
well.

Thanks,

tglx