Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede

From: Paul E. McKenney
Date: Wed Jan 30 2019 - 18:36:07 EST


On Thu, Jan 31, 2019 at 12:13:51AM +0100, Thomas Gleixner wrote:
> On Wed, 30 Jan 2019, Sebastian Sewior wrote:
>
> > On 2019-01-30 18:56:54 [+0100], Thomas Gleixner wrote:
> > > TBH, no clue. Below are some more traceprintks which hopefully shed some
> > > light on that mystery. See kernel/futex.c line 30 ...
> >
> > The robust list it somehow buggy. In the last trace we had the
> > handle_futex_death() of uaddr 3ff9e880140 as the last action. That means
> > it was an entry in 56496's ->list_op_pending entry. This makes sense
> > because it tried to acquire the lock, failed, got killed.
>
> The robust list of the failing task seems to be correct.
>
> > According to uaddr pid 56956 is the owner. So 56956 invoked one of
> > pthread_mutex_lock() / pthread_mutex_timedlock() /
> > pthread_mutex_trylock() and should have obtained the lock in userland.
> > Depending on where it got killed, that mutex should be either recorded in
> > ->list_op_pending or the robust_list (or both if it didn't clear
> > ->list_op_pending yet). But it is not.
> > Similar for pthread_mutex_unlock().
>
> > We don't have a trace_point if we abort processing the list.
>
> The only reason why it would abort is due a page fault because that cannot
> be handled in the exit code anymore.
>
> > On the other hand, it didn't trigger on x86 for hours. Could the atomic
>
> s/hours/days/ ....
>
> > ops be the culprit?
>
> The glibc code does:
>
> THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending,
> (void *) (((uintptr_t) &mutex->__data.__list.__next)
> | 1));
>
> ....
> lock in user space
>
> or
>
> lock in kernel space
>
> ENQUEUE_MUTEX_PI (mutex);
> THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
>
> ENQUEUE_MUTEX_PI() resolves to a THREAD_GETMEM() which reads the
> list head from TLS, some list manipulation operations and the final
> THREAD_SETMEM() which stores the new list head
>
> Now on x86 THREAD_GETMEM() and THREAD_SETMEM() are resolving to
>
> asm volatile ("movX .....")
>
> on s390 they are
>
> descr->member
>
> based operations.
>
> Now the important part of the robust list is the store sequence, i.e. the
> list head and final update to the TLS visible part need to come _before_
> list_op_pending is cleared.
>
> I might be missing something, but there is no compiler barrier in that code
> which would prevent the compiler from reordering the stores. It can
> rightfully do so because there is no compiler visible dependency of these
> two operations.
>
> On x8664 the asm volatile might prevent it by chance, but it does not have
> a 'memory' specified which would guarantee a compiler barrier.
>
> On s390 there is certainly nothing.
>
> So assumed that clearing list_op_pending comes before the list head update,
> then the robust exit code in the kernel will fail to see either of
> them. FAIL.
>
> I might be wrong as usual, but this would definitely explain the fail very
> well.

On recent versions of GCC, the fix would be to put this between the two
stores that need ordering:

__atomic_thread_fence(__ATOMIC_RELEASE);

I must defer to Heiko on whether s390 GCC might tear the stores. My
guess is "probably not". ;-)

Thanx, Paul