Re: futex_cmpxchg_enabled breakage

From: Thomas Gleixner
Date: Sat Sep 15 2018 - 11:34:31 EST


On Thu, 30 Aug 2018, Rich Felker wrote:
> On Thu, Aug 30, 2018 at 11:19:58AM +0200, Thomas Gleixner wrote:
> > On Wed, 29 Aug 2018, Rich Felker wrote:
> >
> > > I just spent a number of hours helping someone track down a bug that
> > > looks like it's some kind of futex_cmpxchg_enabled detection error on
> > > powerpc64 (still not sure of the root cause; set_robust_list producing
> > > -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> > > of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> > > (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> > > if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> >
> > Errm? This does a futex_cmpxchg() on NULL which has to return EFAULT if
> > it's available. There is nothing fundamentally buggy about it at all.
>
> I'll let you know when if/when we finish figuring out how this
> happened on powerpc64, but it's an arch that most certainly has a
> working cmpxchg where these syscalls ended up returning -ENOSYS. This
> is an error condition that should not be able to happen. At the very
> very least all the archs that actually have a working cmpxchg
> unconditionally should be updated with:
>
> select HAVE_FUTEX_CMPXCHG if FUTEX
>
> so that whatever caused this for powerpc64 doesn't happen again.

By doing that you paper over a non functional fixup which could cause other
really hard to decode runtime failures. If that fails there is a bug
somewhere else and that runtime check is not to blame at all for it.

> > > Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> > > and page 0 is mapped (a bad idea, but maybe someone does it...). And
> > > of course other nommu archs are possibly still broken.
> >
> > If NULL is mapped in the kernel then a lot of other things are broken. The
> > futex thing is then the least of your worries.
>
> For nommu NULL is always "mapped".

Cool, so you can have a NULL pointer dereference without noticing it.

> > The availibility of the interfaces which depend on futex_cmpxchg_enabled
> > has been runtime detectable forever and it's documented that way. I have no
> > idea why you think it's non-optional. If you made it unconditional in your
> > lib, then it's hardly the kernels problem.
>
> Modern glibc also defines:
>
> #define __ASSUME_SET_ROBUST_LIST 1
>
> It's then #undef'd for some archs (mips, m68k, sparc, arm), but not
> all archs that lack HAVE_FUTEX_CMPXCHG, so glibc seems to be assuming
> set_robust_list actually works on some where the kernel is treating
> failure as a possibility.

That's hardly a kernel issue, right?

> > > If there are no archs that support SMP but don't provide their own
> > > asm/futex.h (as opposed to the asm-generic one that does -ENOSYS on
> > > SMP), the detection code should just be removed, and the SMP case in
> > > asm-generic/futex.h should be made into #error.
> >
> > And why so? Just because?
>
> To prevent inadvertent introduction of this issue on new archs (if the
> porters don't realize asm-generic/futex.h doesn't actually work for
> SMP).
>
> > > If there are archs that support SMP but don't provide their own
> > > working asm/futex.h, then asm-generic/futex.h's SMP case should be
> > > enhanced to perform a stop-the-world IPI and then do the same thing as
> > > the non-SMP case (disable preemption[/interrupts?], perform the
> > > cmpxchg non-atomically).
> > >
> > > Thoughts? Would a patch to do this be acceptable?
> >
> > No. There is nothing at all in the world which requires that PI futexes and
> > robust list are provided and even if you implement that hack in the kernel
> > then the user space side still does not work because the user space part of
> > those interfaces has a hard dependency on working cmpxchg as well.
>
> Of course there's a working cmpxchg; this is a hard requirement for
> userspace. You cannot implement pthread primitives without one. On
> archs/ISA-levels that lack an insn it's already provided as a syscall,
> a vdso/khelper, or trap-and-emulate. The problem I'm complaining about
> here is that, despite already needing and having working ways to
> achieve this, the kernel is not using them, and breaking functionality
> that should work.

I kinda agree for the nommu case, but for power64 you are barking up the
wrong tree. If that check fails something very fundamentaly is broken and
that breakage is _NOT_ in the futex code. This has to work independent of
the futex code, really.

Thanks,

tglx