Re: kernel BUG at kernel/futex.c:679 on v4.13-rc3-ish on arm64
From: Mark Rutland
Date: Tue Aug 08 2017 - 11:43:10 EST
On Tue, Aug 08, 2017 at 04:32:30PM +0100, Mel Gorman wrote:
> On Tue, Aug 08, 2017 at 11:52:05AM +0100, Mark Rutland wrote:
> > As a heads-up, I hit the below splat when using Syzkaller to fuzz arm64
> > VMAP_STACK patches [1] atop of v4.13-rc3. I haven't hit anything else
> > major, and so far I haven't had any luck reproducing this, so it may be
> > an existing issue that's difficult to hit.
> > kernel BUG at kernel/futex.c:679!
>
> This corresponds to the warning
>
> 	/*
> 	 * Take a reference unless it is about to be freed. Previously
> 	 * this reference was taken by ihold under the page lock
> 	 * pinning the inode in place so i_lock was unnecessary. The
> 	 * only way for this check to fail is if the inode was
> 	 * truncated in parallel so warn for now if this happens.
> 	 *
> 	 * We are not calling into get_futex_key_refs() in file-backed
> 	 * cases, therefore a successful atomic_inc return below will
> 	 * guarantee that get_futex_key() will still imply smp_mb(); (B).
> 	 */
> 	if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
> 		rcu_read_unlock();
> 		put_page(page);
>
> 		goto again;
> 	}
>
> The comment is pretty self-explanatory. The only situation I could think
> of where it could happen is if a futex existed on a shared mapping that
> was truncated during the operation. Why would an application truncate a
> mapping with a key on it? As weird as it is, the situation is recoverable
> which is what the code does but the warning was included in case I was
> not imaginative enough.
>
> Can you tell me if it's possible that Syzkaller, when fuzz testing, was
> creating a shared mapping, creating a futex backed by the mapping and
> truncating it? If so and that's what triggers the warning then I think it
> would be reasonable to remove the warning as the source of the confusion
> is userspace truncating a mapping with active keys on it.
I think that's exactly what Syzkaller is doing.
Near the end of the log, the following are run (concurrently):
mmap(&(0x7f0000bc6000/0x1000)=nil, (0x1000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
mmap(&(0x7f0000bc6000/0x1000)=nil, (0x1000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
mmap(&(0x7f0000bc6000/0x1000)=nil, (0x1000), 0x3, 0x31, 0xffffffffffffffff, 0x0)
futex(&(0x7f0000bc6000)=0x71, 0xb, 0x0, &(0x7f0000bc6000)={0x0, 0x989680}, &(0x7f0000bc6000+0x26c)=0x9ba4, 0x1)
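To spell out what those raw arguments mean, here is a rough sketch that decodes them. The helper names are made up for illustration, and the constants are hard-coded from the asm-generic UAPI values (which arm64 uses) rather than pulled from headers, so treat it as a reading aid and not as anything Syzkaller itself runs. It only knows about the four mmap flag bits that appear above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumed values: Linux asm-generic UAPI constants, hard-coded so the
 * sketch does not depend on the build host's headers. The X_ prefix
 * marks them as local to this example.
 */
enum {
	X_MAP_SHARED		= 0x01,
	X_MAP_PRIVATE		= 0x02,
	X_MAP_FIXED		= 0x10,
	X_MAP_ANONYMOUS		= 0x20,
	X_FUTEX_PRIVATE_FLAG	= 128,
	X_FUTEX_CLOCK_REALTIME	= 256,
};

/* Hypothetical helper: render the mmap() flags seen in the log as text. */
static const char *decode_mmap_flags(unsigned long flags, char *buf, size_t len)
{
	static const struct {
		unsigned long bit;
		const char *name;
	} tbl[] = {
		{ X_MAP_SHARED,    "MAP_SHARED" },
		{ X_MAP_PRIVATE,   "MAP_PRIVATE" },
		{ X_MAP_FIXED,     "MAP_FIXED" },
		{ X_MAP_ANONYMOUS, "MAP_ANONYMOUS" },
	};
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (!(flags & tbl[i].bit))
			continue;
		/* Separate names with '|' once the buffer is non-empty. */
		if (buf[0])
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
	return buf;
}

/* Hypothetical helper: strip the option bits to get the futex command. */
static int futex_cmd(int op)
{
	return op & ~(X_FUTEX_PRIVATE_FLAG | X_FUTEX_CLOCK_REALTIME);
}

/* Hypothetical helper: a futex op without the private flag is shared. */
static int futex_is_shared(int op)
{
	return !(op & X_FUTEX_PRIVATE_FLAG);
}
```

Fed the values from the log, 0x32 comes out as MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS and 0x31 as MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS, with prot 0x3 being PROT_READ|PROT_WRITE. The futex op 0xb is command 11 (FUTEX_WAIT_REQUEUE_PI) with no FUTEX_PRIVATE_FLAG, so it is a shared futex racing against the same page being replaced by a shared anonymous mapping.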
> If you manage to create a test case, then it would be nice to test without
> that warning and see if it completes successfully or if there is other
> fallout.
I *just* sent a test case which blats the mapping with a new anonymous
mmap:
https://lkml.kernel.org/r/20170808145732.GD19207@leverpostej
With my __BUG_FLAGS() issue corrected, the WARN_ON_ONCE() fires once,
and everything else seems fine. I'll have a go with additional debug
enabled just in case.
Thanks,
Mark