Re: BUG: unable to handle kernel paging request in do_futex

From: Dmitry Vyukov
Date: Tue Dec 19 2017 - 07:13:10 EST


On Thu, Dec 14, 2017 at 6:02 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Thu, 14 Dec 2017, Andrey Ryabinin wrote:
>> On 12/14/2017 06:31 PM, Thomas Gleixner wrote:
>> > On Thu, 30 Nov 2017, syzbot wrote:
>> >> BUG: unable to handle kernel paging request at 00000000c314149f
>> >
>> > That's a user space address which is nowhere in the registers. Is that
>> > perhaps from before commit 328b4ed93b69a?
>>
>> Seems so. The kernel version is 4.15.0-rc1-next-20171130+, so it shouldn't have that commit.
>>
>> >> IP: arch_futex_atomic_op_inuser arch/x86/include/asm/futex.h:67 [inline]
>> >> IP: futex_atomic_op_inuser kernel/futex.c:1588 [inline]
>> >> IP: futex_wake_op kernel/futex.c:1637 [inline]
>> >> IP: do_futex+0x14c8/0x2280 kernel/futex.c:3483
>> >> PGD 5e28067 P4D 5e28067 PUD 5e2a067 PMD 0
>> >> Oops: 0002 [#1] SMP KASAN
>> >
>> > ^^^^ X86_PF_WRITE
>> >
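
For readers following along, the "0002" in the Oops line can be decoded bit by bit. This is only a minimal sketch: the bit names follow the kernel's X86_PF_* constants, while the helper name decode_pf_error_code is made up for illustration.

```python
# Bit positions of the x86 page-fault error code, named after the kernel's
# X86_PF_* constants. The helper below is hypothetical, not kernel code.
X86_PF_BITS = {
    0: "PROT",   # set: protection violation, clear: page not present
    1: "WRITE",  # set: write access, clear: read access
    2: "USER",   # set: fault happened in user mode
    3: "RSVD",   # set: reserved bit was set in a paging entry
    4: "INSTR",  # set: fault was an instruction fetch
    5: "PK",     # set: protection-key violation
}


def decode_pf_error_code(code):
    """Return the names of all bits set in a page-fault error code."""
    return [name for bit, name in sorted(X86_PF_BITS.items())
            if code & (1 << bit)]


# The Oops line prints the error code in hex.
print(decode_pf_error_code(0x0002))
```

Only the WRITE bit is set, so this was a kernel-mode write to a page that was not present.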
>> >> Dumping ftrace buffer:
>> >> (ftrace buffer empty)
>> >> Modules linked in:
>> >> CPU: 0 PID: 14626 Comm: syz-executor6 Not tainted 4.15.0-rc1-next-20171130+
>> >> #56
>> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
>> >> 01/01/2011
>> >> task: 000000005f17dad6 task.stack: 000000005af7607c
>> >> RIP: 0010:arch_futex_atomic_op_inuser arch/x86/include/asm/futex.h:67 [inline]
>> >> RIP: 0010:futex_atomic_op_inuser kernel/futex.c:1588 [inline]
>> >> RIP: 0010:futex_wake_op kernel/futex.c:1637 [inline]
>> >> RIP: 0010:do_futex+0x14c8/0x2280 kernel/futex.c:3483
>> >> RSP: 0018:ffff8801cffafa18 EFLAGS: 00010246
>> >> RAX: 000000007fffffff RBX: 0000000040000002 RCX: ffffffff8164e3d9
>> >> RDX: 0000000000000000 RSI: ffffc900034e8000 RDI: 0000000000000000
>> >> RBP: ffff8801cffafe38 R08: 1ffffffff0d31367 R09: 0000000000000004
>> >> R10: 0000000000000000 R11: ffffffff8748cd60 R12: ffff8801d0f30180
>> >> R13: 0000000020000000 R14: dffffc0000000000 R15: ffff8801cffafe10
>> >> FS: 00007f66305e0700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
>> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> CR2: fffffffffffffff8 CR3: 00000001ccc2e000 CR4: 00000000001426f0
>> >
>> > ^^^^^^^^^^^^^^^^ is a totally different address, so it's either
>> > completely bogus or the above address is a hashed pointer, because
>> > that printk used to be %p and was changed to %px in 328b4ed93b69a.
>> >
>> >> DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
>> >> DR3: 0000000000000000 DR6: 00000000ffff0ff3 DR7: 0000000000bb060a
>> >> Call Trace:
>> >> SYSC_futex kernel/futex.c:3533 [inline]
>> >> SyS_futex+0x260/0x390 kernel/futex.c:3501
>> >> entry_SYSCALL_64_fastpath+0x1f/0x96
>> >> RIP: 0033:0x4529d9
>> >> RSP: 002b:00007f66305dfc58 EFLAGS: 00000212 ORIG_RAX: 00000000000000ca
>> >> RAX: ffffffffffffffda RBX: 00007f66305e0700 RCX: 00000000004529d9
>> >> RDX: 0000000000000007 RSI: 0000000000000085 RDI: 0000000020062000
>> >> RBP: 0000000000000000 R08: 0000000020000000 R09: 0000000040000002
>> >> R10: 000000002085fff0 R11: 0000000000000212 R12: 0000000000000000
>> >> R13: 0000000000a6f7ff R14: 00007f66305e09c0 R15: 0000000000000000
>> >
>> > The arguments are:
>> >
>> > RDI  uaddr   0000000020062000
>> > RSI  op      0000000000000085
>> > RDX  val     0000000000000007
>> > R10  utime   000000002085fff0
>> > R8   uaddr2  0000000020000000
>> > R9   val2    0000000040000002
>> >
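
The op value and the sixth argument (R9, labeled val2 above; futex(2) calls it val3) can be unpacked further. The sketch below assumes the constants and the FUTEX_OP() bit layout from the uapi header linux/futex.h; the function names and the dictionary layout are made up for illustration.

```python
# Constants as defined in include/uapi/linux/futex.h.
FUTEX_PRIVATE_FLAG = 128
FUTEX_CLOCK_REALTIME = 256
FUTEX_CMD_MASK = ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)

# Only a subset of the commands, enough for this report.
FUTEX_CMDS = {0: "FUTEX_WAIT", 1: "FUTEX_WAKE", 5: "FUTEX_WAKE_OP"}
FUTEX_OPS = {0: "SET", 1: "ADD", 2: "OR", 3: "ANDN", 4: "XOR"}
FUTEX_CMPS = {0: "EQ", 1: "NE", 2: "LT", 3: "LE", 4: "GT", 5: "GE"}


def sign_extend12(value):
    """Interpret a 12-bit field as a signed integer."""
    return value - 0x1000 if value & 0x800 else value


def decode_futex_op(op):
    """Split the futex 'op' argument into command name and private flag."""
    cmd = op & FUTEX_CMD_MASK
    return FUTEX_CMDS.get(cmd, cmd), bool(op & FUTEX_PRIVATE_FLAG)


def decode_wake_op(encoded):
    """Unpack the FUTEX_OP(op, oparg, cmp, cmparg) encoding."""
    return {
        "op": FUTEX_OPS.get((encoded >> 28) & 0x7),
        "oparg_is_shift": bool(encoded & (8 << 28)),  # FUTEX_OP_OPARG_SHIFT
        "cmp": FUTEX_CMPS.get((encoded >> 24) & 0xF),
        "oparg": sign_extend12((encoded >> 12) & 0xFFF),
        "cmparg": sign_extend12(encoded & 0xFFF),
    }


print(decode_futex_op(0x85))
print(decode_wake_op(0x40000002))
```

Under these assumptions the call is FUTEX_WAKE_OP with the private flag set, and the encoded operation is XOR with oparg 0, compared EQ against 2. The XOR matches the xor in the disassembly quoted further down.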
>> >> Code: 31 d2 0f 1f 00 45 87 65 00 0f 1f 00 89 95 30 fc ff ff e9 1d ff ff ff e8
>> >> 67 56 0b 00 31 d2 8b bd 00 fc ff ff 0f 1f 00 41 8b 45 00 <89> c1 31 f9 f0 41
>> >> 0f b1 4d 00 75 f0 0f 1f 00 41 89 c4 89 95 30
>> >
>> > and the code is:
>> >
>> >   27:   41 8b 45 00          mov    0x0(%r13),%eax
>> >   2b:*  89 c1                mov    %eax,%ecx        <-- trapping instruction
>> >   2d:   31 f9                xor    %edi,%ecx
>> >   2f:   f0 41 0f b1 4d 00    lock cmpxchg %ecx,0x0(%r13)
>> >   35:   75 f0                jne    0x27
>> >
>> > The trapping instruction cannot trap :). Assuming it's the mov before that,
>> > the accessed location is R13 + 0 = 0000000020000000, which is uaddr2
>> > and entirely correct.
>> >
>> But the fault address must be 0xfffffffffffffff8 as per CR2, so it can't be
>> 'mov 0x0(%r13),%eax' either. Right?
>
> Indeed
>
>> > And what I completely fail to understand is why this triggers at all. That
>> > code section is guarded by an extable fixup, so this should never come in.
>> >
>> > Is this a KASAN artifact?
>> >
>> I don't see any evidence for KASAN being involved here.
>
> I was just asking because of:
>
>>> Oops: 0002 [#1] SMP KASAN

#syz dup: BUG: unable to handle kernel paging request in __switch_to