Re: fs: GPF in locked_inode_to_wb_and_lock_list
From: Dmitry Vyukov
Date: Thu Apr 21 2016 - 04:35:51 EST
On Wed, Apr 20, 2016 at 11:14 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello, Dmitry.
>
> On Mon, Apr 18, 2016 at 11:44:11AM +0200, Dmitry Vyukov wrote:
>
>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> ...
>> RIP: 0010:[<ffffffff818884d2>] [<ffffffff818884d2>]
>> locked_inode_to_wb_and_lock_list+0xa2/0x750
>> RSP: 0018:ffff88006cdaf7d0 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
>> RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
>> RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
>> R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
>> R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
>> FS: 0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
>> DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> Stack:
>> ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
>> ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
>> ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
>> Call Trace:
>> [< inline >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
>> [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
>> [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
>> [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
>> [< inline >] wb_do_writeback fs/fs-writeback.c:1844
>> [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
>> [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
>> [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
>> [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
>> [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
>> Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
>> 00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
>> 80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
>> RIP [< inline >] wb_get include/linux/backing-dev-defs.h:212
>> RIP [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
>> fs/fs-writeback.c:281
>> RSP <ffff88006cdaf7d0>
>
> Man, that's a beautiful trace w/ decoding of inline functions. When
> did we start doing that? Is there a specific config option for this?
>
>> ---[ end trace 986a4d314dcb2694 ]---
>> The crash happened here:
>>
>> if (wb != &wb->bdi->wb)
>> ffffffff818884cb: 48 89 d8 mov %rbx,%rax
>> ffffffff818884ce: 48 c1 e8 03 shr $0x3,%rax
>> ffffffff818884d2: 42 80 3c 28 00 cmpb $0x0,(%rax,%r13,1)
>
> So, it's the above instruction.
>
>> ffffffff818884d7: 0f 85 17 06 00 00 jne
>> ffffffff81888af4 <locked_inode_to_wb_and_lock_list+0x6c4>
>> ffffffff818884dd: 48 8b 03 mov (%rbx),%rax
>> ffffffff818884e0: 48 83 c0 50 add $0x50,%rax
>> ffffffff818884e4: 48 39 c3 cmp %rax,%rbx
>> ffffffff818884e7: 0f 84 c3 00 00 00 je
>> ffffffff818885b0 <locked_inode_to_wb_and_lock_list+0x180>
>>
>> Which means that bdi is NULL (if I get indirections right).
>
> So, the wb != &wb->bdi->wb comparison would be the cmp at
> 0xffffffff818884e4 and given that it just compares the address of
> &bdi->wb, bdi being NULL wouldn't trigger the fault.
>
> cmpb $0x0,(%rax,%r13,1)
> -> *(u8 *)(%rax + %r13) == 0
> -> *(u8 *)((%rbx >> 3) + %r13) == 0
>
> Where can that be from? I can't find anything matching even in the
> surrounding functions.
>
> Hmmm... The base address %r13 is 0xdffffc0000000000 which isn't a
> proper canonical address and in general suspicious. Ooh, it's
> KASAN_SHADOW_OFFSET. It looks like something is making KASAN trigger
> a fault. Can we please bring in someone who's more familiar with
> KASAN?
I am here.
For every memory access to ADDR, KASAN first makes a byte load from
KASAN_SHADOW_OFFSET + (ADDR >> 3).
For accesses to kernel memory, the byte is addressable and contains
the addressability state of ADDR.
It also has a positive side effect of catching any access to user
memory at the place of occurrence, as the address computation produces
an invalid address.
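
To make the address computation concrete, here is a minimal userspace
sketch in C. It is not kernel code: the helper names shadow_addr() and
is_canonical48() are made up for illustration, the shadow offset is the
value seen in %r13 in this trace, and the canonical check assumes
48-bit virtual addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value seen in %r13 in the trace (x86-64 KASAN shadow offset). */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL
/* One shadow byte describes 8 bytes of memory, hence the shift by 3. */
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Hypothetical helper: address of the shadow byte checked before
 * the real access to addr. */
static uint64_t shadow_addr(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Hypothetical helper: true if bits 63..47 are all equal, i.e. the
 * address is canonical under 48-bit virtual addressing (an assumption
 * of this sketch). */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffff;
}
```

With these definitions, shadow_addr(0) is 0xdffffc0000000000, which is
not canonical, so the shadow load for a NULL pointer faults with a GPF.
For the kernel address 0xffff88006a72f768 (%r12 + 0x188 in the dump)
the result is 0xffffed000d4e5eed, which matches %r14.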
The negative side effect is that instead of the usual NULL-deref oops,
you now get the GPF we have here. %r13 contains KASAN_SHADOW_OFFSET, and
%rax contains the address that normal code is going to dereference a few
instructions later, shifted right by 3 (%rbx holds the unshifted
address, which is 0 here). So without KASAN the code would trigger a
NULL deref on:
ffffffff818884dd: 48 8b 03 mov (%rbx),%rax
So whichever load "&wb->bdi->wb" generates is the NULL deref. (Is it wb
that is NULL?)
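
As a rough illustration of why the only load in that expression is the
read of wb->bdi, here is a small C model. The structs are heavily
trimmed stand-ins, not the real kernel definitions: the 0x50 padding is
taken from the "add $0x50,%rax" in the disassembly, bdi sitting at
offset 0 from the "mov (%rbx),%rax", and wb_is_root() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct backing_dev_info;

/* Trimmed stand-in: only the field the comparison reads. */
struct bdi_writeback {
    struct backing_dev_info *bdi;   /* offset 0: mov (%rbx),%rax */
};

/* Trimmed stand-in: padding replaces whatever precedes wb. */
struct backing_dev_info {
    char pad[0x50];                 /* assumption taken from the disassembly */
    struct bdi_writeback wb;        /* offset 0x50: add $0x50,%rax */
};

/* Hypothetical helper mirroring the negation of
 * "if (wb != &wb->bdi->wb)". */
static bool wb_is_root(const struct bdi_writeback *wb)
{
    assert(wb != NULL);             /* the load of wb->bdi needs a valid wb */
    /* Reads wb->bdi, then only adds the offset of the embedded wb;
     * the memory bdi points to is never read. */
    return wb == &wb->bdi->wb;
}
```

In this model wb->bdi is the one memory read, and the rest is pointer
arithmetic, so a fault at that point suggests wb itself is NULL rather
than bdi.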
Sorry for this mess. It is a known issue in KASAN, but we don't know
how to fix it without slowing down execution and sacrificing other
properties.