Re: Dcache oops

From: Al Viro
Date: Fri Jun 03 2016 - 14:22:13 EST


On Fri, Jun 03, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
> I am dropping NFS people since it seems to be converting into a generic VFS/dcache bug even though you need NFS or the like to trigger it - the lookup_open path.

NFS bug is real; there might very well be something else, but that d_drop()
in nfs_atomic_open() needs to be restored.

> [ 2642.364383] BUG: unable to handle kernel paging request at ffff880113f82000
> [ 2642.365014] IP: [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9

*ow*
Could you dump your vmlinux (and System.map) somewhere on anonftp?
This 'bad_gs' is there simply because it's one of the few labels in
.fixup, so anything in that section gets attributed to it - to say
anything useful we'll need to find out where we'd really come from.
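
To illustrate why the symbol name is meaningless here: a symbolizer reports
the nearest label at or below the address, whatever that label happens to be.
A minimal sketch, assuming a System.map in the usual "address type name"
layout. The bad_gs and __d_lookup start addresses are derived from the trace
(RIP minus the reported offset); the type letters and the
hypothetical_next_symbol entry are made up for the example.

```python
import bisect

# Illustrative System.map fragment. Only the bad_gs and __d_lookup addresses
# come from the oops (address minus offset); everything else is an assumption.
SAMPLE_MAP = """\
ffffffff81243a50 T __d_lookup
ffffffff817f7ab7 t bad_gs
ffffffff817f9660 t hypothetical_next_symbol
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not "address type name"
        addr, _sym_type, name = parts
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def resolve(syms, addr):
    """Map addr to (name, offset) of the nearest symbol at or below it."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address is below the first known symbol
    start, name = syms[i]
    return name, addr - start


syms = parse_system_map(SAMPLE_MAP)
name, offset = resolve(syms, 0xFFFFFFFF817F87D4)
print(f"{name}+{offset:#x}")  # bad_gs+0xd1d
```

The 0xd1d offset only says how far past the last label the faulting fixup
stub sits; it says nothing about which function the stub belongs to.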

> [ 2642.369810] RIP: 0010:[<ffffffff817f87d4>] [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
> [ 2642.370750] RSP: 0018:ffff8800d7277ba0 EFLAGS: 00010286
> [ 2642.371239] RAX: ffff8800c3a6ff30 RBX: ffff8800c3a6ff00 RCX: ffff880113f82000
> [ 2642.371765] RDX: ffff880113f82000 RSI: 0000000000000000 RDI: 0000000000000002
> [ 2642.372286] RBP: ffff8800d7277be8 R08: 0000000000000000 R09: ffff8800c3a6fed0
> [ 2642.372818] R10: 0000000000000059 R11: ffff8800d6956dd0 R12: ffff880111567ed0
> [ 2642.373415] R13: ffff8800d7277df0 R14: ffff8800c3a6ff50 R15: 0000000084832a57
> [ 2642.373940] FS: 00007fa1814a4700(0000) GS:ffff88011f480000(0000) knlGS:0000000000000000
> [ 2642.374877] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2642.375378] CR2: ffff880113f82000 CR3: 00000000d605c000 CR4: 00000000000006e0
> [ 2642.375881] Stack:
> [ 2642.376295] ffffffff81243a55 0000000000000096 ffff880113f82000 ffffffff00000002
> [ 2642.377204] 000000000000bc04 ffff8800d7277df0 ffff880111567ed0 ffff8800d7277de0
> [ 2642.378113] ffff8800d7277df0 ffff8800d7277c18 ffffffff81243c84 ffffffff81236c4e
> [ 2642.379022] Call Trace:
> [ 2642.379451] [<ffffffff81243a55>] ? __d_lookup+0x5/0x1b0
> [ 2642.379920] [<ffffffff81243c84>] d_lookup+0x84/0xb0
> [ 2642.380388] [<ffffffff81236c4e>] ? lookup_open+0xfe/0x7a0
> [ 2642.380862] [<ffffffff81236c4e>] lookup_open+0xfe/0x7a0
> [ 2642.381374] [<ffffffff81237c3f>] path_openat+0x94f/0xfc0
> [ 2642.381852] [<ffffffff8123935e>] do_filp_open+0x7e/0xe0
> [ 2642.382182] [<ffffffff81233110>] ? lock_rename+0x100/0x100
> [ 2642.382747] [<ffffffff817f4947>] ? _raw_spin_unlock+0x27/0x40
> [ 2642.383324] [<ffffffff8124877c>] ? __alloc_fd+0xbc/0x170
> [ 2642.383864] [<ffffffff81226896>] do_sys_open+0x116/0x1f0
> [ 2642.384230] [<ffffffff8122698e>] SyS_open+0x1e/0x20
> [ 2642.384569] [<ffffffff817f5136>] entry_SYSCALL_64_fastpath+0x1e/0xad
> [ 2642.398718] Code: e1 03 49 d3 e8 e9 61 ae a4 ff 48 8d 0a 48 83 e1 f8 4c 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 20 af a4 ff 48 8d 0a 48 83 e1 f8 <4c> 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 9b b3 a4 ff b9 f2
> [ 2642.400892] RIP [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
> [ 2642.401417] RSP <ffff8800d7277ba0>
> [ 2642.401856] CR2: ffff880113f82000
>
> Hm, somehow crashdumping support is broken for the newish kernels on my test box, I guess
> I'll try to fix it and then re-reproduce to better understand what's going on here,
> this trace is all I have for now in case anybody has any immediate ideas.
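
Until vmlinux is available, the Code: line gives a hint about where we came
from: the stub ends in a jmp back to the interrupted function. A rough
sketch of that arithmetic, assuming the bytes from the faulting instruction
onward are exactly as printed and that the instruction lengths below (decoded
by hand, not by a disassembler) are right. e9 is the x86 near jmp with a
32-bit displacement relative to the next instruction.

```python
import struct

# From the oops: RIP of the faulting instruction, and the Code: bytes
# starting at the <4c> marker.
RIP = 0xFFFFFFFF817F87D4
CODE_FROM_RIP = bytes.fromhex("4c 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 9b b3 a4 ff")

# Hand-decoded lengths of the five instructions before the jmp (assumption):
# 4c 8b 01 / 8d 0a / 83 e1 07 / c1 e1 03 / 49 d3 e8
INSN_LENGTHS = [3, 2, 3, 3, 3]


def fixup_jmp_target(rip, code, lengths):
    """Return the absolute target of the jmp rel32 that ends the stub."""
    off = sum(lengths)
    if code[off] != 0xE9:
        raise ValueError("expected jmp rel32 (opcode e9) at offset %d" % off)
    (rel32,) = struct.unpack_from("<i", code, off + 1)  # signed, little-endian
    next_ip = rip + off + 5  # the jmp itself is 5 bytes long
    return (next_ip + rel32) & 0xFFFFFFFFFFFFFFFF


target = fixup_jmp_target(RIP, CODE_FROM_RIP, INSN_LENGTHS)

# The trace shows __d_lookup+0x5/0x1b0 at ffffffff81243a55.
D_LOOKUP_START = 0xFFFFFFFF81243A55 - 0x5
D_LOOKUP_SIZE = 0x1B0

print(f"jmp target: {target:#x}")
if D_LOOKUP_START <= target < D_LOOKUP_START + D_LOOKUP_SIZE:
    print(f"__d_lookup+{target - D_LOOKUP_START:#x}")
```

Under those assumptions the jmp lands inside the __d_lookup range from the
call trace, but that still needs checking against the real vmlinux.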

PS: Oleg, fix your MUA, please - long lines in mail are bloody annoying.