GPF in __d_lookup_rcu after hibernate

From: Johan Hovold
Date: Sat Mar 19 2016 - 12:49:12 EST


Hi,

After updating to 4.4.5 I keep hitting a GPF in __d_lookup_rcu after
resuming from suspend-to-disk:

[36023.005198] general protection fault: 0000 [#1] PREEMPT SMP
[36023.005304] Modules linked in: intel_rapl iosf_mbi [last unloaded: videobuf2_memops]
[36023.005440] CPU: 1 PID: 2726 Comm: rsync Not tainted 4.4.6 #130
[36023.005535] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 940X3G/NP940X3G-K03SE, BIOS P02ACJ.101.130926.dg 09/26/2013
[36023.005702] task: ffff88020cf39780 ti: ffff8800c6088000 task.ti: ffff8800c6088000
[36023.005820] RIP: 0010:[<ffffffff811539b2>] [<ffffffff811539b2>] __d_lookup_rcu+0x72/0x150
[36023.005960] RSP: 0018:ffff8800c608bc88 EFLAGS: 00010206
[36023.006045] RAX: 0000000000090000 RBX: 000b0000000a0000 RCX: 000000000000000c
[36023.006164] RDX: ffff880216c00000 RSI: ffff8800c608bdb0 RDI: ffff8801f2be4d80
[36023.006271] RBP: ffff8801f2be4d80 R08: 0000000000000040 R09: ffff8800da9f6025
[36023.006378] R10: 0000000000000005 R11: ffffffffffffffff R12: 000b00000009fff8
[36023.006485] R13: 0000000519aa5aaf R14: ffff8800c608bdb0 R15: ffff8800c608bcec
[36023.006592] FS: 00007f1457ad7700(0000) GS:ffff88021fa80000(0000) knlGS:0000000000000000
[36023.006714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[36023.006800] CR2: 0000000002f98ff0 CR3: 00000001ce66e000 CR4: 00000000001406e0
[36023.006907] Stack:
[36023.006937] 00000002b9c5bf00 ffff8800c608bda0 0000000000000000 ffff8800c608bda0
[36023.007059] 0000000000000000 ffff8800c608bd38 ffff8801f2be4d80 ffff8800c608bd30
[36023.007179] ffff8800d4e00da0 ffffffff8114973d ffff8800c608bd2c ffff8801f2be4d80
[36023.007300] Call Trace:
[36023.007339] [<ffffffff8114973d>] ? lookup_fast+0x3d/0x2d0
[36023.007423] [<ffffffff81149a71>] ? walk_component+0x31/0x2b0
[36023.007511] [<ffffffff811483db>] ? path_init+0x17b/0x3c0
[36023.007593] [<ffffffff8114a2db>] ? path_lookupat+0x5b/0x110
[36023.007678] [<ffffffff8114bce3>] ? filename_lookup+0x93/0x110
[36023.007769] [<ffffffff8112b46e>] ? page_add_new_anon_rmap+0x3e/0x80
[36023.007865] [<ffffffff8114b9e4>] ? getname_flags+0x44/0x180
[36023.007952] [<ffffffff81142fd4>] ? vfs_fstatat+0x44/0x90
[36023.008035] [<ffffffff81143560>] ? SyS_newlstat+0x10/0x30
[36023.008118] [<ffffffff81001039>] ? syscall_trace_enter_phase1+0xb9/0x110
[36023.008222] [<ffffffff810a4ae3>] ? vtime_user_enter+0x23/0x40
[36023.008312] [<ffffffff81102345>] ? __context_tracking_enter+0x45/0x90
[36023.008413] [<ffffffff817a0b57>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[36023.008512] Code: 8b 18 48 83 e3 fe 0f 84 96 00 00 00 4c 89 e8 49 c7 c3 ff ff ff ff 48 c1 e8 20 49 89 c2 eb 08 48 8b 1b 48 85 db 74 7b 4c 8d 63 f8 <8b> 43 fc 48 39 6b 10 75 eb 48 83 7b 08 00 74 e4 83 e0 fe f6 45
[36023.008940] RIP [<ffffffff811539b2>] __d_lookup_rcu+0x72/0x150
[36023.009034] RSP <ffff8800c608bc88>
[36023.036540] ---[ end trace 0f7289662a99e06b ]---

4.4.6 has the same problem as can be seen above, and I just discovered I
had saved a log from 4.3.4 which also appears to suffer from this:

[154467.785854] general protection fault: 0000 [#1] PREEMPT SMP
[154467.785962] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core intel_rapl iosf_mbi [last unloaded: videobuf2_memops]
[154467.786191] CPU: 0 PID: 3944 Comm: rsync Not tainted 4.3.4 #126
[154467.786287] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 940X3G/NP940X3G-K03SE, BIOS P02ACJ.101.130926.dg 09/26/2013
[154467.786456] task: ffff88018a5d8c00 ti: ffff8801f959c000 task.ti: ffff8801f959c000
[154467.786583] RIP: 0010:[<ffffffff8114f7d2>] [<ffffffff8114f7d2>] __d_lookup_rcu+0x72/0x150
[154467.786709] RSP: 0018:ffff8801f959fc88 EFLAGS: 00010206
[154467.786786] RAX: 0000000000990000 RBX: 009b0000009a0000 RCX: 000000000000000c
[154467.786890] RDX: ffff880216c00000 RSI: ffff8801f959fdb0 RDI: ffff8801fa214480
[154467.786993] RBP: ffff8801fa214480 R08: ffff880192cee033 R09: ffff880192cee039
[154467.787096] R10: 0000000000000021 R11: ffffffffffffffff R12: 009b00000099fff8
[154467.787200] R13: 00000021d9ccb96d R14: ffff8801f959fdb0 R15: ffff8801f959fcec
[154467.787303] FS: 00007f444fd82700(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
[154467.787420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[154467.787503] CR2: 0000000002c00ff8 CR3: 000000006b262000 CR4: 00000000001406f0
[154467.787605] Stack:
[154467.787631] 00000002fa2149c0 ffff8801f959fda0 0000000000000000 ffff8801f959fda0
[154467.787727] 0000000000000000 ffff8801f959fd38 ffff8801fa214480 ffff8801f959fd30
[154467.787824] ffff88021400fe20 ffffffff8114550d ffff8801f959fd2c ffff8801fa214480
[154467.787920] Call Trace:
[154467.787953] [<ffffffff8114550d>] ? lookup_fast+0x3d/0x2d0
[154467.788022] [<ffffffff81145841>] ? walk_component+0x31/0x280
[154467.788093] [<ffffffff81144152>] ? path_init+0x182/0x3c0
[154467.788159] [<ffffffff8114605b>] ? path_lookupat+0x5b/0x110
[154467.788229] [<ffffffff81147a53>] ? filename_lookup+0x93/0x110
[154467.788301] [<ffffffff8119fa3f>] ? call_filldir+0x7f/0x120
[154467.788370] [<ffffffff81147754>] ? getname_flags+0x44/0x180
[154467.788439] [<ffffffff8113ed04>] ? vfs_fstatat+0x44/0x90
[154467.788506] [<ffffffff8113f290>] ? SyS_newlstat+0x10/0x30
[154467.788581] [<ffffffff8100105f>] ? syscall_trace_enter_phase1+0xdf/0x130
[154467.788650] [<ffffffff81789817>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[154467.788716] Code: 8b 18 48 83 e3 fe 0f 84 96 00 00 00 4c 89 e8 49 c7 c3 ff ff ff ff 48 c1 e8 20 49 89 c2 eb 08 48 8b 1b 48 85 db 74 7b 4c 8d 63 f8 <8b> 43 fc 48 39 6b 10 75 eb 48 83 7b 08 00 74 e4 83 e0 fe f6 45
[154467.788998] RIP [<ffffffff8114f7d2>] __d_lookup_rcu+0x72/0x150
[154467.789060] RSP <ffff8801f959fc88>
[154467.807471] ---[ end trace 1f6c02a6b2bb76b1 ]---

When this happens, only a forced cold boot appears to make the machine
usable again (further lookups keep failing).

Looking at these logs now, I realised I had reloaded the uvcvideo module,
so to be sure I just suspended without touching that module, and hit
another GPF (this time in kernfs_iop_follow_link) when relaunching
firefox after resume:

[ 2741.976850] general protection fault: 0000 [#1] PREEMPT SMP
[ 2741.976957] Modules linked in: intel_rapl iosf_mbi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
[ 2741.977159] CPU: 3 PID: 8199 Comm: firefox Not tainted 4.4.6 #130
[ 2741.977256] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 940X3G/NP940X3G-K03SE, BIOS P02ACJ.101.130926.dg 09/26/2013
[ 2741.977425] task: ffff8801edcdaf00 ti: ffff8800d41b0000 task.ti: ffff8800d41b0000
[ 2741.977544] RIP: 0010:[<ffffffff8119d840>] [<ffffffff8119d840>] kernfs_iop_follow_link+0x70/0x1a0
[ 2741.977696] RSP: 0018:ffff8800d41b3e98 EFLAGS: 00010286
[ 2741.977780] RAX: ffff8801edcdaf00 RBX: ffff88021687e4b0 RCX: 0000000000000000
[ 2741.977893] RDX: 008f0000008e0000 RSI: 0000000000000000 RDI: ffffffff81c324e0
[ 2741.978007] RBP: ffff8800d53b6000 R08: ffffffff81ab6c29 R09: ffffea000354edc0
[ 2741.978120] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800d53b6000
[ 2741.978233] R13: 00007ffd3ccaed80 R14: ffff8800da802870 R15: 008f0000008e0000
[ 2741.978347] FS: 00007ff343d65780(0000) GS:ffff88021fb80000(0000) knlGS:0000000000000000
[ 2741.978475] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2741.978573] CR2: 00007ff343d3f070 CR3: 00000000da8be000 CR4: 00000000001406e0
[ 2741.978665] Stack:
[ 2741.978698] ffff8800d41b3ed8 ffff8802148f98c0 00007ff3429314d0 0000000000000063
[ 2741.978804] 00007ffd3ccaed80 ffff8802148f98c0 00007ff3429314d0 ffffffff81147926
[ 2741.978908] 00000000ffffff9c 00000000ffffffea 0000000000004000 00000000ffffff9c
[ 2741.979012] Call Trace:
[ 2741.979047] [<ffffffff81147926>] ? generic_readlink+0x56/0x70
[ 2741.979126] [<ffffffff8114366d>] ? SyS_readlinkat+0x8d/0x100
[ 2741.979203] [<ffffffff817a0b57>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[ 2741.979289] Code: 24 c3 81 4c 89 e5 48 8b 58 08 4c 8b 70 40 e8 38 17 60 00 48 83 7b 08 00 74 35 4d 8b 7e 08 4c 89 fa eb 08 48 39 da 74 2b 48 89 ca <48> 8b 4a 08 48 85 c9 75 ef 48 39 d3 74 1a c7 45 00 2e 2e 2f 00
[ 2741.979650] RIP [<ffffffff8119d840>] kernfs_iop_follow_link+0x70/0x1a0
[ 2741.979723] RSP <ffff8800d41b3e98>
[ 2742.000022] ---[ end trace 264065d347b49d9e ]---

I found the "fs: NULL deref in atime_needs_update" thread

https://lkml.kernel.org/r/20160228170133.GM17997@xxxxxxxxxxxxxxxxxx

after a quick search, but can't say if it's related.

Any ideas of what might be going on here?

Thanks,
Johan