Re: d_lookup: Unable to handle kernel paging request

From: Vicente Bergas
Date: Wed May 22 2019 - 11:47:29 EST


Hi Al,

On Wednesday, May 22, 2019 3:53:31 PM CEST, Al Viro wrote:
> On Wed, May 22, 2019 at 12:40:55PM +0200, Vicente Bergas wrote:
>> Hi,
>> since a recent update the kernel is reporting d_lookup errors.
>> They appear randomly and after each error the affected file or directory
>> is no longer accessible.
>> The kernel is built with GCC 9.1.0 on ARM64.
>> Four traces from different workloads follow.
>
> Interesting... bisection would be useful.

Agreed, but that would be difficult because of the randomness.
I have run a known-bad kernel for days without hitting the issue.
The issue could also be related to the upgrade to GCC 9.

>> This trace is from v5.1-12511-g72cf0b07418a while untarring into a tmpfs
>> filesystem:
>>
>> Unable to handle kernel paging request at virtual address 0000880001000018
>> user pgtable: 4k pages, 48-bit VAs, pgdp = 000000007ccc6c7d
>> [0000880001000018] pgd=0000000000000000
>
> Attempt to dereference 0x0000880001000018, which is not mapped at all?
>
>> pc : __d_lookup+0x58/0x198
>
> ... and so would objdump of the function in question.
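
For context on the "not mapped at all" remark: on arm64 with 48-bit VAs, an
address whose bits 63:48 are all zero is translated through the user (TTBR0)
page table, and one whose bits 63:48 are all ones through the kernel (TTBR1)
table. That is consistent with the oops printing "user pgtable". The Python
sketch below classifies an address that way. The names classify and VA_BITS
are illustrative, and the kernel-half sample address is made up purely for
contrast; neither comes from the traces.

```python
VA_BITS = 48  # from the oops line "48-bit VAs"

def classify(addr):
    """Return which half of the arm64 address space a 64-bit address falls in."""
    top = addr >> VA_BITS                  # bits 63:48
    all_ones = (1 << (64 - VA_BITS)) - 1   # 0xffff for 48-bit VAs
    if top == 0:
        return "user half (TTBR0)"
    if top == all_ones:
        return "kernel half (TTBR1)"
    return "neither half (non-canonical)"

fault = 0x0000880001000018          # faulting address from the first trace
sample_kernel = 0xffff800010000000  # hypothetical kernel-half address, for contrast only

print(hex(fault), "->", classify(fault))
print(hex(sample_kernel), "->", classify(sample_kernel))
```

A live dentry is allocated in kernel memory, so a hash-chain pointer that
lands in the user half with an empty pgd entry behind it cannot be a valid
chain element.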

Here is the dump from another build of the exact
same version (the build is reproducible).

objdump -x -S fs/dcache.o

...
0000000000002d00 <__d_lookup_rcu>:
2d00: a9b97bfd stp x29, x30, [sp, #-112]!
2d04: aa0103e3 mov x3, x1
2d08: 90000004 adrp x4, 0 <find_submount>
2d08: R_AARCH64_ADR_PREL_PG_HI21 .data..read_mostly
2d0c: 910003fd mov x29, sp
2d10: a90153f3 stp x19, x20, [sp, #16]
2d14: 91000081 add x1, x4, #0x0
2d14: R_AARCH64_ADD_ABS_LO12_NC .data..read_mostly
2d18: a9025bf5 stp x21, x22, [sp, #32]
2d1c: a9046bf9 stp x25, x26, [sp, #64]
2d20: a9406476 ldp x22, x25, [x3]
2d24: b9400821 ldr w1, [x1, #8]
2d28: f9400084 ldr x4, [x4]
2d28: R_AARCH64_LDST64_ABS_LO12_NC .data..read_mostly
2d2c: 1ac126c1 lsr w1, w22, w1
2d30: f8617893 ldr x19, [x4, x1, lsl #3]
2d34: f27ffa73 ands x19, x19, #0xfffffffffffffffe
2d38: 54000920 b.eq 2e5c <__d_lookup_rcu+0x15c> // b.none
2d3c: aa0003f5 mov x21, x0
2d40: d360feda lsr x26, x22, #32
2d44: a90363f7 stp x23, x24, [sp, #48]
2d48: aa0203f8 mov x24, x2
2d4c: d3608ad7 ubfx x23, x22, #32, #3
2d50: a90573fb stp x27, x28, [sp, #80]
2d54: 2a1603fc mov w28, w22
2d58: 9280001b mov x27, #0xffffffffffffffff // #-1
2d5c: 14000003 b 2d68 <__d_lookup_rcu+0x68>
2d60: f9400273 ldr x19, [x19]
2d64: b4000793 cbz x19, 2e54 <__d_lookup_rcu+0x154>
2d68: b85fc265 ldur w5, [x19, #-4]
2d6c: d50339bf dmb ishld
2d70: f9400a64 ldr x4, [x19, #16]
2d74: d1002260 sub x0, x19, #0x8
2d78: eb0402bf cmp x21, x4
2d7c: 54ffff21 b.ne 2d60 <__d_lookup_rcu+0x60> // b.any
2d80: 121f78b4 and w20, w5, #0xfffffffe
2d84: aa0003e9 mov x9, x0
2d88: f9400664 ldr x4, [x19, #8]
2d8c: b4fffea4 cbz x4, 2d60 <__d_lookup_rcu+0x60>
2d90: b94002a4 ldr w4, [x21]
2d94: 37080404 tbnz w4, #1, 2e14 <__d_lookup_rcu+0x114>
2d98: f9401000 ldr x0, [x0, #32]
2d9c: eb16001f cmp x0, x22
2da0: 54fffe01 b.ne 2d60 <__d_lookup_rcu+0x60> // b.any
2da4: f9401265 ldr x5, [x19, #32]
2da8: 2a1a03e6 mov w6, w26
2dac: cb050328 sub x8, x25, x5
2db0: 14000006 b 2dc8 <__d_lookup_rcu+0xc8>
2db4: 910020a5 add x5, x5, #0x8
2db8: eb07001f cmp x0, x7
2dbc: 54fffd21 b.ne 2d60 <__d_lookup_rcu+0x60> // b.any
2dc0: 710020c6 subs w6, w6, #0x8
2dc4: 54000160 b.eq 2df0 <__d_lookup_rcu+0xf0> // b.none
2dc8: 8b0800a4 add x4, x5, x8
2dcc: 6b1700df cmp w6, w23
2dd0: f9400087 ldr x7, [x4]
2dd4: f94000a0 ldr x0, [x5]
2dd8: 54fffee1 b.ne 2db4 <__d_lookup_rcu+0xb4> // b.any
2ddc: 531d72e1 lsl w1, w23, #3
2de0: ca070000 eor x0, x0, x7
2de4: 9ac12361 lsl x1, x27, x1
2de8: ea21001f bics xzr, x0, x1
2dec: 54fffba1 b.ne 2d60 <__d_lookup_rcu+0x60> // b.any
2df0: b9000314 str w20, [x24]
2df4: aa0903e0 mov x0, x9
2df8: a94153f3 ldp x19, x20, [sp, #16]
2dfc: a9425bf5 ldp x21, x22, [sp, #32]
2e00: a94363f7 ldp x23, x24, [sp, #48]
2e04: a9446bf9 ldp x25, x26, [sp, #64]
2e08: a94573fb ldp x27, x28, [sp, #80]
2e0c: a8c77bfd ldp x29, x30, [sp], #112
2e10: d65f03c0 ret
2e14: b9402001 ldr w1, [x0, #32]
2e18: 6b01039f cmp w28, w1
2e1c: 54fffa21 b.ne 2d60 <__d_lookup_rcu+0x60> // b.any
2e20: b9402401 ldr w1, [x0, #36]
2e24: f9401402 ldr x2, [x0, #40]
2e28: d50339bf dmb ishld
2e2c: b85fc264 ldur w4, [x19, #-4]
2e30: 6b04029f cmp w20, w4
2e34: 54000221 b.ne 2e78 <__d_lookup_rcu+0x178> // b.any
2e38: f94032a4 ldr x4, [x21, #96]
2e3c: a90627e3 stp x3, x9, [sp, #96]
2e40: f9400c84 ldr x4, [x4, #24]
2e44: d63f0080 blr x4
2e48: a94627e3 ldp x3, x9, [sp, #96]
2e4c: 34fffd20 cbz w0, 2df0 <__d_lookup_rcu+0xf0>
2e50: 17ffffc4 b 2d60 <__d_lookup_rcu+0x60>
2e54: a94363f7 ldp x23, x24, [sp, #48]
2e58: a94573fb ldp x27, x28, [sp, #80]
2e5c: d2800009 mov x9, #0x0 // #0
2e60: aa0903e0 mov x0, x9
2e64: a94153f3 ldp x19, x20, [sp, #16]
2e68: a9425bf5 ldp x21, x22, [sp, #32]
2e6c: a9446bf9 ldp x25, x26, [sp, #64]
2e70: a8c77bfd ldp x29, x30, [sp], #112
2e74: d65f03c0 ret
2e78: d503203f yield
2e7c: b85fc265 ldur w5, [x19, #-4]
2e80: d50339bf dmb ishld
2e84: f9400c01 ldr x1, [x0, #24]
2e88: 121f78b4 and w20, w5, #0xfffffffe
2e8c: eb15003f cmp x1, x21
2e90: 54fff681 b.ne 2d60 <__d_lookup_rcu+0x60> // b.any
2e94: 17ffffbd b 2d88 <__d_lookup_rcu+0x88>

0000000000002e98 <__d_lookup>:
2e98: a9b97bfd stp x29, x30, [sp, #-112]!
2e9c: 90000002 adrp x2, 0 <find_submount>
2e9c: R_AARCH64_ADR_PREL_PG_HI21 .data..read_mostly
2ea0: 91000043 add x3, x2, #0x0
2ea0: R_AARCH64_ADD_ABS_LO12_NC .data..read_mostly
2ea4: 910003fd mov x29, sp
2ea8: a90573fb stp x27, x28, [sp, #80]
2eac: aa0103fc mov x28, x1
2eb0: a90153f3 stp x19, x20, [sp, #16]
2eb4: a90363f7 stp x23, x24, [sp, #48]
2eb8: a9046bf9 stp x25, x26, [sp, #64]
2ebc: aa0003fa mov x26, x0
2ec0: b9400397 ldr w23, [x28]
2ec4: b9400860 ldr w0, [x3, #8]
2ec8: f9400041 ldr x1, [x2]
2ec8: R_AARCH64_LDST64_ABS_LO12_NC .data..read_mostly
2ecc: 1ac026e0 lsr w0, w23, w0
2ed0: f8607833 ldr x19, [x1, x0, lsl #3]
2ed4: f27ffa73 ands x19, x19, #0xfffffffffffffffe
2ed8: 54000320 b.eq 2f3c <__d_lookup+0xa4> // b.none
2edc: 5280001b mov w27, #0x0 // #0
2ee0: 92800018 mov x24, #0xffffffffffffffff // #-1
2ee4: a9025bf5 stp x21, x22, [sp, #32]
2ee8: d2800016 mov x22, #0x0 // #0
2eec: 52800035 mov w21, #0x1 // #1
2ef0: b9401a62 ldr w2, [x19, #24]
2ef4: d1002274 sub x20, x19, #0x8
2ef8: 6b17005f cmp w2, w23
2efc: 540001a1 b.ne 2f30 <__d_lookup+0x98> // b.any
2f00: 91014279 add x25, x19, #0x50
2f04: f9800331 prfm pstl1strm, [x25]
2f08: 885fff21 ldaxr w1, [x25]
2f0c: 4a160020 eor w0, w1, w22
2f10: 35000060 cbnz w0, 2f1c <__d_lookup+0x84>
2f14: 88007f35 stxr w0, w21, [x25]
2f18: 35ffff80 cbnz w0, 2f08 <__d_lookup+0x70>
2f1c: 35000521 cbnz w1, 2fc0 <__d_lookup+0x128>
2f20: f9400e82 ldr x2, [x20, #24]
2f24: eb1a005f cmp x2, x26
2f28: 540001a0 b.eq 2f5c <__d_lookup+0xc4> // b.none
2f2c: 089fff3b stlrb w27, [x25]
2f30: f9400273 ldr x19, [x19]
2f34: b5fffdf3 cbnz x19, 2ef0 <__d_lookup+0x58>
2f38: a9425bf5 ldp x21, x22, [sp, #32]
2f3c: d2800008 mov x8, #0x0 // #0
2f40: aa0803e0 mov x0, x8
2f44: a94153f3 ldp x19, x20, [sp, #16]
2f48: a94363f7 ldp x23, x24, [sp, #48]
2f4c: a9446bf9 ldp x25, x26, [sp, #64]
2f50: a94573fb ldp x27, x28, [sp, #80]
2f54: a8c77bfd ldp x29, x30, [sp], #112
2f58: d65f03c0 ret
2f5c: f9400660 ldr x0, [x19, #8]
2f60: b4fffe60 cbz x0, 2f2c <__d_lookup+0x94>
2f64: b9400340 ldr w0, [x26]
2f68: aa1403e8 mov x8, x20
2f6c: b9402681 ldr w1, [x20, #36]
2f70: 370802e0 tbnz w0, #1, 2fcc <__d_lookup+0x134>
2f74: b9400784 ldr w4, [x28, #4]
2f78: 6b04003f cmp w1, w4
2f7c: 54fffd81 b.ne 2f2c <__d_lookup+0x94> // b.any
2f80: f9400787 ldr x7, [x28, #8]
2f84: 12000881 and w1, w4, #0x7
2f88: f9401265 ldr x5, [x19, #32]
2f8c: cb0500e7 sub x7, x7, x5
2f90: 14000003 b 2f9c <__d_lookup+0x104>
2f94: 71002084 subs w4, w4, #0x8
2f98: 54000300 b.eq 2ff8 <__d_lookup+0x160> // b.none
2f9c: 8b0700a2 add x2, x5, x7
2fa0: 6b04003f cmp w1, w4
2fa4: f9400046 ldr x6, [x2]
2fa8: f94000a0 ldr x0, [x5]
2fac: 54000340 b.eq 3014 <__d_lookup+0x17c> // b.none
2fb0: 910020a5 add x5, x5, #0x8
2fb4: eb06001f cmp x0, x6
2fb8: 54fffee0 b.eq 2f94 <__d_lookup+0xfc> // b.none
2fbc: 17ffffdc b 2f2c <__d_lookup+0x94>
2fc0: aa1903e0 mov x0, x25
2fc4: 94000000 bl 0 <queued_spin_lock_slowpath>
2fc4: R_AARCH64_CALL26 queued_spin_lock_slowpath
2fc8: 17ffffd6 b 2f20 <__d_lookup+0x88>
2fcc: f9403340 ldr x0, [x26, #96]
2fd0: aa1c03e3 mov x3, x28
2fd4: f9401682 ldr x2, [x20, #40]
2fd8: f90037f4 str x20, [sp, #104]
2fdc: f9400c04 ldr x4, [x0, #24]
2fe0: aa1403e0 mov x0, x20
2fe4: d63f0080 blr x4
2fe8: 7100001f cmp w0, #0x0
2fec: 1a9f17e0 cset w0, eq // eq = none
2ff0: f94037e8 ldr x8, [sp, #104]
2ff4: 34fff9c0 cbz w0, 2f2c <__d_lookup+0x94>
2ff8: b9405e80 ldr w0, [x20, #92]
2ffc: 52800001 mov w1, #0x0 // #0
3000: 11000400 add w0, w0, #0x1
3004: b9005e80 str w0, [x20, #92]
3008: 089fff21 stlrb w1, [x25]
300c: a9425bf5 ldp x21, x22, [sp, #32]
3010: 17ffffcc b 2f40 <__d_lookup+0xa8>
3014: 531d7021 lsl w1, w1, #3
3018: ca060000 eor x0, x0, x6
301c: 9ac12301 lsl x1, x24, x1
3020: ea21001f bics xzr, x0, x1
3024: 1a9f17e0 cset w0, eq // eq = none
3028: 34fff820 cbz w0, 2f2c <__d_lookup+0x94>
302c: 17fffff3 b 2ff8 <__d_lookup+0x160>

0000000000003030 <d_lookup>:
3030: a9bd7bfd stp x29, x30, [sp, #-48]!
3034: 910003fd mov x29, sp
3038: a90153f3 stp x19, x20, [sp, #16]
303c: 90000013 adrp x19, 0 <find_submount>
303c: R_AARCH64_ADR_PREL_PG_HI21 .data..cacheline_aligned
3040: aa0103f4 mov x20, x1
3044: 91000273 add x19, x19, #0x0
3044: R_AARCH64_ADD_ABS_LO12_NC .data..cacheline_aligned
3048: a9025bf5 stp x21, x22, [sp, #32]
304c: aa0003f5 mov x21, x0
3050: b9400276 ldr w22, [x19]
3054: 370001d6 tbnz w22, #0, 308c <d_lookup+0x5c>
3058: d50339bf dmb ishld
305c: aa1403e1 mov x1, x20
3060: aa1503e0 mov x0, x21
3064: 94000000 bl 2e98 <__d_lookup>
3064: R_AARCH64_CALL26 __d_lookup
3068: b50000a0 cbnz x0, 307c <d_lookup+0x4c>
306c: d50339bf dmb ishld
3070: b9400261 ldr w1, [x19]
3074: 6b16003f cmp w1, w22
3078: 54fffec1 b.ne 3050 <d_lookup+0x20> // b.any
307c: a94153f3 ldp x19, x20, [sp, #16]
3080: a9425bf5 ldp x21, x22, [sp, #32]
3084: a8c37bfd ldp x29, x30, [sp], #48
3088: d65f03c0 ret
308c: d503203f yield
3090: 17fffff0 b 3050 <d_lookup+0x20>
3094: d503201f nop
...
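
The pc values reported in the traces (__d_lookup+0x58/0x198 and
__d_lookup_rcu+0x68/0x198) can be matched against this listing by adding the
offset to the start address of the symbol. The Python sketch below does that
arithmetic. The start addresses are copied from the listing; the helper names
are illustrative. It assumes the rebuilt object matches the crashing kernel.

```python
# Start addresses of the functions, copied from the objdump listing.
SYMBOLS = {
    "__d_lookup_rcu": 0x2d00,
    "__d_lookup": 0x2e98,
    "d_lookup": 0x3030,
}

def parse_pc(pc):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    symbol, rest = pc.split("+")
    offset, size = rest.split("/")
    return symbol, int(offset, 16), int(size, 16)

def listing_address(pc):
    """Address in the listing of the instruction a pc line points at."""
    symbol, offset, size = parse_pc(pc)
    if offset >= size:
        raise ValueError("offset lies outside the function")
    return SYMBOLS[symbol] + offset

for pc in ("__d_lookup+0x58/0x198", "__d_lookup_rcu+0x68/0x198"):
    print(pc, "->", format(listing_address(pc), "x"))
```

In the listing, 2ef0 is "ldr w2, [x19, #24]" and 2d68 is
"ldur w5, [x19, #-4]"; both load through x19. The function size 0x198
reported in the traces also equals the distance between consecutive symbols
here (0x2e98 - 0x2d00 and 0x3030 - 0x2e98), which supports the assumption
that the two builds match.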

>> This trace is from v5.2.0-rc1:
>> Unable to handle kernel paging request at virtual address 0000880001000018
>
> [apparently identical oops, modulo the call chain to d_lookup(); since that's
> almost certainly buggered data structures encountered during the hash lookup,
> exact callchain doesn't matter all that much; procfs is the filesystem involved]
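
To make the "hash lookup" remark concrete, the Python sketch below models the
leading fields of struct dentry with ctypes and maps an offset relative to
the hash-chain node (the value held in x19) back to a dentry field. The
layout is an assumption: it follows include/linux/dcache.h of roughly that
era for a 64-bit build without lockdep, and is not stated anywhere in this
thread. Only the fields needed here are modelled, and pointers are
represented as 64-bit integers.

```python
import ctypes

class HlistBlNode(ctypes.Structure):   # struct hlist_bl_node
    _fields_ = [("next", ctypes.c_uint64), ("pprev", ctypes.c_uint64)]

class Qstr(ctypes.Structure):          # struct qstr
    _fields_ = [("hash", ctypes.c_uint32), ("len", ctypes.c_uint32),
                ("name", ctypes.c_uint64)]

class DentryHead(ctypes.Structure):    # leading fields of struct dentry (assumed layout)
    _fields_ = [("d_flags", ctypes.c_uint32),
                ("d_seq", ctypes.c_uint32),
                ("d_hash", HlistBlNode),
                ("d_parent", ctypes.c_uint64),
                ("d_name", Qstr)]

def field_for_node_offset(imm):
    """Map an offset relative to &dentry->d_hash to (field, offset inside field)."""
    target = DentryHead.d_hash.offset + imm
    for name, _ctype in DentryHead._fields_:
        desc = getattr(DentryHead, name)
        if desc.offset <= target < desc.offset + desc.size:
            return name, target - desc.offset
    raise ValueError("offset outside the modelled fields")

# Immediates used with x19 in the listing: #24, #-4, #16, #8.
for imm in (24, -4, 16, 8):
    print(imm, field_for_node_offset(imm))
```

Under this layout, "ldr w2, [x19, #24]" reads d_name.hash and
"ldur w5, [x19, #-4]" reads d_seq, in each case the first access made to a
new chain entry. The "sub x20, x19, #0x8" in the listing agrees with d_hash
sitting 8 bytes into the dentry.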

>> This trace is from v5.2.0-rc1 while executing 'git pull -r' from f2fs. It
>> got repeated several times:
>>
>> Unable to handle kernel paging request at virtual address 0000000000fffffc
>> user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000092bdb9cd
>> [0000000000fffffc] pgd=0000000000000000
>> pc : __d_lookup_rcu+0x68/0x198
>>
>> This trace is from v5.2.0-rc1 while running 'rm -rf' on the directory
>> affected by the previous trace:
>>
>> Unable to handle kernel paging request at virtual address 0000000001000018
>
> ... and addresses involved are
>
> 0000880001000018
> 0000000000fffffc
> 0000000001000018
>
> AFAICS, the only registers with the value in the vicinity of those addresses
> had been (in all cases so far) x19 - 0000880001000000 in the first two traces,
> 0000000001000000 in the last two...
>
> I'd really like to see the disassembly of the functions involved (as well as
> .config in question).
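
The x19 values quoted above can be cross-checked against the disassembly:
the faulting loads are "ldr w2, [x19, #24]" at __d_lookup+0x58 and
"ldur w5, [x19, #-4]" at __d_lookup_rcu+0x68, so subtracting the immediate
from the faulting address should give x19. A minimal Python sketch of that
check follows; the helper name base_register is illustrative, and it assumes
the listing matches the crashing build.

```python
# (pc from the trace, faulting address, immediate of the load at that pc)
CASES = [
    ("__d_lookup+0x58",     0x0000880001000018, 24),  # ldr  w2, [x19, #24]
    ("__d_lookup_rcu+0x68", 0x0000000000fffffc, -4),  # ldur w5, [x19, #-4]
]

# x19 values as quoted in the thread.
QUOTED_X19 = {0x0000880001000000, 0x0000000001000000}

def base_register(fault_addr, imm):
    """Recover the base register of a load from its effective address."""
    return fault_addr - imm

for pc, fault, imm in CASES:
    x19 = base_register(fault, imm)
    verdict = "matches" if x19 in QUOTED_X19 else "differs from"
    print(f"{pc}: x19 = {x19:016x} ({verdict} the quoted value)")
```

The fourth address, 0000000001000018, would give the same 0000000001000000
if it came from the same "#24" load, but its pc is not shown in this message,
so that part is an assumption.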

Here is the .config: https://paste.debian.net/1082689

Regards,
Vicenç.