Re: RCU bug with v3.17-rc3 ?

From: Felipe Balbi
Date: Thu Oct 09 2014 - 12:27:50 EST


Hi Johannes,

On Thu, Oct 09, 2014 at 12:01:38PM -0400, Johannes Weiner wrote:
> On Wed, Oct 08, 2014 at 04:29:38PM -0500, Felipe Balbi wrote:
> > Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > (lib: radix_tree: tree node interface). Here's full bisect log:
> >
> > git bisect start
> > # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> > git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
> > # bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
> > git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
> > # bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
> > git bisect bad 74a475acea49459721ae4b062d3da68c74259009
> > # good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> > git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
> > # good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
> > git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
> > # good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
> > git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
> > # good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
> > git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
> > # good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
> > git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
> > # good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> > git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
> > # good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
> > git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
> > # bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
> > git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
> > # good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
> > git bisect good 4e35f483850ba46b838adfd312b3052416e15204
> > # good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
> > git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
> > # good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
> > git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
> > # bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> > git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > # good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
> > git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
> > # first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> >
> > I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
> > that for tomorrow. Meanwhile, adding folks involved with that commit to
> > Cc list and another backtrace for reference:
> >
> > [ 113.696647] Unable to handle kernel paging request at virtual address ffffffff
> > [ 113.704370] pgd = c0004000
> > [ 113.707276] [ffffffff] *pgd=9fef6821, *pte=00000000, *ppte=00000000
> > [ 113.713998] Internal error: Oops: 17 [#1] SMP ARM
> > [ 113.718912] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs musb_dsps musb_hdrc musb_am335x
> > [ 113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 3.17.0-02899-g748eb79 #239
> > [ 113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
> > [ 113.744060] PC is at find_get_entry+0x64/0x100
>
> Could you please provide the disassembly of that function?

here you go. It's ARM assembly however:

Dump of assembler code for function find_get_entry:
0xc011da48 <+0>: mov r12, sp
0xc011da4c <+4>: push {r4, r5, r6, r7, r8, r9, r11, r12, lr, pc}
0xc011da50 <+8>: sub r11, r12, #4
0xc011da54 <+12>: sub sp, sp, #16
0xc011da58 <+16>: push {lr} ; (str lr, [sp, #-4]!)
0xc011da5c <+20>: bl 0xc000ef00 <__gnu_mcount_nc>
0xc011da60 <+24>: mov r6, r0
0xc011da64 <+28>: mov r7, r1
0xc011da68 <+32>: ldr r2, [pc, #520] ; 0xc011dc78 <find_get_entry+560>
0xc011da6c <+36>: mov r3, #0
0xc011da70 <+40>: mov r1, r3
0xc011da74 <+44>: str r2, [sp, #8]
0xc011da78 <+48>: str r3, [sp]
0xc011da7c <+52>: mov r2, r3
0xc011da80 <+56>: str r3, [sp, #4]
0xc011da84 <+60>: ldr r0, [pc, #496] ; 0xc011dc7c <find_get_entry+564>
0xc011da88 <+64>: mov r3, #2
0xc011da8c <+68>: bl 0xc0095f88 <lock_acquire>
0xc011da90 <+72>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011da94 <+76>: cmp r0, #0
0xc011da98 <+80>: beq 0xc011daac <find_get_entry+100>
0xc011da9c <+84>: ldr r4, [pc, #476] ; 0xc011dc80 <find_get_entry+568>
0xc011daa0 <+88>: ldrb r3, [r4, #1]
0xc011daa4 <+92>: cmp r3, #0
0xc011daa8 <+96>: beq 0xc011dbfc <find_get_entry+436>
0xc011daac <+100>: ldr r8, [pc, #460] ; 0xc011dc80 <find_get_entry+568>
0xc011dab0 <+104>: add r6, r6, #4
0xc011dab4 <+108>: mov r5, #1
0xc011dab8 <+112>: mov r0, r6
0xc011dabc <+116>: mov r1, r7
0xc011dac0 <+120>: bl 0xc0364660 <radix_tree_lookup_slot>
0xc011dac4 <+124>: subs r9, r0, #0
0xc011dac8 <+128>: beq 0xc011dc24 <find_get_entry+476>
0xc011dacc <+132>: ldr r4, [r9]
0xc011dad0 <+136>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011dad4 <+140>: cmp r0, #0
0xc011dad8 <+144>: beq 0xc011dae8 <find_get_entry+160>
0xc011dadc <+148>: ldrb r3, [r8, #2]
0xc011dae0 <+152>: cmp r3, #0
0xc011dae4 <+156>: beq 0xc011dbcc <find_get_entry+388>
0xc011dae8 <+160>: cmp r4, #0
0xc011daec <+164>: beq 0xc011dc24 <find_get_entry+476>
0xc011daf0 <+168>: tst r4, #3
0xc011daf4 <+172>: bne 0xc011dc4c <find_get_entry+516>
0xc011daf8 <+176>: mov r2, sp
0xc011dafc <+180>: bic r3, r2, #8128 ; 0x1fc0
0xc011db00 <+184>: bic r3, r3, #63 ; 0x3f
0xc011db04 <+188>: ldr r2, [pc, #376] ; 0xc011dc84 <find_get_entry+572>
0xc011db08 <+192>: ldr r3, [r3, #4]
0xc011db0c <+196>: and r2, r2, r3
0xc011db10 <+200>: cmp r2, #0
0xc011db14 <+204>: bne 0xc011dc68 <find_get_entry+544>
0xc011db18 <+208>: add r3, r4, #16
0xc011db1c <+212>: mcr 15, 0, r2, cr7, cr10, {5}
0xc011db20 <+216>: mov r2, #0
0xc011db24 <+220>: pld [r3]
0xc011db28 <+224>: ldrex r1, [r3]
0xc011db2c <+228>: teq r1, r2
0xc011db30 <+232>: beq 0xc011db44 <find_get_entry+252>
0xc011db34 <+236>: add r0, r1, r5
0xc011db38 <+240>: strex r12, r0, [r3]
0xc011db3c <+244>: teq r12, #0
0xc011db40 <+248>: bne 0xc011db28 <find_get_entry+224>
0xc011db44 <+252>: cmp r1, #0
0xc011db48 <+256>: beq 0xc011dab8 <find_get_entry+112>
0xc011db4c <+260>: mov r3, #0
0xc011db50 <+264>: mcr 15, 0, r3, cr7, cr10, {5}
0xc011db54 <+268>: ldr r3, [r4]
0xc011db58 <+272>: tst r3, #32768 ; 0x8000
0xc011db5c <+276>: bne 0xc011dc58 <find_get_entry+528>
0xc011db60 <+280>: ldr r3, [r9]
0xc011db64 <+284>: cmp r3, r4
0xc011db68 <+288>: bne 0xc011dc6c <find_get_entry+548>
0xc011db6c <+292>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011db70 <+296>: cmp r0, #0
0xc011db74 <+300>: beq 0xc011db88 <find_get_entry+320>
0xc011db78 <+304>: ldr r5, [pc, #256] ; 0xc011dc80 <find_get_entry+568>
0xc011db7c <+308>: ldrb r3, [r5, #3]
0xc011db80 <+312>: cmp r3, #0
0xc011db84 <+316>: beq 0xc011dba4 <find_get_entry+348>
0xc011db88 <+320>: ldr r0, [pc, #236] ; 0xc011dc7c <find_get_entry+564>
0xc011db8c <+324>: mov r1, #1
0xc011db90 <+328>: ldr r2, [pc, #240] ; 0xc011dc88 <find_get_entry+576>
0xc011db94 <+332>: bl 0xc0096380 <lock_release>
0xc011db98 <+336>: sub sp, r11, #36 ; 0x24
0xc011db9c <+340>: mov r0, r4
0xc011dba0 <+344>: ldm sp, {r4, r5, r6, r7, r8, r9, r11, sp, pc}
0xc011dba4 <+348>: bl 0xc00aadc4 <rcu_is_watching>
0xc011dba8 <+352>: cmp r0, #0
0xc011dbac <+356>: bne 0xc011db88 <find_get_entry+320>
0xc011dbb0 <+360>: mov r3, #1
0xc011dbb4 <+364>: ldr r0, [pc, #208] ; 0xc011dc8c <find_get_entry+580>
0xc011dbb8 <+368>: ldr r1, [pc, #208] ; 0xc011dc90 <find_get_entry+584>
0xc011dbbc <+372>: ldr r2, [pc, #208] ; 0xc011dc94 <find_get_entry+588>
0xc011dbc0 <+376>: strb r3, [r5, #3]
0xc011dbc4 <+380>: bl 0xc00920cc <lockdep_rcu_suspicious>
0xc011dbc8 <+384>: b 0xc011db88 <find_get_entry+320>
0xc011dbcc <+388>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011dbd0 <+392>: cmp r0, #0
0xc011dbd4 <+396>: beq 0xc011dae8 <find_get_entry+160>
0xc011dbd8 <+400>: bl 0xc00aadc4 <rcu_is_watching>
0xc011dbdc <+404>: cmp r0, #0
0xc011dbe0 <+408>: bne 0xc011dc2c <find_get_entry+484>
0xc011dbe4 <+412>: ldr r0, [pc, #172] ; 0xc011dc98 <find_get_entry+592>
0xc011dbe8 <+416>: mov r1, #196 ; 0xc4
0xc011dbec <+420>: ldr r2, [pc, #168] ; 0xc011dc9c <find_get_entry+596>
0xc011dbf0 <+424>: strb r5, [r8, #2]
0xc011dbf4 <+428>: bl 0xc00920cc <lockdep_rcu_suspicious>
0xc011dbf8 <+432>: b 0xc011dae8 <find_get_entry+160>
0xc011dbfc <+436>: bl 0xc00aadc4 <rcu_is_watching>
0xc011dc00 <+440>: cmp r0, #0
0xc011dc04 <+444>: bne 0xc011daac <find_get_entry+100>
0xc011dc08 <+448>: mov r3, #1
0xc011dc0c <+452>: ldr r0, [pc, #120] ; 0xc011dc8c <find_get_entry+580>
0xc011dc10 <+456>: mov r1, #844 ; 0x34c
0xc011dc14 <+460>: ldr r2, [pc, #132] ; 0xc011dca0 <find_get_entry+600>
0xc011dc18 <+464>: strb r3, [r4, #1]
0xc011dc1c <+468>: bl 0xc00920cc <lockdep_rcu_suspicious>
0xc011dc20 <+472>: b 0xc011daac <find_get_entry+100>
0xc011dc24 <+476>: mov r4, #0
0xc011dc28 <+480>: b 0xc011db6c <find_get_entry+292>
0xc011dc2c <+484>: bl 0xc00ac38c <rcu_lockdep_current_cpu_online>
0xc011dc30 <+488>: cmp r0, #0
0xc011dc34 <+492>: beq 0xc011dbe4 <find_get_entry+412>
0xc011dc38 <+496>: ldr r0, [pc, #60] ; 0xc011dc7c <find_get_entry+564>
0xc011dc3c <+500>: bl 0xc0091264 <lock_is_held>
0xc011dc40 <+504>: cmp r0, #0
0xc011dc44 <+508>: beq 0xc011dbe4 <find_get_entry+412>
0xc011dc48 <+512>: b 0xc011dae8 <find_get_entry+160>
0xc011dc4c <+516>: tst r4, #1
0xc011dc50 <+520>: beq 0xc011db6c <find_get_entry+292>
0xc011dc54 <+524>: b 0xc011dab8 <find_get_entry+112>
0xc011dc58 <+528>: mov r0, r4
0xc011dc5c <+532>: ldr r1, [pc, #64] ; 0xc011dca4 <find_get_entry+604>
0xc011dc60 <+536>: bl 0xc01254d4 <dump_page>
0xc011dc64 <+540>: ; <UNDEFINED> instruction: 0xe7f001f2
0xc011dc68 <+544>: ; <UNDEFINED> instruction: 0xe7f001f2
0xc011dc6c <+548>: mov r0, r4
0xc011dc70 <+552>: bl 0xc012db6c <put_page>
0xc011dc74 <+556>: b 0xc011dab8 <find_get_entry+112>
0xc011dc78 <+560>: andsgt sp, r1, r8, asr #20
0xc011dc7c <+564>: adcgt r2, r11, r8, lsl r2
0xc011dc80 <+568>: ldrhtgt r0, [r0], r1
0xc011dc84 <+572>: andseq pc, pc, r0, lsl #30
0xc011dc88 <+576>: andsgt sp, r1, r8, lsl #23
0xc011dc8c <+580>: addgt sp, r5, r8, lsl #5
0xc011dc90 <+584>: andeq r0, r0, sp, ror r3
0xc011dc94 <+588>: ldrdgt sp, [r5], r0
0xc011dc98 <+592>: addgt sp, r7, r8, asr #7
0xc011dc9c <+596>: addgt lr, r6, r8, lsl #17
0xc011dca0 <+600>: addgt sp, r5, r4, lsr #5
0xc011dca4 <+604>: addgt sp, r7, r4, ror #7
End of assembler dump.

> I'm thinking it's not the slot pointer itself that's bad, because
> __radix_tree_lookup() dereferences that to test if it's populated
> before returning it, and slot life-time is guaranteed by RCU.
>
> That would only leave garbage in the slot itself, crashing during
> page_cache_get_speculative().
>
> I'll keep staring at this change, but nothing stands out to me yet.

alright, it's pretty deterministic however. Always on the same test, no
matter which USB controller, no matter if backing store is RAM or MMC.

Those two undefined instructions on the disassembly caught my attention,
perhaps I'm facing a GCC bug ?

--
balbi

Attachment: signature.asc
Description: Digital signature