Re: ftrace introduces instability into kernel 2.6.27(-rc2,-rc3)

From: Eran Liberty
Date: Wed Aug 20 2008 - 10:04:53 EST


Steven Rostedt wrote:
On Wed, 20 Aug 2008, Eran Liberty wrote:

Steven Rostedt wrote:
On Wed, 20 Aug 2008, Steven Rostedt wrote:

On Wed, 20 Aug 2008, Benjamin Herrenschmidt wrote:

Found the problem (or at least -a- problem), it's a gcc bug.

Well, first I must say the code generated by -pg is just plain
horrible :-)

Apart from that, look at the exit of, for example, __d_lookup, as
generated by gcc when ftrace is enabled:

c00c0498: 38 60 00 00 li r3,0
c00c049c: 81 61 00 00 lwz r11,0(r1)
c00c04a0: 80 0b 00 04 lwz r0,4(r11)
c00c04a4: 7d 61 5b 78 mr r1,r11
c00c04a8: bb 0b ff e0 lmw r24,-32(r11)
c00c04ac: 7c 08 03 a6 mtlr r0
c00c04b0: 4e 80 00 20 blr

As you can see, it restores r1 -before- it pops r24..r31 off
the stack! I'll let you imagine what happens if an interrupt happens
just in between those two instructions (mr and lmw). We don't do
redzones on our ABI, so basically, the registers end up corrupted
by the interrupt.
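
To make the window concrete, here is a toy Python model of the two epilogue orderings. It is only a sketch: the addresses, the frame size, the 64-byte interrupt frame, and the name run_epilogue are assumptions made for the illustration, not real PowerPC or kernel behaviour.

```python
# Toy model of the epilogue race; this is not an emulator.
# Memory is a dict of word address -> value and the stack grows downward.

SAVED_VALUES = [100 + i for i in range(8)]   # stand-ins for the saved r24..r31


def run_epilogue(order, interrupt_after=None):
    """Run the steps in `order` ("mr" and "lmw") and return what lmw loaded.

    If `interrupt_after` names a step, a simulated interrupt fires right
    after that step and writes its own frame just below the current r1.
    """
    caller_sp = 0x1000    # hypothetical caller stack pointer
    frame_size = 0x40     # hypothetical size of the current frame
    # r24..r31 were saved at -32..-4 relative to the caller's stack pointer.
    mem = {caller_sp - 32 + 4 * i: SAVED_VALUES[i] for i in range(8)}
    regs = {"r1": caller_sp - frame_size, "r11": caller_sp}
    restored = None

    for step in order:
        if step == "mr":        # mr r1,r11: the frame is released here
            regs["r1"] = regs["r11"]
        elif step == "lmw":     # lmw r24,-32(r11): reload the saved registers
            restored = [mem[regs["r11"] - 32 + 4 * i] for i in range(8)]
        else:
            raise ValueError("unknown step: %r" % step)

        if step == interrupt_after:
            # No red zone: everything below r1 is fair game for the handler.
            for addr in range(regs["r1"] - 64, regs["r1"], 4):
                mem[addr] = 0xDEAD

    return restored
```

In this model, run_epilogue(["mr", "lmw"], interrupt_after="mr") reloads clobbered values, while run_epilogue(["lmw", "mr"], interrupt_after="lmw") still sees the original ones, because the save area stays above r1 until after the reload.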
Ouch! You've disassembled this without -pg too, and it does not have this
bug? What version of gcc do you have?

I have:
gcc (Debian 4.3.1-2) 4.3.1

c00c64c8: 81 61 00 00 lwz r11,0(r1)
c00c64cc: 7f 83 e3 78 mr r3,r28
c00c64d0: 80 0b 00 04 lwz r0,4(r11)
c00c64d4: ba eb ff dc lmw r23,-36(r11)
c00c64d8: 7d 61 5b 78 mr r1,r11
c00c64dc: 7c 08 03 a6 mtlr r0
c00c64e0: 4e 80 00 20 blr


My version looks fine. I'm thinking that this is a separate issue from what
Eran is seeing.

Eran, can you do an "objdump -dr vmlinux", search for __d_lookup, and
print out the end of the function dump?

Thanks,

-- Steve



powerpc-linux-gnu-objdump -dr --start-address=0xc00bb584 vmlinux | head -n 100

vmlinux: file format elf32-powerpc

Disassembly of section .text:

c00bb584 <__d_lookup>:

[...]

c00bb670: 41 9e 00 50 beq- cr7,c00bb6c0 <__d_lookup+0x13c>
c00bb674: 83 de 00 00 lwz r30,0(r30)
c00bb678: 2f 9e 00 00 cmpwi cr7,r30,0
c00bb67c: 40 9e ff 98 bne+ cr7,c00bb614 <__d_lookup+0x90>
c00bb680: 38 60 00 00 li r3,0
c00bb684: 81 61 00 00 lwz r11,0(r1)
c00bb688: 80 0b 00 04 lwz r0,4(r11)
c00bb68c: 7d 61 5b 78 mr r1,r11

[ BUG HERE IF INTERRUPT HAPPENS ]

c00bb690: bb 0b ff e0 lmw r24,-32(r11)
c00bb694: 7c 08 03 a6 mtlr r0
c00bb698: 4e 80 00 20 blr

Yep, you have the same bug in your compiler.

-- Steve
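
For checking other functions in the same image, the pattern above can be searched for mechanically. What follows is a rough Python sketch, not an existing tool: the function name find_early_sp_restore and the heuristic (flag an lmw with a negative offset that comes after "mr r1,rX" and before the next blr) are assumptions, and it only understands lines shaped like the objdump output quoted in this thread.

```python
import re

# One disassembly line: address, four raw bytes, mnemonic, operands.
INSN = re.compile(r"^\s*([0-9a-f]+):\s+(?:[0-9a-f]{2}\s+){4}(\S+)\s*(.*)$")
# lmw operands such as "r24,-32(r11)".
LMW_OPS = re.compile(r"^r\d+,(-?\d+)\((r\d+)\)$")


def find_early_sp_restore(lines):
    """Return addresses of lmw instructions that run after r1 was restored.

    Heuristic only: state is reset at each blr, and lines that do not look
    like instructions (blank lines, labels, annotations) are skipped.
    """
    hits = []
    sp_source = None    # register r1 was copied from, once "mr r1,rX" is seen
    for line in lines:
        m = INSN.match(line)
        if not m:
            continue
        addr, mnemonic = m.group(1), m.group(2)
        operands = m.group(3).replace(" ", "")
        if mnemonic == "mr" and operands.startswith("r1,"):
            sp_source = operands.split(",")[1]
        elif mnemonic == "lmw" and sp_source is not None:
            ops = LMW_OPS.match(operands)
            # A negative offset from the new r1 reads memory below the
            # stack pointer, which an interrupt may already have reused.
            if ops and int(ops.group(1)) < 0 and ops.group(2) in (sp_source, "r1"):
                hits.append(addr)
        elif mnemonic == "blr":
            sp_source = None    # end of this epilogue
    return hits
```

Fed the lines of an "objdump -dr vmlinux" dump, it returns the addresses of the suspicious lmw instructions. It says nothing about whether an interrupt actually hit the window.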
Hmm... so what now?

Is there a way to prove this scenario is indeed the one that caused the oops?

-- Liberty