Re: Serial related oops

From: Russell King
Date: Mon Feb 19 2007 - 16:24:19 EST


On Mon, Feb 19, 2007 at 05:54:52PM +0000, Jose Goncalves wrote:
> Russell King wrote:
> Result is attached.

Right... in depth analysis follows.

[15423.650518] [] uart_startup+0x63/0xf4 equates to 0xc01ba49a, which
is indeed the instruction after the call to port->ops->startup.

The important code leading up to this is:

c01ba437 <uart_startup>:
c01ba437: 55 push %ebp
c01ba438: 57 push %edi
c01ba439: 56 push %esi
c01ba43a: 53 push %ebx
c01ba43b: 8b 7c 24 14 mov 0x14(%esp),%edi @ load state
c01ba43f: 31 d2 xor %edx,%edx
c01ba441: 8b 5f 10 mov 0x10(%edi),%ebx @ load state->info
c01ba444: 8b 77 14 mov 0x14(%edi),%esi @ load state->port

c01ba493: 8b 46 64 mov 0x64(%esi),%eax @ load port->ops
c01ba496: 56 push %esi @ push "port" onto stack
c01ba497: ff 50 24 call *0x24(%eax) @ ops->startup(port)

c01bd74b <serial8250_startup>:
c01bd74b: 55 push %ebp
c01bd74c: 57 push %edi
c01bd74d: 56 push %esi
c01bd74e: 53 push %ebx
c01bd74f: 8b 5c 24 14 mov 0x14(%esp),%ebx

Comparing this with the stack dump:

Stack:
c02fae70
00000005
c74304e0 <- %ebx (pushed by serial8250_startup)
c02fae70 <- %esi (pushed by serial8250_startup)
c128d5e4 <- %edi (pushed by serial8250_startup)
c7a69a80 <- %ebp (pushed by serial8250_startup)
c01ba49a <- uart_startup+0x63/0xf4 (pushed by function called by
ops->startup, iow serial8250_startup)
c02fae70 <- pushed on by "push %esi" at c01ba496, this is "port"
c128d5e4 <- %ebx (pushed by uart_startup)
00000000 <- %esi (pushed by uart_startup)
c7a69a80 <- %edi (pushed by uart_startup)
c7a69a80 <- %ebp (pushed by uart_startup)
c01bbaa0 <- probably uart_open+0xaa/0xec

Once the instruction at c01bd74f completes, we have pushed into the stack
the structure commented above, but not the first two uncommented values.
%ebx contains the value of "port" at this point. We're looking for some
place in the code which pushes a value of '5' and '%ebx' on to the stack,
and the CPUs registers contain values which correspond with the values
provided in your oops.

The code corresponding with the buggy uart check is as follows. Comments
interspersed:

c01bd910: 9c pushf
c01bd911: 5d pop %ebp
c01bd912: fa cli

This code pushes the processors flag register onto the stack, pops it off
into the %ebp register, and then disables interrupts.

Your oops dump contained "ebp: 00000202" which is a reasonable value for
x86 processors flags, which have been saved into the ebp register by the
above code sequence (according to Wikipedia).

c01bd937: ff 73 58 pushl 0x58(%ebx)
c01bd93a: 53 push %ebx
c01bd93b: e8 7f fd ff ff call c01bd6bf <serial8250_set_mctrl>
c01bd940: 6a 02 push $0x2
c01bd942: 6a 01 push $0x1
c01bd944: 53 push %ebx
c01bd945: e8 f6 eb ff ff call c01bc540 <serial_out> @ write IER
c01bd94a: 6a 05 push $0x5
c01bd94c: 53 push %ebx
c01bd94d: e8 a6 eb ff ff call c01bc4f8 <serial_in> @ reads LSR
c01bd952: 89 c7 mov %eax,%edi @ saves result in %edi
c01bd954: 6a 02 push $0x2
c01bd956: 53 push %ebx
c01bd957: e8 9c eb ff ff call c01bc4f8 <serial_in> @ reads IIR
c01bd95c: 83 c4 24 add $0x24,%esp
c01bd95f: 89 c6 mov %eax,%esi @ saves result in %esi

This is the code corresponding with part of the buggy uart check - you can
see the call to serial8250_set_mctrl() there which confirms this. The
sequence at c01bd94a pushes "5" and "port" (%ebx) onto the stack, but
this isn't the right place because before this we pushed "2", "1", and
"port" on the stack, and those are not present in the stack dump.

However, the reason for showing this is that a little while later, we
have:

c01bd96e: 83 e7 40 and $0x40,%edi
c01bd971: 74 1c je c01bd98f <serial8250_startup+0x244>
c01bd973: 83 e6 01 and $0x1,%esi
c01bd976: 74 17 je c01bd98f <serial8250_startup+0x244>

The normal value we would read from the LSR (stored in %edi) would be
0x60, and if a transmit interrupt was pending (which is what the test
is trying to find out) the IIR (%esi) would be 0x02.

The above code sequence which involves masking these values would
therefore give:
0x40 & 0x60 (%edi) -> 0x40 in %edi
0x01 & 0x02 (%esi) -> 0x00 in %esi

>From your oops dump "edi: 00000040" and "esi: 00000000" - that ties up,
so we know that the place we got to must be after this point.

We eventually come to this sequence. The words previously pushed onto
the stack have been removed at this point, and %ebp, %edi nor %esi have
been touched by any other code since they were last commented on above.

c01bd9c6: 6a 05 push $0x5
c01bd9c8: 53 push %ebx
c01bd9c9: e8 2a eb ff ff call c01bc4f8 <serial_in>

Here, we push "5" and "port" on the stack - that ties up with the stack
dump in the oops. It's a call to read the LSR again, and it returns a
value in the '%eax' register. From your oops report "eax: 00000060"
which as previously mentioned is the expected value of the LSR at this
point.

The C code this corresponds with the above assembly is the first
serial_inp (which is just an alias of serial_in) of:

/*
* And clear the interrupt registers again for luck.
*/
(void) serial_inp(up, UART_LSR);
(void) serial_inp(up, UART_RX);
(void) serial_inp(up, UART_IIR);
(void) serial_inp(up, UART_MSR);

return 0;

At this point, we have:

1. a stack layout that looks identical to the one in your oops.
2. %eax, %esi, %edi, %ebp all tie up between your oops and the values
we would expect at this point.
3. %ebx contains zero which we don't expect (it was containing our
value of "port"), %ecx we don't have any way to determine (it's not
touched by this function).

One final value is left unexplained - "edx: 000002fd". I suggest your
dump was from a session using ttyS3, and %edx contains the base address
+ register offset (5) which gives 0x2fd.

So I believe that this is the place in serial8250_startup() where badness
is happening, and it's after that check for buggy uarts.

Moreover, I suspect taht the exact point is towards the end of the
serial_in() function, after the read of the LSR completed, and things
started to go wrong when serial_in() tried to return to it's caller -
when it was popping values off the stack including the return address
instead of reading the correct return address of the stack, it read
zero.

It would be good if someone who understands x86 assembly better than
me had a look at this and confirmed my suspicions though.

Can I ask that you try something - can you try adding a "msleep(10);"
just before the "And clear the interrupt registers again for luck."
comment and see what effect that has please?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/