Re: Serial related oops

From: Jose Goncalves
Date: Thu Mar 01 2007 - 08:35:34 EST


Hi again Russell,

I'm back, after some more testing. Here goes my report.

I've switched to another SBC and the kernel still oopses, so it is not a
one-off hardware fault.

I've also run memtest86+ on this board for the longest period my
application has taken to trigger an Oops (24 h), and it did not detect
any fault (in 21 passes).

As I've said earlier, our hardware has an extra serial controller
(TL16C554A). To isolate the problem, I've removed the board with this
extra controller and used only the SBC (Vortex86-6070 -
http://www.icop.com.tw/products_detail.asp?ProductID=70). Still, with
that setup and with my application using only ttyS1, I get a kernel Oops,
always at the same point:

<1>[43477.986867] Unable to handle kernel NULL pointer dereference at
virtual address 00000012
<1>[43477.995067] printing eip:
<4>[43478.003087] c01bfa7a
<1>[43478.003116] *pde = 00000000
<0>[43478.011231] Oops: 0000 [#1]
<4>[43478.019188] Modules linked in:
<0>[43478.027308] CPU: 0
<4>[43478.027325] EIP: 0060:[<c01bfa7a>] Not tainted VLI
<4>[43478.027341] EFLAGS: 00010202 (2.6.16.41-mtm6-debug1 #1)
<0>[43478.052490] EIP is at serial_in+0xa/0x4a
<0>[43478.061448] eax: 00000060 ebx: 00000000 ecx: 00000000 edx:
00000000
<0>[43478.070945] esi: 00000000 edi: 00000040 ebp: c7237e1c esp:
c7237e18
<0>[43478.080720] ds: 007b es: 007b ss: 0068
<0>[43478.090470] Process gp_position (pid: 26205, threadinfo=c7236000
task=c775dab0)
<0>[43478.091319] Stack: <0>00000000 00000000 c01c0f88 00000000 00000000
c031fef0 00000005 00000202
<0>[43478.113464] c717fa1c c031fef0 c124b510 c7237e60 c01bd97d
c031fef0 c124b510 c124b510
<0>[43478.126484] 00000000 c760c52c c7237e7c c01befe7 c124b510
00000000 ffffffed c760c52c
<0>[43478.139984] Call Trace:
<0>[43478.152627] [<c0102a35>] show_stack_log_lvl+0xa5/0xad
<0>[43478.166200] [<c0102b70>] show_registers+0x106/0x16f
<0>[43478.179852] [<c0102d06>] die+0xb6/0x127
<0>[43478.193589] [<c0109677>] do_page_fault+0x380/0x4b3
<0>[43478.207616] [<c01026bf>] error_code+0x4f/0x60
<0>[43478.221803] [<c01c0f88>] serial8250_startup+0x28f/0x2a9
<0>[43478.236340] Code: 38 43 78 75 02 b2 01 89 d0 eb 10 8b 41 70 39 43
70 0f 94 c0 0f b6 c0 eb 02 31 c0 5b 5d c3 90 90 90 55 89 e5 53 8b 5d 08
8b 55 0c <0f> b6 4b 12 0f b6 43 13 d3 e2 83 f8 02 74 1a 7f 05 48 74 09 eb
<4>[43478.322255] BUG: gp_position/26205, lock held at task exit time!
<4>[43478.341721] [c124b528] {uart_register_driver}
<4>[43478.359168] .. held by: gp_position:26205 [c775dab0, 117]
<4>[43478.377112] ... acquired at: uart_get+0x28/0xde
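
The fault address can be cross-checked against the "Code:" line. The
marked bytes <0f> b6 4b 12 are a movzbl with a [register + 8-bit
displacement] memory operand, and the bytes just before them (8b 5d 08)
load ebx from 8(%ebp), i.e. from the first stack argument. What follows
is only a minimal sketch of that arithmetic, not kernel code: the names
struct regs and movzbl_disp8_addr() are hypothetical, and it handles
just this one instruction form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386 general-purpose registers, in ModRM encoding order. */
enum reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Hypothetical container for the register values printed in an Oops. */
struct regs {
    uint32_t r[8];
};

/*
 * Decode "0f b6 /r" (movzbl m8, r32) whose memory operand has the form
 * [reg + disp8], and compute the effective address it reads.
 * Returns 0 on success, -1 if the bytes are not in that exact form.
 */
int movzbl_disp8_addr(const uint8_t *insn, size_t len,
                      const struct regs *regs, uint32_t *addr)
{
    assert(insn != NULL && regs != NULL && addr != NULL);

    if (len < 4 || insn[0] != 0x0f || insn[1] != 0xb6)
        return -1;

    uint8_t modrm = insn[2];
    uint8_t mod = modrm >> 6;   /* 01 means an 8-bit displacement follows */
    uint8_t rm = modrm & 7;     /* base register; 100 would mean a SIB byte */

    if (mod != 1 || rm == ESP)
        return -1;

    int8_t disp = (int8_t)insn[3];  /* displacement is sign-extended */
    *addr = regs->r[rm] + (uint32_t)(int32_t)disp;
    return 0;
}
```

Feeding it the four marked bytes and the register dump above (ebx:
00000000) gives 0x12, which matches "virtual address 00000012". That
suggests, without proving it, that the pointer passed as the first
argument to serial_in() was NULL when serial8250_startup() called it.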

I've also followed your suggestion and inserted "msleep(10);" just
before the "And clear the interrupt registers again for luck." comment,
and my application has now been running without problems for more than
24 h! So inserting a delay at this point definitely makes a difference
(as did adding some extra printk() calls at several points in
serial8250_startup()).

That said, to me this is definitely a software problem. The question
is: where?
I would appreciate it if you (or anyone) could give me any pointers on
how to track down the cause of my kernel Oops (perhaps by enabling extra
kernel debugging?).

Thanks,
José Gonçalves

