Internal kernel error on ZC702 Evaluation Kit with Xilinx kernel
From: Yichao Yu
Date: Wed May 02 2018 - 18:37:49 EST
Hi everyone,
I've recently been getting internal errors from the kernel on one of
the few boards that I'm maintaining. The error happens every few days
and only seems to happen on one board, not on another board that has
the same kernel but a slightly different userspace/usage pattern/network
connection/external environment. The location where this happens and
the error reported seem to be pretty consistent (see below). I am not
100% sure whether this is a kernel bug, a Xilinx bug, a hardware defect,
etc., but I'm asking here first hoping to get some advice on how I could
debug this in our setup.
Description of our setup:
The system is a ZC702 eval board.
The kernel is https://github.com/Xilinx/linux-xlnx/commit/9c2e29b2c81dbb1efb7ee4944b18e12226b97513
The config file is attached. The kernel is cross-compiled from an x64
host with gcc 7.2.
Binary files are also available if anyone wants to have a look.
The FPGA customization defines a memory-mapped device which is accessed
from userspace directly through `/dev/kmem`; judging from the backtrace,
that doesn't seem to be part of the problem.
On the userspace side, it's running archlinux-arm, mostly with a zmq
server processing a few requests per second.
What I could see myself:
I mostly know only userspace stuff, including some ARM
assembly/registers. From what I can tell from the register dump, it
seems that something is trying to call `0xfffffffe` from `0xc04ea488`
(-4), which is `pcf8563_read_block_data+0x80/0x8c`. Assuming I
disassembled the same binary that I'm running, the disassembly of the
function is listed below
(obtained with `arm-linux-gnueabihf-objdump -S --start-address=0x.... vmlinux`).
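The address arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the constants are copied from the oops and from the objdump listing in this mail, and the helper name `symbol_offset` is made up for illustration.

```python
# Values copied from the oops and the objdump listing in this mail.
LR = 0xc04ea488          # "lr : [<c04ea488>]"
FUNC_START = 0xc04ea408  # "c04ea408 <pcf8563_read_block_data>:"
FUNC_SIZE = 0x8c         # from "pcf8563_read_block_data+0x80/0x8c"

def symbol_offset(addr, start, size):
    """Return addr's offset into [start, start + size), or None if outside.

    Hypothetical helper, not part of any kernel tool.
    """
    off = addr - start
    return off if 0 <= off < size else None

offset = symbol_offset(LR, FUNC_START, FUNC_SIZE)
# An ARM-mode `bl` is 4 bytes long and leaves lr pointing just past it,
# so the call that last set lr sits at lr - 4.
call_site = LR - 4

print(f"lr offset: {offset:#x}")     # lr offset: 0x80
print(f"call site: {call_site:#x}")  # call site: 0xc04ea484
```

In the listing, 0xc04ea484 is the `bl c0425c8c <dev_err>` instruction, which is why `dev_err` is the next place to look.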
So the instruction where this happened doesn't look like it would jump
to that random address. I'm guessing it could also be something in
`dev_err` (disassembly also included), although the range where `lr` is
still holding its original value doesn't seem suspicious either...
Any comments on how to debug this further, on whether this is a known
problem, or on how to confirm whether this is a software or possibly a
hardware problem are welcome.
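One way to cross-check the register dump against the disassembly is sketched below. It is a hypothesis, not a diagnosis: it assumes that `dev_err` returned normally, that `mvn r0, #4` at c04ea488 then ran, and that the `pop {r4, r5, pc}` at c04ea470 loaded the same word into all three registers. The helper names are made up; the register values are copied from the first oops.

```python
MASK32 = 0xffffffff

# Values copied from the first oops in this mail.
R0, R4, R5 = 0xfffffffb, 0xffffffff, 0xffffffff
PC, ISA_THUMB = 0xfffffffe, True  # "PC is at 0xfffffffe", "ISA Thumb"

def mvn(imm):
    """32-bit result of ARM `mvn rd, #imm` (bitwise NOT)."""
    return ~imm & MASK32

def load_to_pc(value):
    """Model an interworking load into pc on ARMv7, e.g. `pop {..., pc}`.

    Bit 0 of the loaded word selects Thumb state and is cleared from pc.
    Only the Thumb case matters here, so alignment checks are omitted.
    """
    thumb = bool(value & 1)
    return (value & ~1 & MASK32, thumb)

def as_signed(word):
    """Interpret a 32-bit word as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word

# Does r0 look like the result of `mvn r0, #4`?
r0_matches = mvn(4) == R0
# ASSUMPTION: the word popped into pc equals the one popped into r4/r5.
pop_matches = load_to_pc(R4) == (PC, ISA_THUMB)

print(r0_matches, as_signed(R0), pop_matches)  # True -5 True
```

If this reading holds, the faulting instruction would be the `pop` rather than anything inside `dev_err`, and the question becomes what filled the stack above `sp` with `ffffffff`. The stack dump shows exactly that pattern starting at 0xdfcb9e38.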
Thanks.
Yichao Yu
serial port log:
cdns-i2c e0004000.i2c: timeout waiting on completion
rtc-pcf8563 5-0051: pcf8563_read_block_data: read error
Unable to handle kernel paging request at virtual address fffffffe
pgd = c0004000
[fffffffe] *pgd=2fffd861, *pte=00000000, *ppte=00000000
Internal error: Oops - BUG: 80000007 [#1] PREEMPT SMP ARM
Modules linked in: knacs(O) ipv6
CPU: 0 PID: 3520 Comm: kworker/0:2 Tainted: G O 4.9.0-3-nacs #1
Hardware name: Xilinx Zynq Platform
Workqueue: events rtc_timer_do_work
task: ef381840 task.stack: dfcb8000
PC is at 0xfffffffe
LR is at pcf8563_read_block_data+0x80/0x8c
pc : [<fffffffe>] lr : [<c04ea488>] psr: a0080033
sp : dfcb9e38 ip : 00000007 fp : ee9cc640
r10: ee9cc6f4 r9 : 152aad1d r8 : 4284f200
r7 : ef296800 r6 : 00000000 r5 : ffffffff r4 : ffffffff
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : fffffffb
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
Control: 18c5387d Table: 1fcbc04a DAC: 00000051
Process kworker/0:2 (pid: 3520, stack limit = 0xdfcb8210)
Stack: (0xdfcb9e38 to 0xdfcba000)
9e20: ffffffff ffffffff
9e40: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
9e60: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff c0132cff
9e80: 0083427e ef6cf400 c0a0dc68 c0a0dc68 4284f200 152aad1d 1f99ec37 0000002d
9ea0: 0000002a 00000000 00000002 00000004 00000076 00000003 00000000 00000000
9ec0: ef296800 c0132cd0 ef00ea00 ef6d8b00 00000000 ef296800 00000000 c017c2d0
9ee0: 5ae90984 00000000 1f99ec37 ef6d2580 ef6d8700 00000000 ef296800 00000000
9f00: ee9cc6f4 00000001 ef296800 c0133cdc c0a02d00 00000000 00000000 ef6d2580
9f20: ef296818 ef6d2598 c0a02d00 00000000 00000000 c0134f4c ef39d300 ef296800
9f40: c0134f18 00000000 ef39d300 ef296800 c0134f18 00000000 00000000 00000000
9f60: 00000000 c01398b8 00000000 00000000 dfcb9f60 ef296800 00000000 00000000
9f80: dfcb9f80 dfcb9f80 00000000 00000000 dfcb9f90 dfcb9f90 dfcb9fac ef39d300
9fa0: c01397cc 00000000 00000000 c0108738 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c04ea488>] (pcf8563_read_block_data) from [<ffffffff>] (0xffffffff)
Code: bad PC value
---[ end trace 5fc36c943949af69 ]---
Unable to handle kernel paging request at virtual address ffffffec
pgd = c0004000
[ffffffec] *pgd=2fffd861, *pte=00000000, *ppte=00000000
Internal error: Oops - BUG: 37 [#2] PREEMPT SMP ARM
Modules linked in: knacs(O) ipv6
CPU: 0 PID: 3520 Comm: kworker/0:2 Tainted: G D O 4.9.0-3-nacs #1
Hardware name: Xilinx Zynq Platform
task: ef381840 task.stack: dfcb8000
PC is at kthread_data+0x4/0xc
LR is at wq_worker_sleeping+0x8/0x9c
pc : [<c013a058>] lr : [<c013533c>] psr: 20080193
sp : dfcb9c10 ip : ef6d2f20 fp : dfcb9c54
r10: c0142c90 r9 : 00000000 r8 : ef381ba8
r7 : c0a03a60 r6 : c0944a00 r5 : ef381840 r4 : ef6d2a00
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : ef381840
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 18c5387d Table: 1fcbc04a DAC: 00000051
Process kworker/0:2 (pid: 3520, stack limit = 0xdfcb8210)
Stack: (0xdfcb9c10 to 0xdfcba000)
9c00: ef6d2a00 c064adf4 00000000 c0120ae8
9c20: 00000011 ee82d540 c0a02040 c093f280 ef09bea8 dfcb9964 ef051740 ef381840
9c40: ef381b20 dfcb9c60 c07c2b00 00000000 dfcb9c5c c0142c90 00000020 c012240c
9c60: dfcb9c60 dfcb9c60 c07c2b08 0000000b 00000020 c010c3d8 dfcb8210 0000000b
9c80: 00000000 60080113 bf000000 00000004 62000000 50206461 61762043 0065756c
9ca0: 00000000 fffffffe dfcb8000 ee9cc6f4 ee9cc640 c0162078 c07c4700 dfcb9ce4
9cc0: c07fa1e4 c01b7a18 000001ff fffffffe dfcb9de8 80000007 00000000 fffffffe
9ce0: dfcb8000 ee9cc6f4 ee9cc640 c011a8ec 80000007 c0115ff0 00000000 dfcb9d0c
9d00: 53425553 45545359 32693d4d 45440063 45434956 32692b3d 2d353a63 31353030
9d20: f0958000 c0a087b4 00000007 c0115c7c fffffffe dfcb9de8 dfcb8000 ee9cc6f4
9d40: ee9cc640 c010136c 00000001 ef151018 00000002 c0a02d00 dfcb9df8 008240b1
9d60: c0a67220 00000001 ee9cc640 c04eca94 00000010 ee953b10 ef32ec00 00000000
9d80: ef151018 00000000 ee9bf200 00000000 ef296800 4284f200 152aad1d ee9cc6f4
9da0: ee9cc640 c0425ae4 008240b1 dfcb9dbc 00000001 c0425b3c c07fa1e4 c081086c
9dc0: ee923400 dfcb9dd8 00000001 c0425cc4 fffffffe a0080033 ffffffff dfcb9e1c
9de0: 4284f200 c010ce24 fffffffb 00000000 00000000 00000000 ffffffff ffffffff
9e00: 00000000 ef296800 4284f200 152aad1d ee9cc6f4 ee9cc640 00000007 dfcb9e38
9e20: c04ea488 fffffffe a0080033 ffffffff 00000051 00000000 ffffffff ffffffff
9e40: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
9e60: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff c0132cff
9e80: 0083427e ef6cf400 c0a0dc68 c0a0dc68 4284f200 152aad1d 1f99ec37 0000002d
9ea0: 0000002a 00000000 00000002 00000004 00000076 00000003 00000000 00000000
9ec0: ef296800 c0132cd0 ef00ea00 ef6d8b00 00000000 ef296800 00000000 c017c2d0
9ee0: 5ae90984 00000000 1f99ec37 ef6d2580 ef6d8700 00000000 ef296800 00000000
9f00: ee9cc6f4 00000001 ef296800 c0133cdc c0a02d00 00000000 00000000 ef6d2580
9f20: ef296818 ef6d2598 c0a02d00 00000000 00000000 c0134f4c ef39d300 ef296800
9f40: c0134f18 00000000 ef39d300 ef296800 c0134f18 00000000 00000000 00000000
9f60: 00000000 c01398b8 00000000 00000000 dfcb9f60 ef296800 00000000 00000000
9f80: dfcb9f80 dfcb9f80 00000001 00010001 dfcb9f90 dfcb9f90 dfcb9fac ef39d300
9fa0: c01397cc 00000000 00000000 c0108738 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c013a058>] (kthread_data) from [<c013533c>] (wq_worker_sleeping+0x8/0x9c)
[<c013533c>] (wq_worker_sleeping) from [<c064adf4>] (__schedule+0x25c/0x47c)
[<c064adf4>] (__schedule) from [<c0142c90>] (do_task_dead+0x8c/0x90)
[<c0142c90>] (do_task_dead) from [<c012240c>] (do_exit+0x650/0x9d0)
[<c012240c>] (do_exit) from [<c010c3d8>] (die+0x22c/0x448)
[<c010c3d8>] (die) from [<c011a8ec>] (__do_kernel_fault.part.0+0x64/0x74)
[<c011a8ec>] (__do_kernel_fault.part.0) from [<c0115ff0>] (do_page_fault+0x374/0x388)
[<c0115ff0>] (do_page_fault) from [<c010136c>] (do_PrefetchAbort+0x38/0x9c)
[<c010136c>] (do_PrefetchAbort) from [<c010ce24>] (__pabt_svc+0x64/0xa0)
Exception stack(0xdfcb9de8 to 0xdfcb9e30)
9de0: fffffffb 00000000 00000000 00000000 ffffffff ffffffff
9e00: 00000000 ef296800 4284f200 152aad1d ee9cc6f4 ee9cc640 00000007 dfcb9e38
9e20: c04ea488 fffffffe a0080033 ffffffff
[<c010ce24>] (__pabt_svc) from [<fffffffe>] (0xfffffffe)
Code: e1a00004 e8bd4070 ea02fddc e5903338 (e5130014)
---[ end trace 5fc36c943949af6a ]---
Fixing recursive fault but reboot is needed!
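For reference, the `psr` words in the two oopses can be decoded by hand to confirm the "Flags:" lines, in particular that the first fault was taken in Thumb state. The sketch below uses the standard ARMv7 CPSR bit positions; `decode_psr` and the `MODES` table are illustrative names, not kernel APIs.

```python
# Subset of ARMv7 processor modes (CPSR bits 4:0).
MODES = {0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
         0x17: "ABT", 0x1b: "UND", 0x1f: "SYS"}

def decode_psr(psr):
    """Decode the CPSR fields that the oops "Flags:" line reports."""
    flags = "".join(
        name.upper() if psr & (1 << bit) else name
        for name, bit in (("n", 31), ("z", 30), ("c", 29), ("v", 28))
    )
    return {
        "flags": flags,
        "irqs_on": not (psr & (1 << 7)),   # I bit set means IRQs masked
        "fiqs_on": not (psr & (1 << 6)),   # F bit set means FIQs masked
        "thumb": bool(psr & (1 << 5)),     # T bit
        "mode": MODES.get(psr & 0x1f, "unknown"),
    }

first = decode_psr(0xa0080033)   # psr from oops #1
second = decode_psr(0x20080193)  # psr from oops #2

print(first)
print(second)
```

The first value decodes to NzCv, IRQs on, FIQs on, Thumb, SVC; the second to nzCv, IRQs off, FIQs on, ARM, SVC. Both agree with what the kernel printed.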
Disassembly of pcf8563_read_block_data:
c04ea408 <pcf8563_read_block_data>:
c04ea408: e92d4030 push {r4, r5, lr}
c04ea40c: e1a05000 mov r5, r0
c04ea410: e1d000b2 ldrh r0, [r0, #2]
c04ea414: e24dd024 sub sp, sp, #36 ; 0x24
c04ea418: e3a04000 mov r4, #0
c04ea41c: e28dc007 add ip, sp, #7
c04ea420: e5cd1007 strb r1, [sp, #7]
c04ea424: e28d1008 add r1, sp, #8
c04ea428: e1cd21b8 strh r2, [sp, #24]
c04ea42c: e3a02002 mov r2, #2
c04ea430: e1cd00b8 strh r0, [sp, #8]
c04ea434: e1cd01b4 strh r0, [sp, #20]
c04ea438: e5950018 ldr r0, [r5, #24]
c04ea43c: e58d301c str r3, [sp, #28]
c04ea440: e3a03001 mov r3, #1
c04ea444: e58d400e str r4, [sp, #14]
c04ea448: e58d400a str r4, [sp, #10]
c04ea44c: e1cd41ba strh r4, [sp, #26]
c04ea450: e1cd30bc strh r3, [sp, #12]
c04ea454: e1cd31b6 strh r3, [sp, #22]
c04ea458: e58dc010 str ip, [sp, #16]
c04ea45c: eb0009db bl c04ecbd0 <i2c_transfer>
c04ea460: e3500002 cmp r0, #2
c04ea464: 01a00004 moveq r0, r4
c04ea468: 1a000001 bne c04ea474 <pcf8563_read_block_data+0x6c>
c04ea46c: e28dd024 add sp, sp, #36 ; 0x24
c04ea470: e8bd8030 pop {r4, r5, pc}
c04ea474: e2850020 add r0, r5, #32
c04ea478: e3001658 movw r1, #1624 ; 0x658
c04ea47c: e59f200c ldr r2, [pc, #12] ; c04ea490 <pcf8563_read_block_data+0x88>
c04ea480: e34c1081 movt r1, #49281 ; 0xc081
c04ea484: ebfcee00 bl c0425c8c <dev_err>
c04ea488: e3e00004 mvn r0, #4
c04ea48c: eafffff6 b c04ea46c <pcf8563_read_block_data+0x64>
c04ea490: c0749088 .word 0xc0749088
Disassembly of dev_err:
c0425c8c <dev_err>:
c0425c8c: e92d000e push {r1, r2, r3}
c0425c90: e1a01000 mov r1, r0
c0425c94: e52de004 push {lr} ; (str lr, [sp, #-4]!)
c0425c98: e24dd010 sub sp, sp, #16
c0425c9c: e28d2008 add r2, sp, #8
c0425ca0: e3020df4 movw r0, #11764 ; 0x2df4
c0425ca4: e59d3014 ldr r3, [sp, #20]
c0425ca8: e34c007d movt r0, #49277 ; 0xc07d
c0425cac: e28dc018 add ip, sp, #24
c0425cb0: e58dc004 str ip, [sp, #4]
c0425cb4: e58d3008 str r3, [sp, #8]
c0425cb8: e28d3004 add r3, sp, #4
c0425cbc: e58d300c str r3, [sp, #12]
c0425cc0: ebffff8b bl c0425af4 <__dev_printk>
c0425cc4: e28dd010 add sp, sp, #16
c0425cc8: e49de004 pop {lr} ; (ldr lr, [sp], #4)
c0425ccc: e28dd00c add sp, sp, #12
c0425cd0: e12fff1e bx lr
Attachment:
config
Description: Binary data