Re: Possible boot race (seen on MX35)

From: Andrew Morton
Date: Fri May 08 2009 - 19:25:57 EST


On Fri, 8 May 2009 23:47:18 +0200
Robert Schwebel <r.schwebel@xxxxxxxxxxxxxx> wrote:

> Hi,
>
> While testing 2.6.30-rc4 on i.MX35 (with mxc-master on top of the vanilla
> -rc4) I have seen the following oops. As it went away by booting the
> board again and didn't show up again even after several boots, I assume
> it could be a race coming from the recent fast boot activities? Does
> anyone have an idea?
>
> After the oops, the board continues booting as usual.
>
> rsc
>
> ----------8<----------
>
> Uncompressing Linux.................................................................................................................... done, booting the kernel.
> Linux version 2.6.30-rc4-ptx-mxc1 (jbe__octopus) (gcc version 4.3.2 (OSELAS.Toolchain-1.99.3) ) #1 PREEMPT Fri May 8 22:04:53 CEST 2009
> CPU: ARMv6-compatible processor [4117b363] revision 3 (ARMv6TEJ), cr=00c5387f
> CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
> Machine: Phytec Phycore pcm043
> Memory policy: ECC disabled, Data cache writeback
> On node 0 totalpages: 32768
> free_area_init_node: node 0, pgdat c038a0f0, node_mem_map c03a4000
> Normal zone: 256 pages used for memmap
> Normal zone: 0 pages reserved
> Normal zone: 32512 pages, LIFO batch:7
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
> Kernel command line: console=ttymxc0,115200 video=mx3fb:Sharp-LQ035Q7 ip=192.168.24.47:192.168.23.2:192.168.23.1:255.255.0.0::: root=/dev/nfs nfsroot=192.168.23.2:/home/jbe/work/bsp/phytec/phyCORE/OSELAS.BSP-phyCORE-trunk/platform-phyCORE-i.MX35/root,v3,tcp mtdparts="physmap-flash.0:256k(uboot)ro,128k(ubootenv),2M(kernel),-(root)"
> NR_IRQS:180
> MXC GPIO hardware
> MXC IRQ initialized
> PID hash table entries: 512 (order: 9, 2048 bytes)
> Console: colour dummy device 80x30
> Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> Memory: 128MB = 128MB total
> Memory: 126064KB available (3224K code, 258K data, 108K init, 0K highmem)
> Calibrating delay loop... 398.13 BogoMIPS (lpj=1990656)
> Mount-cache hash table entries: 512
> CPU: Testing write buffer coherency: ok
> net_namespace: 296 bytes
> regulator: core version 0.5
> NET: Registered protocol family 16
> Unable to handle kernel NULL pointer dereference at virtual address 000000e4
> pgd = c0004000
> [000000e4] *pgd=00000000
> Internal error: Oops: 805 [#1] PREEMPT
> Modules linked in:
> CPU: 0 Not tainted (2.6.30-rc4-ptx-mxc1 #1)
> PC is at call_usermodehelper_setup+0x44/0x78
> LR is at exit_notify+0x168/0x184
> pc : [<c004aa00>] lr : [<c003d620>] psr: 00000013
> sp : c786dff8 ip : 00000000 fp : 00000000
> r10: 00000000 r9 : 00000000 r8 : 00000000
> r7 : 00000000 r6 : 00000000 r5 : 0000003c r4 : 000000cc
> r3 : c003d620 r2 : c004aa00 r1 : c781ca00 r0 : c781ca00
> Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
> Control: 00c5387f Table: 80004008 DAC: 00000017
> Process khelper (pid: 27, stack limit = 0xc786c260)
> Stack: (0xc786dff8 to 0xc786e000)
> dfe0: 00000000 00000000
> [<c004aa00>] (call_usermodehelper_setup+0x44/0x78) from [<c78c5c40>] (0xc78c5c40)
> Code: e4823004 e59f3034 e5842008 e584300c (e5846018)
> ---[ end trace 1b75b31a2719ed1c ]---
>

Hard.

At a guess I'd say it died somewhere down inside INIT_WORK(), perhaps
doing lockdep stuff. Do you have CONFIG_LOCKDEP=n?

It would help if you could work out which field of struct
subprocess_info is at offset 0x000000e4 in your build.
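
As a cross-check on that offset: the word in parentheses on the "Code:"
line (e5846018) is the instruction that faulted. Below is a minimal
sketch of decoding it, assuming it is a classic ARM single data transfer
with an immediate offset. The names ldr_str_imm, decode_ldr_str_imm()
and effective_address() are made up for illustration; they are not
kernel helpers.

```c
#include <stdint.h>

/* Illustrative only: decoded fields of an ARM LDR/STR with immediate offset. */
struct ldr_str_imm {
	int is_load;      /* L bit: 1 = LDR, 0 = STR */
	int pre_indexed;  /* P bit: 1 = offset applied before the access */
	int rn;           /* base register number */
	int rd;           /* source/destination register number */
	int32_t offset;   /* signed immediate offset in bytes */
};

/*
 * Decode a 32-bit ARM instruction word. Returns 1 and fills *out if it is
 * a single data transfer with an immediate offset, otherwise returns 0.
 */
int decode_ldr_str_imm(uint32_t insn, struct ldr_str_imm *out)
{
	uint32_t imm = insn & 0xfff;

	if (((insn >> 26) & 0x3) != 0x1)   /* bits 27:26 must be 01 */
		return 0;
	if ((insn >> 25) & 0x1)            /* I bit set: register offset */
		return 0;

	out->pre_indexed = (insn >> 24) & 1;
	out->is_load = (insn >> 20) & 1;
	out->rn = (insn >> 16) & 0xf;
	out->rd = (insn >> 12) & 0xf;
	/* U bit (23) selects add or subtract. */
	out->offset = ((insn >> 23) & 1) ? (int32_t)imm : -(int32_t)imm;
	return 1;
}

/* Address the instruction touches, given the value of the base register. */
uint32_t effective_address(uint32_t base, const struct ldr_str_imm *d)
{
	return d->pre_indexed ? base + (uint32_t)d->offset : base;
}
```

Fed 0xe5846018, this gives a store of r6 through r4 with offset 24
(0x18). With r4 = 000000cc from the register dump, the address comes
out as 0xe4, which matches the faulting address. So if r4 was the
struct pointer at that point (an assumption), the field at 0x18 is
worth looking at as well as the one at 0xe4.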

One way of working out the field offsets is

- put this into ~/.gdbinit

    define offsetof
        set $off = &(((struct $arg0 *)0)->$arg1)
        printf "%d 0x%x\n", $off, $off
    end

- set CONFIG_DEBUG_INFO=y

- make kernel/kmod.o

- gdb kernel/kmod.o

(gdb) offsetof subprocess_info cred
80 0x50
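
The gdb macro above uses the same null-pointer cast that C's offsetof()
is built on. Going the other way (from a byte offset back to a field
name) can be sketched in plain C as below. struct demo_info is a made-up
stand-in, NOT the real layout of struct subprocess_info, and field_at()
is a hypothetical helper; the real offsets have to come from your own
build.

```c
#include <stddef.h>

/* Made-up stand-in struct; the real subprocess_info layout will differ. */
struct demo_info {
	void *work[4];
	char *path;
	char **argv;
	int wait;
	void *cred;
};

struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
};

/* Record a member's name, offset and size; sizeof does not evaluate the cast. */
#define FIELD(type, member) \
	{ #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const struct field_desc demo_fields[] = {
	FIELD(struct demo_info, work),
	FIELD(struct demo_info, path),
	FIELD(struct demo_info, argv),
	FIELD(struct demo_info, wait),
	FIELD(struct demo_info, cred),
};

/*
 * Return the name of the member that covers byte offset 'off', or NULL
 * if the offset falls in padding or beyond the end of the struct.
 */
const char *field_at(size_t off)
{
	size_t i;

	for (i = 0; i < sizeof(demo_fields) / sizeof(demo_fields[0]); i++) {
		const struct field_desc *f = &demo_fields[i];

		if (off >= f->offset && off < f->offset + f->size)
			return f->name;
	}
	return NULL;
}
```

Calling field_at() with the offset from the oops then names the member
covering it, for whatever struct the table describes.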
