Re: Oops while running fs_racer test on a POWER6 box against latest git
From: Nick Piggin
Date: Thu Jul 01 2010 - 06:59:23 EST
On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > While running fs_racer test from LTP on a POWER6 box against latest git (2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > came across the following warning followed by multiple oops.
> >
> > ------------[ cut here ]------------
> >
> > Badness at kernel/mutex-debug.c:64
> > NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> > REGS: c00000010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest)
> > MSR: 8000000000029032<EE,ME,CE,IR,DR> CR: 24224422 XER: 00000012
> > TASK = c00000010727cf00[8211] 'fs_racer_file_c' THREAD: c00000010be8bb50 CPU: 2
> > GPR00: 0000000000000000 c00000010be8f970 c000000000d3d798 0000000000000001
> > GPR04: c00000010be8fa70 c00000010be8c000 c00000010727d9f8 0000000000000000
> > GPR08: c0000000043042f0 c0000000016534e8 000000000000017a c000000000c29a1c
> > GPR12: 0000000028228424 c00000000f600500 c00000010be8fc40 0000000020000000
> > GPR16: fffffffffffff000 c000000109c73000 c00000010be8fc30 0000000000010442
> > GPR20: 0000000000000000 0000000000000000 00000000000001b6 c00000010dd12250
> > GPR24: c00000000017c08c c00000010727cf00 c00000010dd12278 c00000010dd12210
> > GPR28: 0000000000000001 c00000010be8c000 c000000000ca2008 c00000010be8fa70
> > NIP [c0000000000be9e8] .mutex_remove_waiter+0xa4/0x130
> > LR [c0000000000be9cc] .mutex_remove_waiter+0x88/0x130
> > Call Trace:
> > [c00000010be8f970] [c00000010be8fa00] 0xc00000010be8fa00 (unreliable)
> > [c00000010be8fa00] [c00000000064a9f0] .mutex_lock_nested+0x384/0x430
> > Instruction dump:
> > e81f0010 e93d0000 7fa04800 41fe0028 482e96e5 60000000 2fa30000 419e0018
> > e93e8008 80090000 2f800000 409e0008<0fe00000> e93e8000 80090000 2f800000
> > Unable to handle kernel paging request for unknown fault
> > Faulting instruction address: 0xc00000000008d0f4
> > Oops: Kernel access of bad area, sig: 7 [#1]
> > SMP NR_CPUS=1024 NUMA
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > pSeries
> > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > NIP: c00000000008d0f4 LR: c00000000008d0d0 CTR: 0000000000000000
> > REGS: c00000010978f900 TRAP: 0600 Tainted: G W (2.6.35-rc3-git4-autotest)
> > MSR: 8000000000009032
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > EE,ME,IR,DR> CR: 24022442 XER: 00000012
> > DAR: c000000000648f54, DSISR: 0000000040010000
> > TASK = c0000001096e4900[7353] 'fs_racer_file_s' THREAD: c00000010978c000 CPU: 10
> > GPR00: 0000000000004000 c00000010978fb80 c000000000d3d798 0000000000000001
> > GPR04: c00000000083539e c000000001610228 0000000000000000 c0000000054c6880
> > GPR08: 00000000000006a5 c000000000648f54 0000000000000007 00000000049b0000
> > GPR12: 0000000000000000 c00000000f601900 00000000ffffffff ffffffffffffffff
> > GPR16: 000000004b7dc520 0000000000000000 0000000000000000 c00000010978fea0
> > GPR20: 00000fffcca7e7a0 00000fffcca7e7a0 00000fffabf7dfd0 00000fffabf7dfd0
> > GPR24: 0000000000000000 0000000001200011 c000000000e1c0a8 c000000000648ed4
> > GPR28: 0000000000000000 c0000001096e4900 c000000000ca0458 c00000010725d400
> > NIP [c00000000008d0f4] .copy_process+0x310/0xf40
> > LR [c00000000008d0d0] .copy_process+0x2ec/0xf40
> > Call Trace:
> > [c00000010978fb80] [c00000000008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > [c00000010978fc80] [c00000000008deb4] .do_fork+0x190/0x3cc
> > [c00000010978fdc0] [c000000000011ef4] .sys_clone+0x58/0x70
> > [c00000010978fe30] [c0000000000087f0] .ppc_clone+0x8/0xc
> > Instruction dump:
> > 419e0010 7fe3fb78 480774cd 60000000 801f0014 e93f0008 7800b842 39290080
> > 78004800 60000042 901f0014 38004000<7d6048a8> 7d6b0078 7d6049ad 40c2fff4
> >
> > Kernel version 2.6.34-rc3-git3 works fine.
>
> Should this read 2.6.35-rc3-git3?
>
> If so, there's only about 20 commits in:
> 5904b3b81d2516..984bc9601f64fd
>
> The likely fs related candidates are from Christoph and Nick Piggin
> (added to CC)
>
> No commits relating to POWER6 or PPC.

Not sure what's happening here. The first warning looks like some mutex
corruption, but it doesn't have a usable stack trace (these are 2 separate
dumps, right? i.e. the copy_process stack doesn't relate to the mutex
warning?) So I don't have much idea.

If it is reproducible, can you try getting a better stack trace, or
better yet, even bisecting if there is just a small window?
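
For the bisect, a rough sketch of the steps, not a tested recipe. It assumes
you are at the top of a kernel git tree and that 5904b3b81d2516 (the start of
the range quoted above) really corresponds to the working kernel, which has
not been confirmed in this thread. start_bisect is just a made-up helper name.

```shell
#!/bin/sh
# Sketch only. Assumes it is run from the top of a Linux kernel git tree.
BAD=984bc9601f64fd     # 2.6.35-rc3-git4, shows the warning and the oops
GOOD=5904b3b81d2516    # assumed good; not confirmed in this thread

# start_bisect is a hypothetical helper name, not an existing tool.
start_bisect() {
    git bisect start "$BAD" "$GOOD"
}

# Manual loop after calling start_bisect:
#   1. build and boot the commit that git checked out
#   2. run fs_racer from LTP
#   3. git bisect bad     (if the warning/oops shows up)
#      git bisect good    (if the run is clean)
# Repeat until git names the first bad commit, then:
#   git bisect log        # worth posting to the list
#   git bisect reset      # return to the original checkout
```

With only about 20 commits in the window, this should take roughly four or
five build-and-test rounds.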
Thanks,
Nick