Re: Oops while running fs_racer test on a POWER6 box against latest git
From: Michael Neuling
Date: Thu Jul 01 2010 - 21:36:46 EST
In message <20100701105907.GK22976@laptop> you wrote:
> On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > > While running fs_racer test from LTP on a POWER6 box against latest git
> > > (2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > > came across the following warning followed by multiple oops.
> > >
> > > ------------[ cut here ]------------
> > >
> > > Badness at kernel/mutex-debug.c:64
> > > NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> > > REGS: c00000010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest)
> > > MSR: 8000000000029032<EE,ME,CE,IR,DR> CR: 24224422 XER: 00000012
> > > TASK = c00000010727cf00[8211] 'fs_racer_file_c' THREAD: c00000010be8bb50 CPU: 2
> > > GPR00: 0000000000000000 c00000010be8f970 c000000000d3d798 0000000000000001
> > > GPR04: c00000010be8fa70 c00000010be8c000 c00000010727d9f8 0000000000000000
> > > GPR08: c0000000043042f0 c0000000016534e8 000000000000017a c000000000c29a1c
> > > GPR12: 0000000028228424 c00000000f600500 c00000010be8fc40 0000000020000000
> > > GPR16: fffffffffffff000 c000000109c73000 c00000010be8fc30 0000000000010442
> > > GPR20: 0000000000000000 0000000000000000 00000000000001b6 c00000010dd12250
> > > GPR24: c00000000017c08c c00000010727cf00 c00000010dd12278 c00000010dd12210
> > > GPR28: 0000000000000001 c00000010be8c000 c000000000ca2008 c00000010be8fa70
> > > NIP [c0000000000be9e8] .mutex_remove_waiter+0xa4/0x130
> > > LR [c0000000000be9cc] .mutex_remove_waiter+0x88/0x130
> > > Call Trace:
> > > [c00000010be8f970] [c00000010be8fa00] 0xc00000010be8fa00 (unreliable)
> > > [c00000010be8fa00] [c00000000064a9f0] .mutex_lock_nested+0x384/0x430
> > > Instruction dump:
> > > e81f0010 e93d0000 7fa04800 41fe0028 482e96e5 60000000 2fa30000 419e0018
> > > e93e8008 80090000 2f800000 409e0008<0fe00000> e93e8000 80090000 2f800000
> > > Unable to handle kernel paging request for unknown fault
> > > Faulting instruction address: 0xc00000000008d0f4
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > SMP NR_CPUS=1024 NUMA
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > pSeries
> > > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > > NIP: c00000000008d0f4 LR: c00000000008d0d0 CTR: 0000000000000000
> > > REGS: c00000010978f900 TRAP: 0600 Tainted: G W (2.6.35-rc3-git4-autotest)
> > > MSR: 8000000000009032
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > EE,ME,IR,DR> CR: 24022442 XER: 00000012
> > > DAR: c000000000648f54, DSISR: 0000000040010000
> > > TASK = c0000001096e4900[7353] 'fs_racer_file_s' THREAD: c00000010978c000 CPU: 10
> > > GPR00: 0000000000004000 c00000010978fb80 c000000000d3d798 0000000000000001
> > > GPR04: c00000000083539e c000000001610228 0000000000000000 c0000000054c6880
> > > GPR08: 00000000000006a5 c000000000648f54 0000000000000007 00000000049b0000
> > > GPR12: 0000000000000000 c00000000f601900 00000000ffffffff ffffffffffffffff
> > > GPR16: 000000004b7dc520 0000000000000000 0000000000000000 c00000010978fea0
> > > GPR20: 00000fffcca7e7a0 00000fffcca7e7a0 00000fffabf7dfd0 00000fffabf7dfd0
> > > GPR24: 0000000000000000 0000000001200011 c000000000e1c0a8 c000000000648ed4
> > > GPR28: 0000000000000000 c0000001096e4900 c000000000ca0458 c00000010725d400
> > > NIP [c00000000008d0f4] .copy_process+0x310/0xf40
> > > LR [c00000000008d0d0] .copy_process+0x2ec/0xf40
> > > Call Trace:
> > > [c00000010978fb80] [c00000000008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > > [c00000010978fc80] [c00000000008deb4] .do_fork+0x190/0x3cc
> > > [c00000010978fdc0] [c000000000011ef4] .sys_clone+0x58/0x70
> > > [c00000010978fe30] [c0000000000087f0] .ppc_clone+0x8/0xc
> > > Instruction dump:
> > > 419e0010 7fe3fb78 480774cd 60000000 801f0014 e93f0008 7800b842 39290080
> > > 78004800 60000042 901f0014 38004000<7d6048a8> 7d6b0078 7d6049ad 40c2fff4
> > >
> > > Kernel version 2.6.34-rc3-git3 works fine.
> >
> > Should this read 2.6.35-rc3-git3?
> >
> > If so, there's only about 20 commits in:
> > 5904b3b81d2516..984bc9601f64fd
> >
> > The likely fs related candidates are from Christoph and Nick Piggin
> > (added to CC)
> >
> > No commits relating to POWER6 or PPC.
>
> Not sure what's happening here. The first warning looks like some mutex
> corruption, but it doesn't have a stack trace (these are 2 separate
> dumps, right? ie. the copy_process stack doesn't relate to the mutex
> warning?) So I don't have much idea.
>
> If it is reproducible, can you try getting a better stack trace, or
> better yet, even bisecting if there is just a small window?
I can't reproduce the bug here on POWER6 or POWER7.
Divya, can you bisect this?
Mikey