Re: kernel ooops... no timeout on find

Jorge Nerin (jnerin@svalero.es)
Mon, 17 May 1999 23:22:03 +0200


>
>
> On Tue, 11 May 1999, Craig Armour wrote:
>
> > Hi,
> >
> > here is a kernel oops I seem to get very regularly. the system is an
> > Asus P2bDS? dual PII with 512meg ram kernel is 2.2.7 . There are no NFS
> > file systems on this box.
> >
> > I think this could be linked to a problem I'm having with phantom loads
> > and sleeping find process running in state D
> >
> >
> > 2:54pm up 6 days, 1:03, 6 users, load average: 1.03, 1.01, 1.00
> > 86 processes: 85 sleeping, 1 running, 0 zombie, 0 stopped
> > CPU states: 1.1% user, 2.7% system, 0.0% nice, 96.2% idle
> >
> > 3554 ? D 0:01 find / ( -fstype nfs -o -fstype NFS -o -type d -regex
> > \(^/tm
> >
> > the problem is occuring in both smp mode and non-smp mode. I'm guessing
> > (becuase I don't know much of how kernels work...) that there is no time
> > out for NFS file systems... in the interim, I'm going to kill updatedb
> > from the cron as I believe this is the parent process. I can't kill the
> > find process.
> >
> > any patches/ideas would be very helpfull... at the moment, I'm
> > struggling to get more than two weeks of uptime out of what should be an
> > enterprise server.
>
>
> Darn... I see what happened, but how... You've got the following nice
> combination:
> a) extra pointer to dentry. OK, that would normally give you
> KERN_CRIT-level printk followed by forced oops.
> b) non-NULL d_parent not equal to dentry in question.
> c) NULL (or otherwise invalid) pointer in parent's d_name.
> That way you got an oops trying to pass arguments to printk.
>
> Sheesh... It may be a random memory corruption from whatever reason, but
> since you have it reproducably I suspect that it's a real dcache corruption.
>
> Could you try the following:
........
>
> ... and look what it will give. inode number and device may at least hint
> where the hell it happens.
> It's not the first report indicating that something is wrong with
> dcache and this one at least may narrow the things down. Please, post the
> resulting log as soon as you'll reproduce the oops with the patched
> variant in place, OK?
> Cheers,
> Al
>
> -
> Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
> To Unsubscribe: send "unsubscribe linux-smp" to majordomo@vger.rutgers.edu

Well in 2.2.5 I also got a pair of oopses and finds in D state:

Apr 10 02:26:49 quartz kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000000

Apr 10 02:26:49 quartz kernel: current->tss.cr3 = 023aa000, %cr3 =
023aa000

Apr 10 02:26:49 quartz kernel: *pde = 00000000
Apr 10 02:26:49 quartz kernel: Oops: 0002
Apr 10 02:26:49 quartz kernel: CPU: 1
Apr 10 02:26:50 quartz kernel: EIP: 0010:[8390:ei_open+-36832/96]
Apr 10 02:26:50 quartz kernel: EFLAGS: 00010212
Apr 10 02:26:50 quartz kernel: eax: 00000038 ebx: c1ac1edc ecx:
0000000e edx: 00000038

Apr 10 02:26:50 quartz kernel: esi: c0905440 edi: 00000000 ebp:
00000038 esp: c1ac1e28

Apr 10 02:26:50 quartz kernel: ds: 0018 es: 0018 ss: 0018
Apr 10 02:26:50 quartz kernel: Process find (pid: 5217, process nr:
88, stackpage=c1ac1000)

Apr 10 02:26:50 quartz kernel: Stack: c484abe5 00000000 c0905440
00000038 c1ac1edc c3a4f7fc c484b9c6 c1ac1edc

Apr 10 02:26:50 quartz kernel: c0905440 00000038 c1ac1edc
c3a4f7fc 00000000 c1ac1f1c 00005487 00000003

Apr 10 02:26:50 quartz kernel: c484abcc 000000a8 c484b005
c0905440 c4850fc8 00000004 00000200 c3a4f7fc

Apr 10 02:26:50 quartz kernel: Call Trace: [8390:ei_open+-33895/96]
[8390:ei_open+-30342/96] [8390:ei_open+-33920/96] [8390:ei_open+-
32839/96] [8390:ei_open+-8324/96] [8390:ei_open+-29675/96]
[8390:ei_open+-20961/96]

Apr 10 02:26:50 quartz kernel: [8390:ei_open+-8324/96]
[8390:ei_open+-33920/96] [8390:ei_open+-20840/96] [8390:ei_open+-
38703/96] [d_alloc+24/336] [real_lookup+72/112]
[lookup_dentry+266/440] [__namei+41/92]

Apr 10 02:26:50 quartz kernel: [sys_newlstat+46/148]
[system_call+52/56]

Apr 10 02:26:50 quartz kernel: Code: f3 a5 a8 02 74 02 66 a5 a8 01 74
01 a4 5e 5f c3 57 56 8b 7c

And....

Apr 8 19:11:07 quartz kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000000

Apr 8 19:11:07 quartz kernel: current->tss.cr3 = 010f7000, %cr3 =
010f7000

Apr 8 19:11:07 quartz kernel: *pde = 00000000
Apr 8 19:11:07 quartz kernel: Oops: 0002
Apr 8 19:11:07 quartz kernel: CPU: 0
Apr 8 19:11:07 quartz kernel: EIP: 0010:[8390:ei_open+-36832/96]
Apr 8 19:11:07 quartz kernel: EFLAGS: 00010216
Apr 8 19:11:07 quartz kernel: eax: 00000120 ebx: c3fa1edc ecx:
00000048 edx: 00000120

Apr 8 19:11:07 quartz kernel: esi: c0d95400 edi: 00000000 ebp:
00000120 esp: c3fa1e28

Apr 8 19:11:07 quartz kernel: ds: 0018 es: 0018 ss: 0018
Apr 8 19:11:07 quartz kernel: Process find (pid: 2572, process nr:
87, stackpage=c3fa1000)

Apr 8 19:11:07 quartz kernel: Stack: c484abe5 00000000 c0d95400
00000120 c3fa1edc c0b274cc c484b9c6 c3fa1edc

Apr 8 19:11:07 quartz kernel: c0d95400 00000120 c3fa1edc
c0b274cc 00000000 c3fa1f1c 00002eee 00000003

Apr 8 19:11:07 quartz kernel: c484abcc 000000a8 c484b005
c0d95400 c4850fc8 00000004 00000200 c0b274cc

Apr 8 19:11:07 quartz kernel: Call Trace: [8390:ei_open+-33895/96]
[8390:ei_open+-30342/96] [8390:ei_open+-33920/96] [8390:ei_open+-
32839/96] [8390:ei_open+-8324/96] [8390:ei_open+-29675/96]
[8390:ei_open+-20961/96]

Apr 8 19:11:07 quartz kernel: [8390:ei_open+-8324/96]
[8390:ei_open+-33920/96] [8390:ei_open+-20840/96] [8390:ei_open+-
38703/96] [d_alloc+24/336] [real_lookup+72/112]
[lookup_dentry+266/440] [__namei+41/92]

Apr 8 19:11:07 quartz kernel: [sys_newlstat+46/148]
[system_call+52/56]

Apr 8 19:11:07 quartz kernel: Code: f3 a5 a8 02 74 02 66 a5 a8 01 74
01 a4 5e 5f c3 57 56 8b 7c

And the ltrace is...

strrchr("/mnt/winnt/Archivos de programa/"..., '/') = "/GROUND"
fnmatch(0xbffffd0c, 0x080647cf, 4, 0x08064780, 0x08056938) = 1
__errno_location() = 0x400a9484
opendir("GROUND" <unfinished ...>
SYS_stat(0x080647cf, 0xbffff384, 0x400a86cc, 0xbffff3ec, 85) = 0
SYS_open("GROUND", 2048, 027777772310) = 5
SYS_fcntl(5, 2, 1, 0x080647cf, 85) = 0
<... opendir resumed> ) = 0x08056f00
malloc(4096) = 0x08064b88
readdir(0x08056f00 <unfinished ...>
SYS_lseek(5, 0, 1, 0x08064b88, 0x08056f00) = 0
SYS_getdents(5, 0xbfffb680, 15729, 0xbfffb680, 0x08068380

And it stops here....

El alumno no saca ceros; colecciona huevos de colores. (4/10)
no habla; intercambia opiniones
no enfada al profesor; estudia sus reacciones

\_/, _
| @___oo ( Jorge Nerin
/\ /\ / (___,,,}_--~
) /^\) ^\/ _) ~__ jnerin@svalero.es
) /^\/ _) (_
) _ / / _) ( jnerinl@nexo.es
/\ )/\/ || | )_)
< > |(,,) )__)
|| / \)___)\
| \____( )___) )___
\______(_______;;; __;;;

ZU en Zaragoza (Spain)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/