ncpfs lockups

rdm@tad.micro.umn.edu
10 Jun 1996 16:39:34 -0000

Next message: Bryn Paul Arnold Jones: "Re: Problem: how to identify source code points for latency profiling"
Previous message: Shinanyaku: "Re: New 2.0 web page"

I've been experiencing occasional random ncpfs lockups for a while, but
nothing substantial enough to put together a reasonable bug report.

However, I just got a lockup on 1.99.14 (leaving me no option but to bring
up my copy of 2.0) that at least had some unusual characteristics.

I was browsing some documents using netscape (netscape was running under
solaris because I only have 16 meg on my linux box). Because solaris
can't access ncpfs directly, I had apache serving up the subdirectories
of interest. Anyways, on one document retrieve netscape took a long time
(more than a fraction of a second -- many seconds more than a fraction of
a second) to bring up a document. And, sure enough, ls on the top
level ncpfs directory hung.

So, I shut down everything and get ready to reboot, but shutdown doesn't
seem to do anything. After several attempts, it still doesn't do anything.

So, I fire up top and start killing things. I kill a -bash with signal 15
and that doesn't do anything. So, I kill it -9 and top hangs. [Maybe it
was my bash? But another top shows that it's a zombie -- seems rather
strange since the -bash wouldn't have had a parent if it was my bash.]

[I'm reconstructing this from memory, and I didn't look very closely at
parent processes, sorry.]

Anyways, I finally get the idea to kill httpd. And, sure enough, the instant
I kill httpd I get a couple lines that look like they came from printk
about some bad return values (255 and -3? These lines never showed up in
any log). System reboots a few seconds later, and 2.0 comes up...

Problem: ncpfs hangs intermittently.

Problem: something trashed a queue or process table or something.

Problem: I haven't a clue what I should be doing to better diagnose this.

[This kind of error only happens after running for a while, with a lot of
activity, so enabling kernel debugging seems rather futile. Normally,
however, it doesn't fail so horribly.]

Suggestions anyone?

-- 
Raul
