Re: Semi-random 2.2.x hangs (with partial oops)

Nate Riffe (inkblot@geocities.com)
Wed, 4 Aug 1999 00:48:35 -0400 (EDT)


On Tue, 3 Aug 1999, Nicholas R LeRoy wrote:

> Hello, all...
>
> I've been experiencing semi-random 2.2.x hangs for quite some time now,
> and have been unable to resolve the situation, so I thought I'd ask
> for some help..
>
> I'm running SUSE 6.1: I've tried the default 2.2.7 kernel, 2.2.10,
> 2.2.10-ac12, 2.2.11-pre2, and I'm testing 2.2.11-pre4 at home right
> now. -pre4 hasn't hung on me yet (just built it this morning before
> work), but I suspect that it will. All hangs exhibit the following
> behaviour which has me stumped. During 'normal' use, they behave
> wonderfully. When I leave X idle, and xscreensaver kicks in, however,
> is when I experience the hang. Usually within several hours of being
> idle, sometimes up to several days. Every time that I've been able to
> trap it, the screen hack running was 'halo -root'. I have reduced my hack
> set down to a small set so that I can reproduce it more readily, but
> it's still a somewhat wait-and-see process...

Me too. X is invariably running (not necessarily with a screensaver) when
it happens. There are oopsen, but they show up on the VT, which is not much
good if you can't switch out of X. I've gotten some of them by switching out
of X and waiting for them to occur. I've written some of them down and it
looks like mad recursion through these functions:

stext_lock
do_page_fault
stext_lock
error_code
del_timer
do_exit
die_if_no_fixup
do_generic_file_read
stext_lock
error_code
del_timer
die
die_if_no_fixup
do_generic_file_read
stext_lock ....(back to top at do_page_fault)
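
A quick way to see which symbols keep coming back in a hand-copied trace
like the one above is to tally them. This is only a minimal sketch: the
list is typed in from the trace as written, and the helper names `tally`
and `repeated` are made up for illustration, not part of any kernel tool.

```python
from collections import Counter

# Frames typed in by hand from the trace above, top to bottom.
TRACE = [
    "stext_lock", "do_page_fault", "stext_lock", "error_code",
    "del_timer", "do_exit", "die_if_no_fixup", "do_generic_file_read",
    "stext_lock", "error_code", "del_timer", "die",
    "die_if_no_fixup", "do_generic_file_read", "stext_lock",
]


def tally(frames):
    """Count how many times each symbol appears in the copied trace."""
    return Counter(frames)


def repeated(frames):
    """Return (symbol, count) pairs seen more than once, most frequent first."""
    return [(name, n) for name, n in tally(frames).most_common() if n > 1]


if __name__ == "__main__":
    for name, n in repeated(TRACE):
        print(f"{n:2d}  {name}")
```

Symbols that show up more than once are the ones worth looking at first
when deciding where the loop closes.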

Code: 89 02 85 c0 74 03 89 50 04 b8 01 00 00 00 eb 03 90 31 c0 c7

I've also noticed that this tends to happen when the loadavg is high
(i.e. 04:00 when the daily crontabs run). It kills the interrupt handler,
attempts to kill the idle task, and doesn't sync (according to the OOPS).
Unfortunately the OOPS scrolls up, up, and away with no way to go back and
see the registers, et al. The kernel is 2.2.7 patched with devfs. I can
reproduce the problem if anyone needs more info.

-Nate

>
> When it hangs, it hangs tight. ALT-SysRq-xxx do nothing, caps lock
> & num-lock are dead, etc. The built-in screen-blanking has been kicked
> in, and I can't un-blank, so I don't know for certain that there's
> nothing on the screen. I can't find a way to disable this behaviour
> easily... Is there a way to disable the screen blanking, so that I can
> see if there's an oops report?

Yes, there is an oops. When you leave the machine, switch it to a VT and
you'll have your oops when you come back instead of a frozen X display.
If your interrupt handler's being killed too, then the tight hang is to
be expected. Nothing is enabled when the kernel is in granite mode :)
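
On the blanking question: if the machine is left sitting on a text VT, the
console's own blank timer can still hide the oops. The Linux console
accepts the escape sequence ESC [ 9 ; n ] to set the blank timeout to n
minutes, with 0 meaning never blank (this is what `setterm -blank 0`
sends). A minimal sketch, assuming it is run on the text VT itself and
not inside X or an xterm; the function names are hypothetical:

```python
import sys


def blank_timeout_sequence(minutes):
    """Build the Linux console escape that sets the blank timeout.

    ESC [ 9 ; n ] sets the timeout to n minutes; 0 turns blanking off.
    """
    if minutes < 0:
        raise ValueError("minutes must be >= 0")
    return "\033[9;%d]" % minutes


def disable_blanking(stream=sys.stdout):
    """Write the 'never blank' sequence to a stream attached to the VT."""
    stream.write(blank_timeout_sequence(0))
    stream.flush()
```

Calling `disable_blanking()` from a shell on the VT before walking away
should keep that console lit. It does nothing about X's own screen saver.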

>
> The same machine is stable as hell running RedHat 4.2 / 2.0.36, so I
> really don't think that it's a HW problem.
>
> Yeah, I know that I could disable the problematic screen hack, but
> getting the kernel fixed is the right thing to do. How can I help in
> this endeavor? Any help at all would be greatly appreciated. Even a
> RTFM would be great if you can point me at an FM that I haven't
> already RTFM'ed.
>
> Thanks in advance
>
> - Nick
>
> --
> +-------------------------------+--------------------------------------------+
> | /`--_ Nicholas R LeRoy | In a world without fences, Who needs Gates?|
> |{ }/ Norland Corporation | ---- Experience Linux! ---- |
> | \ * / W6340 Hackbarth Rd | http://www.linux.org | http://www.ssc.com |
> | |___| Fort Atkinson, WI 53538 +--------------------------------------------+
> | nick.leroy@norland.com | #include <disclaimer.h> |
> |http://www3.norland.com/~nleroy| These are my own ideas, not my employer's. |
> +----------------------------------------------------------------------------+
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>

------------------------------------------------((\))<----------------------
Nate Riffe Duct tape by any other name is just as
inkblot@geocities.com sticky.

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS/IT/M/S/O d-@ s-:- a--->- C++ UL++++BS++>$ P+>+++ L+++>+++++$ W+ N !o
K- w(---)$>-- M-(--) V(--) PS+ PE Y+ PGP>++ t(+)@ 5 X@ R tv>! b+>+++ DI++
D e>++(+++) h r++ y?
------END GEEK CODE BLOCK------
