help debugging lock up on 2.[34].x?

From: Pete Toscano (sigsegv@psinet.com)
Date: Mon Jun 19 2000 - 09:25:37 EST


hello,

i've been experiencing lockups of my system since around 2.3.99pre7
and i'm having trouble debugging them. i'm hoping that some of you
might be able to give me tips on how to go about finding out what's
wrong.

first, the problem. from around 2.3.99pre7 through today's
2.4.0test1ac20 (i can't get ac21 to compile), i've been seeing what
seem to be random lockups of my system. complete, solid locks. i can't
even ping the machine and get a response. the keyboard is dead and none
of the alt-sysrq stuff works (including the one that's supposed to
sometimes help with x lockups). i see this problem in x (3.3.6, both
with and without glx). sometimes it locks just a few minutes after
booting, sometimes it takes hours. it seems to be related to load, but
not always. it almost always happens after (during?) a disk write, and
the cpu fan seems to go up in pitch slightly after the lock (maybe this
is just all in my head though =).

when it locks, no oops or other dump info is shown on screen or written
to the logs. i'm hoping that something is printed to the console and i
just have to get it to crash when i'm not running x. of course, this
could very well be an x problem, though i haven't changed anything
x-related for a while (since well before 2.3.99pre7).

what would make my life easier is a way to trigger the crash. i tried
"make -j bzImage" in /usr/src/linux from the console and that sort of
locked up my system, but not the way that x was locked up. i could
switch between virtual consoles, press keys and see them echoed on the
screen; it's just that system responsiveness was near zero. if i hit
ctrl-c in the bzImage vc, i'd eventually (after minutes of waiting) get
some messages listing files and saying something about (lost?)
interrupts. the command prompt never returned (well, i gave up waiting
after about half an hour). maybe this is the same thing that's
happening with x, though i kind of doubt it. i've been spending a lot
of time at the console lately and only see this problem when i do the
"make -j".
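one way to produce that kind of write load on purpose, without x or a
kernel build, is a small write-and-fsync loop. this is only a sketch,
assuming the lockups really are tied to sustained disk writes; the
function name, scratch file name and sizes are arbitrary, not values
known to trigger anything.

```python
import os
import tempfile


def write_fsync_loop(directory, block_size=1 << 20, blocks=256, rounds=1):
    """Write a scratch file and fsync it, `rounds` times, to force real disk I/O.

    Sketch only: sizes are arbitrary guesses. Returns total bytes written.
    """
    block = os.urandom(block_size)  # random (incompressible) data
    path = os.path.join(directory, "write-stress.tmp")  # hypothetical scratch file
    total = 0
    try:
        for _ in range(rounds):
            with open(path, "wb") as f:
                for _ in range(blocks):
                    f.write(block)
                    total += block_size
                f.flush()              # empty python's own buffer
                os.fsync(f.fileno())   # ask the kernel to push the data to the drive
    finally:
        # clean up the scratch file even if a write fails
        if os.path.exists(path):
            os.remove(path)
    return total
```

calling it repeatedly from a text console (not under x), pointed at a
directory on the promise-attached drives, leaves the screen free for
any oops text if the machine does go down.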

this is getting quite frustrating. it's gotten to the point where i
can't get much of anything done on my system, because every time i try
to work on something, it soon locks. it runs fine with freebsd
4.0-stable, so i don't think it's the hardware. the only problem with
freebsd is that it doesn't support all my hardware like linux does, and
i'm much more familiar with linux (i only even tried freebsd because of
all these locks). i've been trying 2.2.x, but it needs so many patches
to get even some of the functionality (usb, ata66, agpgart) that i'm
leery of trusting that too.

is there any way to debug the machine when x locks up and the kbd is
dead? maybe via the serial console?

i'm stuck here, but i'm more than willing to try things if anyone can
suggest something. i really don't want others to see this problem when
2.4 is released.

fwiw, i've got a dual p3-600 system: asus p2b-d mobo, 512m ram,
promise ultra66 board with two ibm 7200rpm ide drives attached to it
(booted with ide=reverse), matrox g400 max video card, various usb
devices, cdrom, cd-rw, and some other stuff too. let me know if any
more details are needed.

thanks,
pete

-- 
Pete Toscano      p:sigsegv@psinet.com        w:ptoscano@netsol.com
GPG fingerprint: D8F5 A087 9A4C 56BB 8F78  B29C 1FF0 1BA7 9008 2736




This archive was generated by hypermail 2b29 : Fri Jun 23 2000 - 21:00:17 EST