LDS: catch it ...

Cristian Gafton (gafton@cccis.sfos.ro)
Fri, 20 Oct 1995 06:12:49 +0200 (EET)


hi there,

This morning (5am here) I had an LDS on one of my servers, the first I've
ever had, and that machine had 36 days of uptime. I managed to get some info
from it.

So, preliminaries. Machine details: Intel DX2/66, ASUS SP3G motherboard,
NCR53c810, 16 MB RAM, two Cyclades boards. Stock Linux kernel
1.2.13 (no patches, no nothing). At the time the LDS occurred, the machine
was absolutely idle: it acts as a terminal server and nobody was
dialed in. I'm also fairly sure about when the LDS occurred, to within
one minute, because I have a small program that prints the uptime info on
a console at one-minute intervals. I was watching that
console while having my coffee (eh, after a looong working night) and
something seemed strange to me when the next line didn't show up at the
expected moment. I tried to switch consoles; no luck. The system wasn't
responding to me at all. I waited five more minutes. Nothing.
Ctrl-Alt-Del. Nothing. So I think I had an LDS.

Using Alt+ScrollLock I managed to get this data:

EIP values, in the order I got them:

EIP: 0010:001214EA EFLAGS: 00010006
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:0012FEE4 EFLAGS: 00010206
EIP: 0010:0012FEE4 EFLAGS: 00010206
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
EIP: 0010:001214E6 EFLAGS: 00010017
... and constant from then on ...
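
To see at a glance where the CPU was spending its time, the samples above can be tallied. This is only a small sketch: the list is transcribed by hand from the dump, and the helper name `tally` is made up for illustration.

```python
from collections import Counter

# EIP samples transcribed from the dump above, in the order they appeared.
samples = (
    ["001214EA"]
    + ["001214E6"] * 7
    + ["0012FEE4"] * 2
    + ["001214E6"] * 8
)


def tally(eips):
    """Return (address, count) pairs, most frequent address first."""
    return Counter(eips).most_common()


for addr, count in tally(samples):
    print(f"{addr}  {count:2d} of {len(samples)} samples")
```

Of the 18 recorded samples, 15 are at 001214E6, which fits the note that the value stayed constant afterwards.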

System.map lookup:

EIP: 001214EA
EIP: 001214E6
---------------------------------------
0012129c T _free_pages
>>>> 0012145c T ___get_free_pages
001215ec T ___get_dma_pages

EIP: 0012FEE4
---------------------------------------
0012fde8 T _permission
>>>> 0012fea8 T _get_write_access
0012ff18 T _put_write_access
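
The lookup above was done by hand; the same thing can be sketched in a few lines. The idea is to find the last symbol whose address is not greater than the EIP and report the offset into it. The names `parse_map` and `resolve` are invented for this example, and the map text is just the excerpt quoted above, not the full System.map.

```python
import bisect

# Only the System.map lines quoted above ("address type name").
SYSTEM_MAP_EXCERPT = """\
0012129c T _free_pages
0012145c T ___get_free_pages
001215ec T ___get_dma_pages
0012fde8 T _permission
0012fea8 T _get_write_access
0012ff18 T _put_write_access
"""


def parse_map(text):
    """Parse System.map-style lines into a sorted list of (address, name)."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return entries


def resolve(entries, eip):
    """Return (symbol, offset) for the nearest symbol at or below eip."""
    addresses = [addr for addr, _ in entries]
    index = bisect.bisect_right(addresses, eip) - 1
    if index < 0:
        return None  # eip lies below the first known symbol
    addr, name = entries[index]
    return name, eip - addr


entries = parse_map(SYSTEM_MAP_EXCERPT)
for eip in (0x001214EA, 0x001214E6, 0x0012FEE4):
    name, offset = resolve(entries, eip)
    print(f"{eip:08X} -> {name}+0x{offset:x}")
```

With only an excerpt loaded, an address that falls in the gap between the two groups would be attributed to the wrong symbol, so a real lookup should read the complete System.map of the running kernel.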

If more details are needed and somebody feels like digging into
this, please ask me. It looks to me like a memory-management problem; maybe
the experts here would like to have a word to say ...

I've applied a sane reboot to this machine and now I'm going home. Home...

Cristian Gafton

| Cristian Gafton, SysAdm gafton@cccis.sfos.ro
| -------------------------------------------------------------------
| Computers & Communications Center str. Moara de Foc nr. 35
| Phone: 40-32-252936, 252938 PO-BOX 2-549
| Fax: 40-32-252933 IASI 6600, ROMANIA
| ===================================================================
| Good code is hard to write, so it must be hard to understand.