Re: memory & filesystem corruption under heavy load?

Markus Kossmann (mk@emil.inka.de)
Sat, 06 Apr 1996 17:32:04 +0200


I have problems with system crashes under heavy load, too.
My System :
P100
Asus TP4 Board 16 MB FPM-DRAM 512k 15ns Cache (Triton chipset)
NCR 53C810 SCSI with Quantum LPS 1080S (sda ) , XP32150 (sdb) and
Toshiba XM3401TA cdrom
IDE : Conner CP30254 (hda) Seagate ST3660A (hdb)
VGA : ATI Mach 64 4MB VRAM
Linux / = sdb6 /usr = sdb7 /usr/src = sdb8

I first got these problems with 1.3.3? (I don't remember exactly; at that
time I didn't have internet access). I tracked the problems back to 1.3.18,
when 4MB page tables were introduced. I disabled them with the following
patch to include/asm-i386/pgtable.h:
*** pgtable.h.old Thu Apr 4 20:09:56 1996
--- pgtable.h Fri Apr 5 13:35:50 1996
***************
*** 7,13 ****
* Define USE_PENTIUM_MM if you want the 4MB page table optimizations.
* This works only on a intel Pentium.
*/
! #define USE_PENTIUM_MM 1

/*
* The Linux memory management assumes a three-level page table setup. On
--- 7,13 ----
* Define USE_PENTIUM_MM if you want the 4MB page table optimizations.
* This works only on a intel Pentium.
*/
! /* #define USE_PENTIUM_MM 1 */

/*
* The Linux memory management assumes a three-level page table setup. On
This patch works well for me with all recent kernels, including 1.3.84.

During the last two days I tested an unpatched kernel 1.3.84 and got
these crashes:
compiling dosemu:
Kernel Panic : VFS : iget with sb = NULL
...
ll_rw_block : Device 08:18 : only 1024-char blocks implemented (1024)

System was totally locked

compiling kernel with make -j:
Aiee : Scheduling in Interrupt

System was totally locked
Another kernel compile with make -j
first the following kernel-oops :

CPU: 0
EIP: 0010:[<00193984>]
EFLAGS: 00010056
eax: 00000002 ebx: 00000005 ecx: 00238300 edx: 402ba298
esi: 00a4b7ff edi: 00000014 ebp: 00000400 esp: 00ddfd80
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process crond (pid: 36, process nr: 7, stackpage=00ddf000)
Stack: 0023a118 001e6
0023a118
0023a118 00000001 00000046 00000001 00000046 001e5b00 00825640 0023a118
001f3ca0 0023a298 00000000 00000000 00000002 00ddfdfc 0018eaf2 00000002
Call Trace: [<00192994>] [<0018eaf2>] [<0018eaf2>] [<00192d8a>]
[<00192e12>]
[<00165906>] [<001662b9>]
[<00150817>] [<0011e564>] [<0011eba7>] [<00129e3d>] [<00118fb8>]
[<0010fb7f>] [<0010fa30>] [<0010a4eb>]
Code: 81 7c 9a 04 00 04 00 00 75 3e 8b 84 24 a8 00 00 00 a8 01 74

Using `/System.map' to map addresses to symbols.

>>EIP: 193984 <requeue_sd_request+a50/e08>
Trace: 192994 <sd_open+c0/110>
Trace: 18eaf2 <allocate_device+19a/304>
Trace: 18eaf2 <allocate_device+19a/304>
Trace: 192d8a <rw_intr+2e6/324>
Trace: 192e12 <do_sd_request+4a/16c>
Trace: 165906 <add_request+ea/25c>
Trace: 1662b9 <ll_rw_swap_file+191/29c>
Trace: 150817 <get_group_desc+67/74>
Trace: 11e564 <rw_swap_page+268/288>
Trace: 11eba7 <swap_in+b3/ec>
Trace: 129e3d <open_namei+23d/3cc>
Trace: 118fb8 <do_no_page+18c/3c8>
Trace: 10fb7f <do_page_fault+19f/2ac>
Trace: 10fa30 <do_page_fault+50/2ac>
Trace: 10a4eb <error_code+4b/60>

Code: 193984 <requeue_sd_request+a50/e08> cmpl $0x400,0x4(%edx,%ebx,4)
Code: 19398c <requeue_sd_request+a58/e08> jne 1939cc <requeue_sd_request+a98/e08>
Code: 19398e <requeue_sd_request+a5a/e08> movl 0xa8(%esp,1),%eax
Code: 193995 <requeue_sd_request+a61/e08> testb $0x1,%al
Code: 193997 <requeue_sd_request+a63/e08> je 193999 <requeue_sd_request+a65/e08>
Code: 193999 <requeue_sd_request+a65/e08> nop
Code: 19399a <requeue_sd_request+a66/e08> nop
Code: 19399b <requeue_sd_request+a67/e08> nop
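
For anyone reading the oops: the instruction at EIP, cmpl $0x400,0x4(%edx,%ebx,4), addresses its memory operand as base + index*scale + displacement. The small helper below is only a sketch of that arithmetic (the function name is made up for illustration, and the segment base is ignored, so the result is an offset within the data segment). Fed with edx and ebx from the register dump above, it gives the offset the instruction was trying to read.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of an i386 memory operand of the form disp(base,index,scale).
 * Illustrative helper, not kernel code. The sum wraps at 32 bits,
 * as it does on the CPU. The segment base is not taken into account.
 */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    /* The SIB byte can only encode these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

With the values from the dump, effective_address(0x402ba298, 5, 4, 4) is 0x402ba2b0. Whether that offset is a sane one for this code path is not something the trace alone settles.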

then :

ll_rw_block : Device 08:15 : only 4096-char blocks implemented (4096)
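
The device numbers in the two ll_rw_block messages (08:18 and 08:15) are major:minor pairs printed in hex. A minimal sketch of how to read them, assuming the usual SCSI disk numbering (major 8, 16 minors per disk, partition number in the low four bits); the struct and function names are invented for this example:

```c
#include <assert.h>
#include <stdio.h>

struct sd_dev {
    char disk;          /* 'a' for sda, 'b' for sdb, ... */
    unsigned int part;  /* 0 = whole disk, 1..15 = partition */
};

/*
 * Decode a "major:minor" string (two hex fields) into a SCSI disk
 * letter and partition number. Returns 0 on success, -1 if the string
 * does not parse or the major number is not 8 (SCSI disk).
 */
int decode_sd_devno(const char *devno, struct sd_dev *out)
{
    unsigned int major, minor;

    assert(devno != NULL && out != NULL);
    if (sscanf(devno, "%x:%x", &major, &minor) != 2)
        return -1;
    if (major != 8 || minor > 0xff)
        return -1;

    out->disk = (char)('a' + minor / 16);  /* 16 minors per disk */
    out->part = minor % 16;                /* low 4 bits = partition */
    return 0;
}
```

Read this way, 08:18 is sdb8 (/usr/src in the layout above) and 08:15 is sdb5, which is not among the filesystems listed there.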

and at last (not complete):
...
scsi0: target 1 : sxfer_sanity=0x8, scntl3_sanity
script : 0x78031300 0x0 0x38050800 0x0 0x90080000 0x0 0x0
scsi0: saved data pointer at offset 0
scsi0: can't determine active data pointer offset
0x00234850 (virt 0x00234850) : 0x820b0000 0x00234848 (virt 0x00234848)
0x00234858 (virt 0x00234858) : 0x8f0b0000 0x00213400 (virt 0x00213400)
scsi0: issue queue
scsi0: dsa at phys:0xe2e080 (virt 0xe2e080)
+64 dsa_msgout length : 2311648, data = 0x0(virt 0x00000000)
+60 select_indirect : 0xc0000004
+56 dsacmnd : 0x0
+48 dsa_next: 0x0
scsi0: schedule dsa array
scsi0: end schedule dsa array
scsi0: reconnect_dsa head
scsi0: end reconnect_dsa head
mail drew@PoohSticks.ORG
scsi0: nuking commands

system was totally locked again

With a patched 1.3.84 I ran a make -j of 1.3.84 without any crash
(using 70 MB of virtual memory (16 phys + 54 swap) and a running time > 3h
;-))

-- 
------------------------------------------------------------------------------
Markus			           <mk@emil.inka.de> (Markus Kossmann)