PROBLEM: Full swap breaks linux kernel 2.2.1 when signals are intercepted

mkwasigr@INTERCOPE.COM
Sat, 13 Feb 1999 21:26:59 +0100


Hello there,

I think I found a problem in the 2.2.1 kernel and here's my problem
report...

[1.] One line summary of the problem:
Full swap breaks linux kernel 2.2.1 when signals are intercepted.

[2.] Full description of the problem/report:
If a user program that catches signals allocates a large amount of
memory and then accesses it (so that it gets committed) until the
swap runs out of space, it seems that "sometimes" init gets "damaged":
some processes (e.g. mingetty) are left as zombies and the system
cannot be shut down using '/sbin/shutdown' et al. or with ctrl-alt-del.
It is also no longer possible to log in. Sometimes I lose kernel
daemons (e.g. klogd).

I have included a simple C-source that shows this.
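
To illustrate what "accessing this memory to get it committed" means,
here is a minimal sketch that is separate from the test program in [6.].
It assumes a Linux /proc with /proc/self/statm; the helper names
resident_pages() and pages_gained_by_touching() are made up for this
illustration. It compares the resident set size right after malloc()
with the size after one byte per page has been written.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size of this process in pages, taken from the second
 * field of /proc/self/statm, or -1 if the file cannot be read. */
long resident_pages( void )
{
    FILE *fp = fopen( "/proc/self/statm", "r" );
    long total, resident;

    if ( ! fp )
        return -1;

    if ( fscanf( fp, "%ld %ld", &total, &resident ) != 2 )
        resident = -1;

    fclose( fp );
    return resident;
}

/* Allocate 'size' bytes, write one byte per page, and return how many
 * resident pages the writes added (or -1 on any failure). */
long pages_gained_by_touching( size_t size )
{
    long pagesz = sysconf( _SC_PAGESIZE );
    volatile char *p;          /* volatile: the writes must really happen */
    long before, after;
    size_t off;

    if ( pagesz <= 0 )
        return -1;

    p = malloc( size );
    if ( ! p )
        return -1;

    before = resident_pages();             /* allocated, not yet touched */

    for ( off = 0; off < size; off += (size_t) pagesz )
        p[off] = 'E';                      /* first write commits the page */

    after = resident_pages();
    free( (void *) p );

    if ( before < 0 || after < 0 )
        return -1;

    return after - before;
}
```

Calling pages_gained_by_touching( 16UL * 1024 * 1024 ) should report a
gain of roughly the number of pages in 16 MB. The test program relies on
the same effect when it runs memset() over each segment.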

Interestingly enough, this only happens with linux kernel 2.2.1
BUT NOT WITH 2.0.36 OR 2.1.132 !!!
(I've not checked pre-2.2.0 or 2.2.0, sorry.)

2.0.36 and 2.1.132, however, struggle for about a minute, during
which the machine is dead (although you can still switch virtual
consoles), but they recover by killing the process that allocated
the large amount of memory. Sometimes other processes (e.g. top)
are killed too (I assume due to lack of memory), even if they are
owned by other users or root. Maybe this can also be avoided/fixed...?

Another thing that's interesting is that under 2.0.36 and 2.1.132
kernels the process gets sig 7 (SIGBUS) but with kernel 2.2.1 it
is just killed (no signal).

I have tried SMP and non-SMP kernels (I have a dual-P133 machine)
but that makes no difference.

[3.] Keywords (i.e., modules, networking, kernel):
kernel, virtual memory, swap, signals

[4.] Kernel version (from /proc/version):
----------------- Cut here -----------------
Linux version 2.2.1 (root@MichaKwasi) (gcc version 2.7.2.3) #2 SMP Sat Feb 6 13:33:03 MET 1999
----------------- Cut here -----------------

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)
- No Oopses.
- Sometimes I get 'INIT: PANIC: segmentation violation! giving up...',
see below.

[6.] A small shell script or example program which triggers the
problem (if possible)
----------------- Cut here -----------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strcmp, memset, strerror */
#include <signal.h>
#include <ctype.h>
#include <errno.h>

#define MAX_NSEG 1024
#define NSEG_DEF 100
#define SIZE_DEF (100*1024*1024)

/* Handler installed for every catchable signal: report it and exit. */
void my_sig( int sig )
{
    fprintf( stderr, "\ngot sig %d, exiting...\n", sig );
    exit( sig );
}

void Usage( char **av )
{
    printf( "usage: %s [-S] [<number> [<size>]]\n", av[0] );
    printf( " -S: catch all signals\n" );
    printf( " <number>: default = %d, max = %d\n",
            NSEG_DEF, MAX_NSEG );
    printf( " <size>: default = %d (= %d K = %d M)\n",
            SIZE_DEF, SIZE_DEF / 1024, (SIZE_DEF / 1024) / 1024 );
    printf( "You may append a 'k' or an 'm' to <size> for kilos or megs.\n" );
}

int main( int ac, char **av )
{
    unsigned long nseg = NSEG_DEF;
    unsigned long size = SIZE_DEF;
    char *ptr[MAX_NSEG];
    char *cp;
    struct sigaction act;
    int rc;
    int catch = 0;
    int i = 1;

    if ( 1 < ac && ! strcmp( av[1], "-?" ) )
    {
        Usage( av );
        return -1;
    }

    if ( 1 < ac && ! strcmp( av[1], "-S" ) )
    {
        catch = 1;
        i++;
    }

    /* optional <number> of segments */
    if ( i < ac )
    {
        nseg = strtoul( av[i], & cp, 0 );

        if ( ! nseg || ( cp && *cp ) )
        {
            fprintf( stderr, "illegal value '%s' for <number>\n", av[i] );
            Usage( av );
            return -1;
        }

        i++;
    }

    if ( MAX_NSEG < nseg )
    {
        printf( "Too many segments, %d max\n", MAX_NSEG );
        fflush( stdout );
        Usage( av );
        return -1;
    }

    /* optional <size> of each segment, with 'k' or 'm' suffix */
    if ( i < ac )
    {
        size = strtoul( av[i], & cp, 0 );

        if ( ! size )
        {
            fprintf( stderr, "illegal value '%s' for <size>\n", av[i] );
            Usage( av );
            return -1;
        }

        if ( cp && *cp )
        {
            switch( toupper( (unsigned char) *cp ) ) {
            case 'K':
                size *= 1024;
                break;
            case 'M':
                size *= 1024*1024;
                break;
            default:
                fprintf( stderr, "illegal unit '%c' for <size>\n", *cp );
                Usage( av );
                return -1;
            }
        }
    }

    printf( "%scatching signals, %lu segments with %lu bytes\n",
            catch ? "" : "not ",
            nseg, size );

    if ( catch ) {
        act.sa_handler = my_sig;
        act.sa_flags = 0;   /* was left uninitialized */
        sigemptyset( & act.sa_mask );

        for ( i = 1; i < NSIG; i++ )
        {
            if ( SIGKILL != i && SIGSTOP != i ) /* these can't be caught */
            {
                rc = sigaction( i, & act, NULL );
                if ( rc ) {
                    fprintf( stderr,
                             "WARNING: can't set sighandler for sig %d, "
                             "errno=%d '%s'\n",
                             i, errno, strerror( errno ) );
                }
            }
        }
    }

    /* first allocate all segments... */
    for ( i = 0; (unsigned long) i < nseg; i++ )
    {
        ptr[i] = malloc( size );

        if ( ! ptr[i] )
        {
            printf( "ptr[%d] -- malloc (%lu) failed\n", i, size );
            fflush( stdout );
        }
    }

    /* ...then touch every byte so that the memory gets committed */
    for ( i = 0; (unsigned long) i < nseg; i++ )
    {
        printf( "Segment %d...", i );
        fflush( stdout );

        cp = ptr[i];

        if ( ! cp )
        {
            printf( "skipping\n" );
        }
        else
        {
            memset( cp, 'E', size );
            printf( "ok\n" );
        }

        fflush( stdout );
    }

    return 0;
}
----------------- Cut here -----------------
- You may want to call it swap.c
- Compile it using 'cc -o swap swap.c' (as simple as that).
- I usually run it on one virtual terminal (no X, to save mem) with
an unprivileged userid, and on another vt I have root running top
to monitor what's going on.
- Depending on your available mem/swap, you may want to play with
the options, but it should be sufficient to call the program with
'swap -S' so that it catches signals and allocates 100 memory
segments with a size of 100m each, for a total of 10000m...

BEWARE: Sync your disks before executing it and be prepared to push
the reset button on your machine...
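
For a sense of scale, the request sizes can be compared with the RAM and
swap totals shown in the sample session in [X.] (95812 kB RAM, 88352 kB
swap). The following is a small arithmetic sketch only; the function
names are invented for this illustration. It uses unsigned long long
because the product does not fit into a 32-bit unsigned long on an i586.

```c
#include <stdio.h>

/* Total bytes requested by 'nseg' segments of 'size' bytes each.
 * The multiplication is done in 64 bits to avoid 32-bit overflow. */
unsigned long long total_request_bytes( unsigned long nseg,
                                        unsigned long size )
{
    return (unsigned long long) nseg * size;
}

/* RAM plus swap in bytes, from the kB totals printed by free. */
unsigned long long available_bytes( unsigned long mem_kb,
                                    unsigned long swap_kb )
{
    return ( (unsigned long long) mem_kb + swap_kb ) * 1024ULL;
}

/* Print how many times the request exceeds RAM plus swap. */
void print_overshoot( unsigned long nseg, unsigned long size,
                      unsigned long mem_kb, unsigned long swap_kb )
{
    unsigned long long want = total_request_bytes( nseg, size );
    unsigned long long have = available_bytes( mem_kb, swap_kb );

    printf( "requested %llu bytes, RAM+swap %llu bytes, factor %.1f\n",
            want, have, (double) want / (double) have );
}
```

For example, print_overshoot( 100, 100UL*1024*1024, 95812, 88352 )
corresponds to 'swap -S', and print_overshoot( 250, 10UL*1024*1024,
95812, 88352 ) corresponds to 'swap -S 250 10m'. In both cases the
request is many times larger than RAM plus swap on this machine.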

[7.] Environment
----------------- Cut here -----------------
PWD=/mnt/lx221.bug
PAGER=less
HZ=100
HOSTNAME=MichaKwasi
LS_OPTIONS=-a -N --color=tty -T 0
ignoreeof=0
POVRAYOPT=-l/usr/lib/povray/include
QTDIR=/usr/lib/qt
OPENWINHOME=/usr/openwin
LESSKEY=/etc/lesskey.bin
LESSOPEN=|lesspipe.sh %s
MANPATH=/usr/local/man:/usr/man:/usr/X11R6/man:/usr/openwin/man
PS1=\h:\w #
PS2=>
NNTPSERVER=news
KDEDIR=/opt/kde
LESS=-M
USER=root
LS_COLORS=no=00:fi=00:di=01;34:ln=01:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:ex=01;31:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.tar=00;31:*.tgz=00;31:*.rpm=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.png=01;35:
HISTCONTROL=ignoredups
MACHTYPE=i586-pc-linux-gnu
XKEYSYMDB=/usr/X11R6/lib/X11/XKeysymDB
MAIL=/var/spool/mail/root
COLORTERM=1
INFOPATH=/usr/local/info:/usr/info
LOGNAME=root
SHLVL=1
TEXINPUTS=:~/.TeX:/usr/doc/.TeX:/usr/doc/.TeX:
LC_CTYPE=de_DE
HUSHLOGIN=FALSE
MINICOM=-c on
INFODIR=/usr/local/info:/usr/info
SHELL=/bin/bash
PRINTER=lp
HOSTTYPE=i586
OSTYPE=linux-gnu
WINDOWMANAGER=/usr/X11R6/bin/kde
TERM=linux
HOME=/root
XNLSPATH=/usr/X11R6/lib/X11/nls
no_proxy=localhost
PATH=/sbin:/usr/sbin:/root/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/openwin/bin:/usr/lib/java/bin:/var/lib/dosemu:/usr/games/bin:/usr/games:/opt/kde/bin
LESSCHARSET=latin1
FROM_HEADER=MichaKwasi.Home
_=/usr/bin/env
----------------- Cut here -----------------

[7.1.] Software (add the output of the ver_linux script here)
----------------- Cut here -----------------
-- Versions installed: (if some fields are empty or looks
-- unusual then possibly you have very old versions)
Linux MichaKwasi 2.2.1 #2 SMP Sat Feb 6 13:33:03 MET 1999 i586 unknown
Kernel modules 2.1.121
Gnu C 2.7.2.3
Binutils 2.9.1.0.15
Linux C Library x 1 root root 2478585 Dec 14 21:23 /lib/libc.so.6
Dynamic linker ldd (GNU libc) 2.0.7
Procps 1.2.9
Mount 2.9
Net-tools (1998-06-29)
Kbd 0.96
Sh-utils 1.12
----------------- Cut here -----------------

[7.2.] Processor information (from /proc/cpuinfo):
----------------- Cut here -----------------
processor : 0
vendor_id : GenuineIntel
cpu family : 5
model : 2
model name : Pentium 75 - 200
stepping : 12
cpu MHz : 132.949884
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : yes
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 apic
bogomips : 53.04

processor : 1
vendor_id : GenuineIntel
cpu family : 5
model : 2
model name : Pentium 75 - 200
stepping : 12
cpu MHz : 132.949884
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : yes
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 apic
bogomips : 53.04

----------------- Cut here -----------------

[7.3.] Module information (from /proc/modules):
----------------- Cut here -----------------
nls_cp437 3548 1 (autoclean)
dummy0 720 1 (autoclean)
----------------- Cut here -----------------

[7.4.] SCSI information (from /proc/scsi/scsi)
----------------- Cut here -----------------
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: DORS-32160 Rev: WA6A
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: IBM Model: DCAS-34330 Rev: S65A
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: PIONEER Model: CD-ROM DR-U12X Rev: 1.06
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 05 Lun: 00
Vendor: IOMEGA Model: ZIP 100 Rev: J.02
Type: Direct-Access ANSI SCSI revision: 02
----------------- Cut here -----------------

[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):
'/proc/sys/vm/overcommit_memory' was set to 0 for 2.1.132 and 2.2.1.
Setting it to 1 is not a good idea at all, I suppose...
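
The overcommit setting mentioned above can also be read from a program.
This is a minimal sketch; read_overcommit_mode() is a hypothetical helper
name, and the path is passed in so that the function can be tried against
any file.

```c
#include <stdio.h>

/* Read the integer stored in 'path', normally
 * "/proc/sys/vm/overcommit_memory". Returns the value, or -1 if the
 * file cannot be opened or does not start with a number.
 * 0 = heuristic overcommit, 1 = always allow the allocation
 * (newer kernels additionally define 2 = strict accounting). */
int read_overcommit_mode( const char *path )
{
    FILE *fp = fopen( path, "r" );
    int mode;

    if ( ! fp )
        return -1;

    if ( fscanf( fp, "%d", &mode ) != 1 )
        mode = -1;

    fclose( fp );
    return mode;
}
```

On the machine described here,
read_overcommit_mode( "/proc/sys/vm/overcommit_memory" ) would be
expected to return 0, matching the setting reported above.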

[X.] Other notes, patches, fixes, workarounds:
I have found no workaround for this and I don't know how to find
the piece of code that is responsible for all that (no time...).
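
One mitigation that might be worth trying, although it is not part of
this report and has not been tested on 2.2.1, is to cap the address space
of the allocating process with setrlimit( RLIMIT_AS ), so that malloc()
fails before swap is exhausted. The sketch below only demonstrates the
mechanism in a forked child; malloc_fails_under_limit() is a hypothetical
name.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* In a child process limited to 'limit' bytes of address space, try to
 * malloc 'request' bytes.
 * Returns 1 if malloc failed, 0 if it succeeded, -1 on any error. */
int malloc_fails_under_limit( rlim_t limit, size_t request )
{
    int status;
    pid_t pid = fork();

    if ( pid < 0 )
        return -1;

    if ( pid == 0 )
    {
        struct rlimit rl;
        void * volatile p;   /* volatile: keep the malloc call alive */

        rl.rlim_cur = limit;
        rl.rlim_max = limit;

        if ( setrlimit( RLIMIT_AS, &rl ) != 0 )
            _exit( 2 );

        p = malloc( request );
        _exit( p == NULL ? 1 : 0 );
    }

    if ( waitpid( pid, &status, 0 ) != pid || ! WIFEXITED( status ) )
        return -1;

    switch ( WEXITSTATUS( status ) ) {
    case 0:
        return 0;
    case 1:
        return 1;
    default:
        return -1;
    }
}
```

For instance, malloc_fails_under_limit( 64UL << 20, 512UL << 20 ) asks
for 512 MB under a 64 MB cap, which should be refused. Such a limit
protects only processes that run under it, so it would not explain or fix
the behaviour of init described above.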

The problem is easy to reproduce, though not always... I sometimes
need to run the program twice... here's a sample "session":

----------------- Cut here -----------------
Initial processes/memory:
 FLAGS   UID   PID  PPID PRI  NI  SIZE   RSS WCHAN      STA TTY TIME COMMAND
   100     0     1     0   0   0   264   172 do_select  S   ?   0:05 init
    40     0     2     1   0   0     0     0 bdflush    SW  ?   0:00 (kflushd)
   840     0     3     1   0   0     0     0 kswapd     SW  ?   0:00 (kswapd)
   140     0    10     1   0   0   764   240 nanosleep  S   ?   0:00 update (b
   140     0   143     1   0   0   948   536 do_select  S   ?   0:00 /usr/sbin
   140     0   147     1   0   0  1036   628 do_syslog  S   ?   0:00 /usr/sbin
   100     0   231     1  15   0  1964  1300 wait4      S   1   0:00 -bash
   100     0   232     1   0   0   768   340 read_chan  S   2   0:00 /sbin/min
   100     0   233     1   0   0   768   340 read_chan  S   3   0:00 /sbin/min
   100     0   234     1   0   0   768   340 read_chan  S   4   0:00 /sbin/min
   100     0   235     1   0   0   768   340 read_chan  S   5   0:00 /sbin/min
   100     0   236     1   0   0   768   340 read_chan  S   6   0:00 /sbin/min
    40     0   342     1  11   0   792   356 do_select  S   ?   0:00 /usr/bin/
100000     0   344   231  17   0   972   536            R   1   0:00 ps alx
             total       used       free     shared    buffers     cached
Mem:         95812      39732      56080       4060       2336      32624
-/+ buffers/cache:       4772      91040
Swap:        88352          0      88352

running: 'swap -S'
message: (none)
lost: klogd (pid 147)
processes/memory:
 FLAGS   UID   PID  PPID PRI  NI  SIZE   RSS WCHAN      STA TTY TIME COMMAND
   100     0     1     0   1   0   264    60 do_select  S   ?   0:05 init
    40     0     2     1   0   0     0     0 bdflush    SW  ?   0:00 (kflushd)
   840     0     3     1   5   0     0     0 kswapd     SW  ?   0:03 (kswapd)
   140     0    10     1   0   0   764    24 nanosleep  S   ?   0:00 update (b
   140     0   143     1   2   0   948    76 do_select  S   ?   0:00 /usr/sbin
   100     0   231     1  10   0  1964   416 wait4      S   1   0:00 -bash
   100   500   232     1   1   0  1972   508 read_chan  S   2   0:00 -bash
   100     0   233     1   0   0   768     0 read_chan  SW  3   0:00 (mingetty
   100     0   234     1   0   0   768     0 read_chan  SW  4   0:00 (mingetty
   100     0   235     1   0   0   768     0 read_chan  SW  5   0:00 (mingetty
   100     0   236     1   0   0   768     0 read_chan  SW  6   0:00 (mingetty
    40     0   342     1   0   0   792     0 do_select  SW  ?   0:00 (gpm)
     0     0   353   231  12   0   972   536            R   1   0:00 ps alx
             total       used       free     shared    buffers     cached
Mem:         95812       7144      88668       1116       1972       2536
-/+ buffers/cache:       2636      93176
Swap:        88352        964      87388

running: 'swap -S 250 10m'
message: 'INIT: PANIC: segmentation violation! giving up...'
lost: 1(init), 10(update), 143(???)
processes/memory:
 FLAGS   UID   PID  PPID PRI  NI  SIZE   RSS WCHAN      STA TTY TIME COMMAND
   100     0     1     0  10   0   276   144 pause      S   ?   0:21
    40     0     2     1   0   0     0     0 bdflush    SW  ?   0:00 (kflushd)
   840     0     3     1   7   0     0     0 kswapd     SW  ?   0:13 (kswapd)
   100     0   231     1   9   0  1964   508 wait4      S   1   0:44 -bash
   100   500   232     1   0   0  1972   420 read_chan  S   2   0:00 -bash
   100     0   233     1   0   0   768     0 read_chan  SW  3   0:00 (mingetty
   100     0   234     1   0   0   768     0 read_chan  SW  4   0:00 (mingetty
   100     0   235     1   0   0   768     0 read_chan  SW  5   0:00 (mingetty
   100     0   236     1   0   0   768     0 read_chan  SW  6   0:00 (mingetty
100000     0   386   231  10   0   968   532            R   1   0:00 ps alx
             total       used       free     shared    buffers     cached
Mem:         95812       6432      89380       1084       1976       1984
-/+ buffers/cache:       2472      93340
Swap:        88352        880      87472
----------------- Cut here -----------------

I think that 2.2 is GREAT, especially for SMP boxes and I hope that
this helps to make it even better.

Best regards...

Michael Kwasigroch (mk@intercope.com)
Hamburg, Germany
