Re: Allocation of too much memory hangs system, kernel 2.2.*

Urban Widmark (urban@svenskatest.se)
Wed, 2 Jun 1999 21:03:00 +0200 (CEST)


(Hmm, replying to myself)

On Wed, 2 Jun 1999, Urban Widmark wrote:

> It isn't a lockup problem. The mail I replied to was about oom() possibly
> killing the init process. I believe I saw this happen with an early 2.2.0,
> I haven't tried provoking it again.
>
> I'm going to try a clean 2.2.9 first, then this patch, then 2.3.4-pre2,
> then pre-2.3.4-2-andrea2 ...

The following query makes PostgreSQL 6.5beta use up a *lot* of memory
(on an empty database):
select * from ts_syllabus where
(ts_key like '1%' and ts_lang='swe') or
(ts_key like '2%' and ts_lang='swe') or
(ts_key like '3%' and ts_lang='swe') or
(ts_key like '4%' and ts_lang='swe') or
(ts_key like '5%' and ts_lang='swe') or
(ts_key like '6%' and ts_lang='swe') or
(ts_key like '7%' and ts_lang='swe') or
(ts_key like '8%' and ts_lang='swe') or
(ts_key like '9%' and ts_lang='swe')
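
To vary the number of OR branches when reproducing this, the query can be
generated instead of typed. This is only a sketch: build_query() and its
parameters are hypothetical helper names, not part of PostgreSQL, and it
does nothing more than rebuild the text of the query above.

```python
def build_query(n_branches=9, lang="swe", table="ts_syllabus"):
    """Return the select above with n_branches OR'ed (like, =) pairs.

    Hypothetical helper: it only builds the SQL text, it does not
    connect to any database.
    """
    branches = [
        f"(ts_key like '{i}%' and ts_lang='{lang}')"
        for i in range(1, n_branches + 1)
    ]
    return f"select * from {table} where\n" + " or\n".join(branches) + ";"


if __name__ == "__main__":
    # With the defaults this is the nine-branch query from the mail.
    print(build_query())
```

Piping the output into psql with a smaller n_branches first should show at
what point memory use starts to grow.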

Results for 2.2.7

Out of memory for klogd.
Out of memory for portmap.
Out of memory for syslogd.
Out of memory for update.
Out of memory for vmstat.
Out of memory for crond.
Out of memory for crond.
Out of memory for postmaster. <- "the memory hog"
Out of memory for sshd.
Out of memory for init. <- !!!
Out of memory for inetd.

Some things did survive:

# ps uxw
USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
root         1  0.0  0.8   776   276 ?   S    May 31  0:42
root         2  0.0  0.0     0     0 ?   SW   May 31  0:00 (kflushd)
root         3  0.0  0.0     0     0 ?   SW   May 31  0:00 (kpiod)
root         4  0.0  0.0     0     0 ?   SW   May 31  0:50 (kswapd)
...

So whether init was actually killed or not, I don't know. Something is
still there as pid 1, but there are zombies everywhere, and shutdown,
reboot, halt, ... do nothing.

The dying moments of 'vmstat 1' say:
procs                 memory      swap       io  system      cpu
r b w   swpd free buff cache   si   so   bi  bo  in  cs us sy id
2 0 1 247892  832  652  1768    0 2232    0 558 163  53 19 52 29
1 0 1 248984  840  652  1544    0 1092    0 273 147  33  8 80 12
1 0 0 249552  804  652  1424  448  908  112 227 154  35  5 70 24
2 0 0 250280  744  652  1968  336  972  128 243 155  38  3 77 20
1 0 0 250052  268  652  1604  836  408  397 102 157  54  3 41 56
1 0 0 251468  880  652  2408   28 1420    7 355 188  41  1 84 15
1 0 0 251872  768  652  1712  616 1108  154 277 203  78  9 65 27
1 0 1 253028  776  652  1176    0 1156    0 289 167  30  9 83  8
1 1 0 254804  824  652  1784   24 1780   30 445 230  46  5 84 10
1 0 1 256140  824  652  1308   68 1344   53 337 191  43  8 83  9
0 1 1 259704  844  652  1956    0 3564    0 891 444 103  6 81 13
1 0 1 261496  824  652  2000   24 1792    6 448 261  52  7 82 11
0 1 0 261472  808  652  1160 1064  812  294 203 200  73  6 59 34
0 1 1 261496  752  652   644  632  620  158 155 178  64  5 60 35
1 0 0 261496  772  652   528  312  104  363  26 160  39  2 73 25
1 0 3 261496  736  652   564  464   32  308   8 283  39  1 87 12
2 1 1 261496  752  652   544  388   12  326   3 148  33  2 77 21
0 3 0 261496  552  652   712 1720   36 1088   9 524 117  0 78 22

where 261496 is the total amount of swap, as reported by 'free'.
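
For anyone who wants to spot the point of exhaustion mechanically, here is
a small sketch that parses rows in the column order shown above and reports
the first sample where swpd reaches the swap total. The function names are
made up for illustration; the sample rows and the total are copied from
this mail.

```python
# Column order of the 'vmstat 1' output quoted in the mail.
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]

SWAP_TOTAL = 261496  # total swap as reported by 'free' (from the mail)

# Five consecutive rows copied from the vmstat output in the mail.
SAMPLES = [
    "1 0 1 256140 824 652 1308 68 1344 53 337 191 43 8 83 9",
    "0 1 1 259704 844 652 1956 0 3564 0 891 444 103 6 81 13",
    "1 0 1 261496 824 652 2000 24 1792 6 448 261 52 7 82 11",
    "0 1 0 261472 808 652 1160 1064 812 294 203 200 73 6 59 34",
    "0 1 1 261496 752 652 644 632 620 158 155 178 64 5 60 35",
]


def parse(line):
    """Turn one whitespace-separated vmstat row into a dict of ints."""
    return dict(zip(COLUMNS, map(int, line.split())))


def first_exhausted(lines, total=SWAP_TOTAL):
    """Index of the first row whose swpd has reached total, else None."""
    for i, line in enumerate(lines):
        if parse(line)["swpd"] >= total:
            return i
    return None


if __name__ == "__main__":
    print(first_exhausted(SAMPLES))  # prints 2
```

In the rows used here swap is full from the third sample on, right after
the largest swap-out burst (so = 3564).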

2.2.9 and 2.2.9-with-andrea-patch both did the same, except that I didn't
manage to hit init (or maybe someone changed something?).

I ran 2.3.4-pre2 (yes, I know there is a 2.3.4) 3 times:
1. oom messages, but init survived and the machine was in a workable state
2. machine no longer contactable. I have no idea what happened (nothing
   in the logs)
3. vmstat died with a Bus error but no "oom messages", everything is fine.

hpa suggested that init should never be killed. It would be nice if not
only init were spared: inetd, sshd and similar may also be "critical",
depending on your needs.
(Critical in this case means me having to walk to the other end of the
building and press a "big red button" :)
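
A rough userspace sketch of that idea follows. It is not kernel code and
not what any of the kernels tested here do. It assumes a "kill the largest
RSS" policy, which is my own simplification, and pick_victim() and
PROTECTED_NAMES are hypothetical names. The sshd size in the usage example
is invented; the other RSS values are taken from the ps listings in this
mail.

```python
# Site-specific set of processes that must never be chosen (assumption:
# matched by command name, as suggested in the mail).
PROTECTED_NAMES = {"init", "inetd", "sshd"}


def pick_victim(procs, protected=PROTECTED_NAMES):
    """Choose a process to kill when memory runs out.

    procs is an iterable of (pid, name, rss_kb) tuples. pid 1 and any
    protected name are skipped; of the rest, the one with the largest
    RSS is returned. Returns None if nothing may be killed.
    """
    candidates = [
        p for p in procs
        if p[0] != 1 and p[1] not in protected
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p[2])


if __name__ == "__main__":
    procs = [
        (1, "init", 276),
        (120, "sshd", 9000),      # invented size, larger than the hog
        (281, "db2sysc", 7836),
        (283, "db2ipccm", 6492),
    ]
    print(pick_victim(procs))
```

Even though sshd is the largest process in this made-up list, it is
skipped and db2sysc is picked instead.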

pre-2.3.4-2-andrea2 won't boot properly. The machine starts up, it fscks
the disks, it begins starting up daemons ... ping ok, xmeter/rstatd is
saying the machine is up again. But then things stop working, xmeter says
the machine is down again, no telnet/ssh access :(
Nothing in the logs.

The step it fails on is:
su - db2inst1 -c "source sqllib/db2cshrc; db2start; db2jstrt"
(I remember you having db2 problems some time ago, but isn't this a bit
too much ;)

DB2 uses System V IPC, and the processes get kind of big:
USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
db2inst1   281  0.0 25.2 25780  7836 ?   S    20:48   0:00 db2sysc
db2inst1   283  0.0 20.8 25460  6492 ?   S    20:48   0:00 db2ipccm
db2inst1   284  0.0 25.1 25780  7820 ?   S    20:48   0:00 db2tcpcm
db2inst1   285  0.0 25.1 25780  7824 ?   S    20:48   0:00 db2tcpim
db2inst1   286  0.0 20.6 25452  6416 ?   S    20:48   0:00 db2resyn
db2inst1   287  0.0 20.6 26224  6428 ?   S    20:48   0:00 db2srvlst
db2inst1   289  0.0  8.2  8688  2572 ?   S    20:48   0:00 db2jd 6789

pre-2.3.4-2-andrea2 did not apply cleanly to a 2.3.4-pre2 (yes, I did
actually dl a new 2.3.3 tarball just to make sure). There were a lot of
complaints about the isdn code, but since I don't use isdn I don't think
that's what is wrong.

/Urban
