faulty oom-killer on amd64?

From: Michael Renner
Date: Wed Nov 23 2005 - 20:36:51 EST


I'm seeing lack of OOM killing on two of our amd64/x86_64 servers. One is a
single core quad opteron with 32 gb ram, the other one a 8 processor dual core
opteron with 64 gb ram. We've got a perl script (which forks itself a few times)
which tends to eat up a lot of ram, eventually reaching the systems limits. I
bandaided the problem in the meanwhile with ulimits (while we try to figure out
why perl is misbehaving in the first place) but the kernel should also behave
properly when it's allocateable memory gets stretched thin.

On the four processor machine this usually ended up in the box freezing up more
or less (userland stalled, sysrq keys working though. I usually did a sync and
then rebooted, sending terms/kills to all processes worked rarely). A log of
such an event can be seen at http://phpfi.com/88425.

On the 16 processor machine I get CPU lockups with identical traces which can be
seen at http://666kb.com/i/10yov42ydfdog.jpg and
http://666kb.com/i/10yom358azw8w.jpg (this was tested with 2.6.14 and 2.6.15-rc2

http://phpfi.com/88428 is the .config of the 16 processor machine and
http://phpfi.com/88429 of the 4 processor one.

Any ideas?


best regards,
Michael Renner - Network services

Preisvergleich Internet Services AG
Obere DonaustraÃe 63/2, A-1020 Wien
Tel: +43 1 5811609 80
Fax: +43 1 5811609 55

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/