Re: Fix for wrong OOM killer trigger?

From: Andrea Arcangeli
Date: Fri Sep 19 2003 - 14:26:55 EST


Hi Roland,

On Fri, Sep 19, 2003 at 07:16:13PM +0200, Roland Bless wrote:
> Hi Miquel,
>
> I read your e-mail http://www.cs.helsinki.fi/linux/linux-kernel/2003-27/1274.html
> in the archive, but was not able to find a solution.
> We have a similar problem:
> HW: 4x 2,4GHz Xeon, 4GB Ram, 3ware 7000-series ATA-RAID
> SW: Kernel 2.4.22 (also seen on 2.4.21, 2.4.22-ac3), lvm, software raid,
> reiserfs, SuSE 8.1. Swap turned off (see later).
>
> **Symptom: programs that heavily search the whole filesystem
> (e.g., rsync, ssync, TSM backup client dsmc) cause to trigger the
> OOM killer procedure (not very funny if NIS or NFS gets killed).
>
> Sep 17 21:49:07 fs1 kernel: Out of Memory: Killed process 1384 (lmgrd).
> Sep 17 21:49:12 fs1 kernel: Out of Memory: Killed process 1617 (exim).
> Sep 17 21:49:18 fs1 kernel: Out of Memory: Killed process 1402 (ntpd).
> Sep 17 21:49:23 fs1 kernel: Out of Memory: Killed process 1278 (portmap).
> Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2715 (dsmc).
> Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2716 (dsmc).
> Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2717 (dsmc).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1600 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1601 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1603 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1604 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1605 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1606 (nscd).
> Sep 17 21:49:40 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1421 (ypbind).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1422 (ypbind).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1423 (ypbind).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1424 (ypbind).
> Sep 17 21:49:51 fs1 kernel: Out of Memory: Killed process 1584 (atd).
> Sep 17 21:49:57 fs1 kernel: Out of Memory: Killed process 1329 (ypserv).
>
> The OOM kill occured also when the cache memory didn't exhausted the
> available memory (total mem usage was around 1.8GB).
> echo 2>/proc/sys/vm/overcommit_memory did not solve the problem either.
> In my understanding it has to do something with the fs cache/vm. We have some
> files that are larger than 2GB, but usually the killing process starts at
> different points in time.
>
> However, I also saw that kswapd used a lot CPU though swap was not active.
> With swap space activated, the load on the cpu increases dramatically
> so that the system becomes unusable, too. This is our file server and
> I'm currently not able to make a backup to other systems. That's really
> frustrating.
>
> Anyone any ideas? Please Cc: to me in your replies since I'm not on the lkml.

can you try with 2.4.22aa1? the oom killer there will only work on tasks
that are allocating memory, not on idle daemons, so the probability of
killing rsync first should be higher. stock SuSE 8.1 kernel should do
the same too.

Andrea

/*
* If you refuse to depend on closed software for a critical
* part of your business, these links may be useful:
*
* rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.5/
* rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.4/
* http://www.cobite.com/cvsps/
*
* svn://svn.kernel.org/linux-2.6/trunk
* svn://svn.kernel.org/linux-2.4/trunk
*/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/