Re: How to immunize a process from the OOM Killer

From: Juan Miscaro
Date: Fri Oct 30 2009 - 13:27:06 EST


2009/10/29 KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>:
> On Thu, 29 Oct 2009 15:01:22 -0400
> Juan Miscaro <jmiscaro@xxxxxxxxx> wrote:
>
>> Hi, I'm running 2.6.24 on Ubuntu.
>>
>> I've got a OOM Killer gone wild. I have plenty of free memory (over
>> 40 GB) and lots of processes are being murdered. Anyway, I would like
>> to first prevent a few processes from being killed in the hopes of
>> buying me time (users can work) to discover why the killer is being
>> invoked in the first place.
>
> Could you show us your message log of OOM-Killer ?

Hi, by message log I guess you mean parts of the kern.log? I've
attached a snippet of that log file that covers one process being
killed. I have many others like this.
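
Since there are many such events, here is a minimal sketch for pulling the
"invoked oom-killer" header lines out of a kern.log so they can be compared.
It assumes the header format seen in the attached snippet (2.6.24); other
kernel versions may print different fields. The names OOM_HEADER and
parse_oom_headers are made up for illustration.

```python
import re

# Matches the header line the kernel prints when the OOM killer is invoked,
# in the format seen in the attached log; other kernels may differ.
OOM_HEADER = re.compile(
    r"\] (?P<comm>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp_mask>0x[0-9a-fA-F]+), "
    r"order=(?P<order>\d+), "
    r"oomkilladj=(?P<oomkilladj>-?\d+)"
)


def parse_oom_headers(lines):
    """Return one dict per 'invoked oom-killer' line found in lines."""
    events = []
    for line in lines:
        match = OOM_HEADER.search(line)
        if match:
            events.append({
                "comm": match.group("comm"),
                "gfp_mask": int(match.group("gfp_mask"), 16),
                "order": int(match.group("order")),
                "oomkilladj": int(match.group("oomkilladj")),
            })
    return events


# Two lines copied from the attached log; only the first is a header.
sample = [
    "Oct 15 12:36:05 host kernel: [2695651.038485] nxssh invoked oom-killer: "
    "gfp_mask=0x44d0, order=2, oomkilladj=0",
    "Oct 15 12:36:05 host kernel: [2695651.038729] Mem-info:",
]

for event in parse_oom_headers(sample):
    print(event["comm"], hex(event["gfp_mask"]), event["order"], event["oomkilladj"])
# prints: nxssh 0x44d0 2 0
```

To run it over a whole log, pass an open file object instead of the sample
list, e.g. parse_oom_headers(open(path)) with whatever path the log lives at.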

--
/jm
Oct 15 12:36:05 host kernel: [2695651.038485] nxssh invoked oom-killer: gfp_mask=0x44d0, order=2, oomkilladj=0
Oct 15 12:36:05 host kernel: [2695651.038496] Pid: 15247, comm: nxssh Not tainted 2.6.24.6-030909 #1
Oct 15 12:36:05 host kernel: [2695651.038519] [oom_kill_process+0x10a/0x120] oom_kill_process+0x10a/0x120
Oct 15 12:36:05 host kernel: [2695651.038537] [out_of_memory+0x167/0x1a0] out_of_memory+0x167/0x1a0
Oct 15 12:36:05 host kernel: [2695651.038549] [nfs:__alloc_pages+0x34c/0x380] __alloc_pages+0x34c/0x380
Oct 15 12:36:05 host kernel: [2695651.038564] [nfs:__get_free_pages+0x39/0x7b0] __get_free_pages+0x39/0x50
Oct 15 12:36:05 host kernel: [2695651.038573] [tun:__alloc_skb+0x55/0x1250] __alloc_skb+0x55/0x120
Oct 15 12:36:05 host kernel: [2695651.038585] [sock_alloc_send_skb+0x170/0x1c0] sock_alloc_send_skb+0x170/0x1c0
Oct 15 12:36:05 host kernel: [2695651.038598] [unix_stream_sendmsg+0x28f/0x390] unix_stream_sendmsg+0x28f/0x390
Oct 15 12:36:05 host kernel: [2695651.038609] [unix_stream_sendmsg+0x189/0x390] unix_stream_sendmsg+0x189/0x390
Oct 15 12:36:05 host kernel: [2695651.038624] [sock_aio_write+0x109/0x120] sock_aio_write+0x109/0x120
Oct 15 12:36:05 host kernel: [2695651.038635] [ktime_get_ts+0x19/0x50] ktime_get_ts+0x19/0x50
Oct 15 12:36:05 host kernel: [2695651.038650] [tun:do_sync_write+0xd5/0x120] do_sync_write+0xd5/0x120
Oct 15 12:36:05 host kernel: [2695651.038665] [<c0142d70>] autoremove_wake_function+0x0/0x40
Oct 15 12:36:05 host kernel: [2695651.038679] [aa_file_permission+0x3/0xd0] aa_file_permission+0x3/0xd0
Oct 15 12:36:05 host kernel: [2695651.038692] [vfs_write+0x15a/0x170] vfs_write+0x15a/0x170
Oct 15 12:36:05 host kernel: [2695651.038701] [sys_write+0x41/0x70] sys_write+0x41/0x70
Oct 15 12:36:05 host kernel: [2695651.038710] [sysenter_past_esp+0x6b/0xa1] sysenter_past_esp+0x6b/0xa1
Oct 15 12:36:05 host kernel: [2695651.038725] =======================
Oct 15 12:36:05 host kernel: [2695651.038729] Mem-info:
Oct 15 12:36:05 host kernel: [2695651.038733] DMA per-cpu:
Oct 15 12:36:05 host kernel: [2695651.038737] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038742] CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038746] CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038752] CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038757] CPU 4: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038762] CPU 5: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038767] CPU 6: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038772] CPU 7: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038777] CPU 8: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038782] CPU 9: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038788] CPU 10: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038792] CPU 11: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038797] CPU 12: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038801] CPU 13: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038806] CPU 14: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038812] CPU 15: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038816] Normal per-cpu:
Oct 15 12:36:05 host kernel: [2695651.038820] CPU 0: Hot: hi: 186, btch: 31 usd: 29 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038825] CPU 1: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038830] CPU 2: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038836] CPU 3: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038841] CPU 4: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038846] CPU 5: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038852] CPU 6: Hot: hi: 186, btch: 31 usd: 30 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038856] CPU 7: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038861] CPU 8: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038865] CPU 9: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038870] CPU 10: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038875] CPU 11: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038881] CPU 12: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038886] CPU 13: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038892] CPU 14: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038897] CPU 15: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038902] HighMem per-cpu:
Oct 15 12:36:05 host kernel: [2695651.038906] CPU 0: Hot: hi: 186, btch: 31 usd: 24 Cold: hi: 62, btch: 15 usd: 11
Oct 15 12:36:05 host kernel: [2695651.038911] CPU 1: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038916] CPU 2: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038922] CPU 3: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038926] CPU 4: Hot: hi: 186, btch: 31 usd: 24 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038932] CPU 5: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038937] CPU 6: Hot: hi: 186, btch: 31 usd: 57 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038941] CPU 7: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038946] CPU 8: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038950] CPU 9: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038955] CPU 10: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038960] CPU 11: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038965] CPU 12: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038970] CPU 13: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038976] CPU 14: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038981] CPU 15: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0
Oct 15 12:36:05 host kernel: [2695651.038987] Active:4175196 inactive:867151 dirty:15 writeback:10 unstable:0
Oct 15 12:36:05 host kernel: [2695651.038988] free:9400621 slab:69457 mapped:182830 pagetables:21155 bounce:0
Oct 15 12:36:05 host kernel: [2695651.038994] DMA free:940kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? no
Oct 15 12:36:05 host kernel: [2695651.038998] lowmem_reserve[]: 0 873 57642 57642
Oct 15 12:36:05 host kernel: [2695651.039010] Normal free:85688kB min:3744kB low:4680kB high:5616kB active:0kB inactive:280kB present:894080kB pages_scanned:0 all_unreclaimable? no
Oct 15 12:36:05 host kernel: [2695651.039014] lowmem_reserve[]: 0 0 454152 454152
Oct 15 12:36:05 host kernel: [2695651.039028] HighMem free:37515856kB min:512kB low:61428kB high:122348kB active:16701060kB inactive:3468324kB present:58131456kB pages_scanned:0 all_unreclaimable? no
Oct 15 12:36:05 host kernel: [2695651.039033] lowmem_reserve[]: 0 0 0 0
Oct 15 12:36:05 host kernel: [2695651.039043] DMA: 1*4kB 1*8kB 2*16kB 4*32kB 4*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 940kB
Oct 15 12:36:05 host kernel: [2695651.039071] Normal: 20427*4kB 409*8kB 4*16kB 21*32kB 7*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 86420kB
Oct 15 12:36:05 host kernel: [2695651.039099] HighMem: 31689*4kB 507549*8kB 176346*16kB 57445*32kB 16385*64kB 4522*128kB 1214*256kB 428*512kB 305*1024kB 185*2048kB 6304*4096kB = 37516684kB
Oct 15 12:36:05 host kernel: [2695651.039128] Swap cache: add 0, delete 0, find 0/0, race 0+0
Oct 15 12:36:05 host kernel: [2695651.039132] Free swap = 29294516kB
Oct 15 12:36:05 host kernel: [2695651.039136] Total swap = 29294516kB
Oct 15 12:36:05 host kernel: [2695651.039140] Free swap: 29294516kB
Oct 15 12:36:05 host kernel: [2695651.249865] 14876671 pages of RAM
Oct 15 12:36:05 host kernel: [2695651.249875] 14647295 pages of HIGHMEM
Oct 15 12:36:05 host kernel: [2695651.249876] 314941 reserved pages
Oct 15 12:36:05 host kernel: [2695651.249879] 2921946 pages shared
Oct 15 12:36:05 host kernel: [2695651.249883] 0 pages swap cached
Oct 15 12:36:05 host kernel: [2695651.249885] 20 pages dirty
Oct 15 12:36:05 host kernel: [2695651.249886] 10 pages writeback
Oct 15 12:36:05 host kernel: [2695651.249888] 183270 pages mapped
Oct 15 12:36:05 host kernel: [2695651.249889] 69454 pages slab
Oct 15 12:36:05 host kernel: [2695651.249891] 21174 pages pagetables
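
The header above reports order=2, i.e. a request for 4 contiguous pages. As a
rough aid for reading the per-zone free lists ("Normal: 20427*4kB ..."), the
sketch below counts how many free blocks of a given order or larger a zone
line reports. It assumes 4 kB pages (the smallest block size in the log) and
the line format shown here; the helper names are made up for illustration,
and it only summarizes what the log already says.

```python
import re

# One "count*sizekB" entry in a per-zone free-list line.
BLOCK = re.compile(r"(\d+)\*(\d+)kB")
PAGE_KB = 4  # assumption: 4 kB pages, matching the smallest block in the log


def free_blocks_by_order(line):
    """Map allocation order -> number of free blocks, from one zone line."""
    counts = {}
    for count, size_kb in BLOCK.findall(line):
        # A block of 2**order pages has size PAGE_KB * 2**order kB.
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


def blocks_at_or_above(counts, order):
    """Number of free blocks large enough for a request of this order."""
    return sum(n for o, n in counts.items() if o >= order)


def total_free_kb(counts):
    """Total free memory implied by the per-order block counts."""
    return sum(n * PAGE_KB * (1 << o) for o, n in counts.items())


# The Normal zone line copied from the log above.
normal = (
    "Normal: 20427*4kB 409*8kB 4*16kB 21*32kB 7*64kB 0*128kB 1*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 86420kB"
)

counts = free_blocks_by_order(normal)
print(blocks_at_or_above(counts, 2))  # prints: 33
print(total_free_kb(counts))          # prints: 86420
```

The second number matches the "= 86420kB" total on the same log line, which
is a quick check that the line was parsed correctly.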