2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?

From: Bruno Prémont
Date: Sun Apr 24 2011 - 14:22:21 EST


On an older system I've been running Gentoo's revdep-rebuild to check
system linking/*.la consistency. After doing most of the work the
system more or less starved, only complaining about stuck tasks now and
then.
A memory usage graph as seen from userspace showed a sudden, quick
increase in memory usage, though only a very few MB were swapped out
(c.f. attached RRD graph).
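For the record, roughly the same view as the RRD graph can be had with a
small /proc/meminfo sampler like the sketch below (the field selection
and log path are my choice, not from the actual setup):

```shell
#!/bin/sh
# Print one timestamped sample of the counters most relevant here:
# free memory, free swap and slab usage (which is what exploded).
sample_mem() {
    awk -v ts="$(date +%s)" '
        /^(MemFree|SwapFree|Slab|SReclaimable|SUnreclaim):/ {
            gsub(":", "", $1)          # drop the trailing colon
            printf "%s=%skB ", $1, $2
        }
        END { print "ts=" ts }
    ' /proc/meminfo
}

sample_mem
# to log continuously:  while sleep 10; do sample_mem >> /tmp/mem.log; done
```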


Sample soft-lockup trace:
[ 3683.000651] BUG: soft lockup - CPU#0 stuck for 64s! [kswapd0:365]
[ 3683.000720] Modules linked in: squashfs zlib_inflate nfs lockd nfs_acl sunrpc snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer pcspkr snd snd_page_alloc
[ 3683.001331] Modules linked in: squashfs zlib_inflate nfs lockd nfs_acl sunrpc snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer pcspkr snd snd_page_alloc
[ 3683.001941]
[ 3683.001988] Pid: 365, comm: kswapd0 Tainted: G W 2.6.39-rc4-jupiter-00187-g686c4cb #1 NVIDIA Corporation. nFORCE-MCP/MS-6373
[ 3683.002163] EIP: 0060:[<c1092cd0>] EFLAGS: 00000246 CPU: 0
[ 3683.002221] EIP is at shrink_inactive_list+0x110/0x340
[ 3683.002272] EAX: 00000000 EBX: c1546620 ECX: 00000001 EDX: 00000000
[ 3683.002324] ESI: dd587f78 EDI: dd587e68 EBP: dd587e88 ESP: dd587e38
[ 3683.002376] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 3683.002427] Process kswapd0 (pid: 365, ti=dd586000 task=dd532700 task.ti=dd586000)
[ 3683.002486] Stack:
[ 3683.002532] dd587e78 00000001 00000000 00000001 00000054 dd532700 c15468d8 c154688c
[ 3683.002871] c15468dc 00000000 dd587e5c 00000000 dd587e68 dd587e68 dd587e6c 0000007b
[ 3683.003212] 00000000 00000002 00000000 00000000 dd587f14 c10932a0 00000002 00000001
[ 3683.003548] Call Trace:
[ 3683.003602] [<c10932a0>] shrink_zone+0x3a0/0x500
[ 3683.003653] [<c1093b4e>] kswapd+0x74e/0x8d0
[ 3683.003705] [<c1093400>] ? shrink_zone+0x500/0x500
[ 3683.003757] [<c1047794>] kthread+0x74/0x80
[ 3683.003807] [<c1047720>] ? kthreadd+0xb0/0xb0
[ 3683.003864] [<c13918b6>] kernel_thread_helper+0x6/0xd
[ 3683.003913] Code: 0e 04 74 5f 89 da 2b 93 e0 02 00 00 c1 fa 02 69 d2 95 fa 53 ea 01 04 95 e8 55 52 c1 8b 55 d4 85 d2 75 5f fb c7 45 dc 00 00 00 00 <8b> 45 dc 83 c4 44 5b 5e 5f c9 c3 90 8d 74 26 00 83 7d 08 09 0f
[ 3683.006264] Call Trace:
[ 3683.006312] [<c10932a0>] shrink_zone+0x3a0/0x500
[ 3683.006379] [<c1093b4e>] kswapd+0x74e/0x8d0
[ 3683.006430] [<c1093400>] ? shrink_zone+0x500/0x500
[ 3683.006480] [<c1047794>] kthread+0x74/0x80
[ 3683.006529] [<c1047720>] ? kthreadd+0xb0/0xb0
[ 3683.006580] [<c13918b6>] kernel_thread_helper+0x6/0xd



A meminfo dump (SysRq + m) taken between the traces produced the
following weird-looking information:
[ 6605.162375] Mem-Info:
[ 6605.162421] DMA per-cpu:
[ 6605.162468] CPU 0: hi: 0, btch: 1 usd: 0
[ 6605.162517] Normal per-cpu:
[ 6605.162565] CPU 0: hi: 186, btch: 31 usd: 56
[ 6605.162618] active_anon:56 inactive_anon:86 isolated_anon:32
[ 6605.162620] active_file:1114 inactive_file:1101 isolated_file:32
[ 6605.162623] unevictable:8 dirty:0 writeback:0 unstable:0
[ 6605.162624] free:1195 slab_reclaimable:14698 slab_unreclaimable:34339
[ 6605.162626] mapped:135 shmem:0 pagetables:87 bounce:0
[ 6605.162875] DMA free:1936kB min:88kB low:108kB high:132kB active_anon:4kB inactive_anon:4294967292kB active_file:4294967284kB inactive_file:4294967252kB unevictable:0kB isolated(anon):4kB isolated(file):56kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:180kB slab_unreclaimable:4688kB kernel_stack:9104kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2 all_unreclaimable? no
[ 6605.163011] lowmem_reserve[]: 0 460 460
[ 6605.163207] Normal free:2836kB min:2696kB low:3368kB high:4044kB active_anon:220kB inactive_anon:348kB active_file:4468kB inactive_file:4448kB unevictable:32kB isolated(anon):124kB isolated(file):72kB present:471360kB mlocked:32kB dirty:0kB writeback:0kB mapped:540kB shmem:0kB slab_reclaimable:58612kB slab_unreclaimable:132676kB kernel_stack:256640kB pagetables:348kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:871 all_unreclaimable? no
[ 6605.163337] lowmem_reserve[]: 0 0 0
[ 6605.163526] DMA: 96*4kB 178*8kB 8*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1936kB
[ 6605.164012] Normal: 539*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2836kB
[ 6605.164500] 2360 total pagecache pages
[ 6605.164547] 113 pages in swap cache
[ 6605.164595] Swap cache stats: add 135546, delete 135433, find 121976/158215
[ 6605.164647] Free swap = 518472kB
[ 6605.164694] Total swap = 524284kB
[ 6605.170001] 122848 pages RAM
[ 6605.170001] 2699 pages reserved
[ 6605.170001] 2129 pages shared
[ 6605.170001] 83744 pages non-shared
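An aside on the odd numbers above: the inactive_anon:4294967292kB style
values in the DMA zone line look like per-zone counters that drifted
slightly below zero and got printed as unsigned 32-bit integers. A quick
check (my interpretation, not confirmed anywhere in this report):

```python
# Reinterpret an unsigned 32-bit value as signed two's complement.
def as_signed32(u: int) -> int:
    return u - (1 << 32) if u >= (1 << 31) else u

# The "impossible" kB values from the DMA zone line:
for v in (4294967292, 4294967284, 4294967252):
    print(f"{v} kB -> {as_signed32(v)} kB")
# → -4 kB, -12 kB and -44 kB, i.e. counters just below zero
```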


dmesg output (with part of the soft-lockup traces stripped to keep the
log small) and config are attached.

That state suddenly "fixed" itself when revdep-rebuild ended up in
STOPPED state (many minutes after hitting CTRL+Z on its terminal),
though the system didn't survive very long after that (it died in the
network stack while handling an RX IRQ for an IPv6 packet).

The system has mostly reiserfs filesystems mounted (one ext3, but that
one should not have been touched).


Previously I've been running kernels up to 2.6.38 without seeing this
kind of issue; this was the first run of a 2.6.39-rc kernel on this
system.
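With a known-good 2.6.38 and a first-bad 2.6.39-rc, git bisect could
narrow the regression down. A sketch follows, shown against a throwaway
repo so the commands run as-is; in practice one would run them in a
clone of Linus' tree with the real tags:

```shell
#!/bin/sh
# Throwaway repo standing in for the kernel tree, so the bisect
# commands below can be exercised without a full kernel clone.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q . && git config user.email you@example.com && git config user.name you
for n in 1 2 3 4; do echo "$n" > f; git add f; git commit -qm "commit $n"; done
git tag v2.6.38 HEAD~3       # stand-in for the last known-good release
git tag v2.6.39-rc4 HEAD     # stand-in for the first known-bad -rc

git bisect start
git bisect bad v2.6.39-rc4
git bisect good v2.6.38
# git now checks out a midpoint commit; build, boot and retest, then run
#   git bisect good   (no leak)   or   git bisect bad   (kswapd lockups again)
# until it names the first bad commit; finish with: git bisect reset
```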

If there is some more info I can provide, please tell!
Bruno

Attachment: vm-madness.config
Description: Binary data

Attachment: vm-madness.dmesg
Description: Binary data

Attachment: memory-exhausted.png
Description: PNG image