More about freezing 2.1.x (2.1.57)

Eugene Crosser (crosser@average.org)
12 Oct 1997 00:24:25 GMT


--=-=-=__SbUrIiyAsIc+L3wzRmIdBsFLT__=-=-=
Content-Type: text/plain; charset=koi8-r

A while ago I posted a report here about 2.1.x kernels freezing.
At that time I had checked it on 2.1.56. I have now reproduced it
on 2.1.57, and I hope I have more information this time.

My environment: 486dx4/120 16M RAM, no ethernet (only loopback,
slip and ppp interfaces). First, an oddity that I noticed before
it froze: I got a bunch of `inews' processes waiting on a read from a
socket, with no corresponding `nnrpd' processes (I am running INN
1.4). They did not exit on their own but could be killed.
Second, I got a few kernel messages:
"Oct 8 07:12:14 pccross kernel: Ugh at c0110758"
Address c0110758 is do_page_fault+48.

When I found the system frozen, I took a few snapshots with
magic SysRq. I found twenty or so `inews' and `nnrpd' processes
in state `R', one of the nnrpd's being "current". Of course,
there were other processes too.

The memory info showed free blocks only in the 4k and 8k lists,
and the number of "Total failed network buffer allocs" is extremely
high (1696948269). The mem-info dump is attached.
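
For anyone who wants to check the free-list picture against the attached
dump, here is a rough Python sketch. The free-area line is copied from the
SysRq-M attachment; the function names and the regular expression are
illustrative assumptions, not part of any kernel tool.

```python
import re

# Free-area summary line copied from the attached SysRq-M dump.
FREE_LINE = "( 216*4kB 58*8kB 0*16kB 0*32kB 0*64kB 0*128kB = 1336kB)"

def parse_free_areas(line):
    """Return (count, block_size_kb) pairs found in a free-area line."""
    return [(int(n), int(kb)) for n, kb in re.findall(r"(\d+)\*(\d+)kB", line)]

def largest_free_block_kb(areas):
    """Largest block size (kB) that still has a free entry, or 0 if none."""
    return max((kb for n, kb in areas if n > 0), default=0)

areas = parse_free_areas(FREE_LINE)
nonempty = [kb for n, kb in areas if n > 0]
print("block sizes with free entries (kB):", nonempty)
print("largest free block (kB):", largest_free_block_kb(areas))
```

Any allocation that needs a contiguous block larger than the largest
non-empty list cannot be satisfied from the free lists as they stand.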

Then I took 32 register dumps. I found that the system spends most of
its time in the memory allocation code (slab.c and page_alloc.c), and,
most importantly, some time in wait_for_tcp_memory and __release_sock
in the TCP code. A list of PC values with some comments and source
excerpts is attached.
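
The PC samples can be tallied per function with a short script. This is
only a sketch: the five sample entries are copied from the attached "where"
file, while the regular expression and the helper name are my own
assumptions about its format (address, then function name and offset in
parentheses).

```python
import re
from collections import Counter

# A few entries copied from the attached "where" file.
SAMPLES = """
c0125969(__get_free_pages +0199) page_alloc.c:222
c012387e(kmalloc +001e) slab.c:1614
c014ecd8(__release_sock +0068) sock.c:708
c015d65a(wait_for_tcp_memory +001a) sock.h:717 (tcp.c:719)
c0125969(__get_free_pages +0199) page_alloc.c:222
"""

# Matches "c0125969(__get_free_pages +0199)": address, function, hex offset.
SAMPLE_RE = re.compile(r"([0-9a-f]{8})\((\w+) \+[0-9a-f]+\)")

def tally_by_function(text):
    """Count how many PC samples fall inside each function."""
    return Counter(m.group(2) for m in SAMPLE_RE.finditer(text))

counts = tally_by_function(SAMPLES)
for func, n in counts.most_common():
    print(f"{n:3d}  {func}")
```

Because the pattern is searched across the whole text rather than line by
line, the interleaved source and disassembly excerpts are simply skipped.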

Given this memory picture, I suspect that the problem arises from
a memory leak in the TCP code or something similar.

2.0.30 is completely stable on this machine.

I hope someone will investigate this. I can reproduce the problem
and provide more information upon request.

-- 
Eugene Crosser; 2:5020/230@fidonet; http://www.average.org/~crosser/
--=-=-=__SbUrIiyAsIc+L3wzRmIdBsFLT__=-=-=
Content-Type: text/plain
Content-Description: SysRq-M dump
Content-Disposition: attachment; filename="meminfo"

Mem-info:
Free pages: 1336kB
 ( 216*4kB 58*8kB 0*16kB 0*32kB 0*64kB 0*128kB = 1336kB)
Swap cache: add 9416/9416, delete 4976390/7793, find 27327/1618
Free swap: 35204kB
4096 pages of RAM
498 free pages
426 reserved pages
2306 pages shared
Buffer memory: 392kB
Buffer heads: 418
Buffer blocks: 392
CLEAN: 49 buffers, 16 used (last=49), 0 locked, 0 protected, 0 dirty
LOCKED: 315 buffers, 37 used (last=315), 0 locked, 0 protected, 0 dirty
DIRTY: 5 buffers, 5 used (last=5), 0 locked, 0 protected, 5 dirty
Networking buffers in use : 171
Total network buffer allocations : 252673
Total failed network buffer allocs : 1696948269
IP fragment buffer size : 0
--=-=-=__SbUrIiyAsIc+L3wzRmIdBsFLT__=-=-=
Content-Type: text/plain
Content-Description: SysRq-P dump with addresses resolved and comments
Content-Disposition: attachment; filename="where"

c0125969(__get_free_pages +0199) page_alloc.c:222
c01257db(__get_free_pages +000b) page_alloc.c:202
c012387e(kmalloc +001e) slab.c:1614

    for (; csizep->cs_size; csizep++) {
    00000f31 <kmalloc+11> cmpl $0x0,0x0
    00000f38 <kmalloc+18> je 000010dc <kmalloc+1bc>
    00000f3e <kmalloc+1e> leal (%esi),%esi ## ESI=0000f930

c0125969(__get_free_pages +0199) page_alloc.c:222
c014ecd8(__release_sock +0068) sock.c:708
c0125969(__get_free_pages +0199) page_alloc.c:222
c0125969(__get_free_pages +0199) page_alloc.c:222
c014f286(alloc_skb +0006) skbuff.c:117
c0125969(__get_free_pages +0199) page_alloc.c:222
c0125969(__get_free_pages +0199) page_alloc.c:222
c0125843(__get_free_pages +0073) page_alloc.c:218
c01231bf(kmem_cache_grow +00ff) slab.c:504 (slab.c:1188)
c0125969(__get_free_pages +0199) page_alloc.c:222
c0125969(__get_free_pages +0199) page_alloc.c:222
c0125969(__get_free_pages +0199) page_alloc.c:222
c0123880(kmalloc +0020) slab.c:1615
c01239e8(kmalloc +0188) slab.c:1449
c01230c7(kmem_cache_grow +0007) slab.c:1136
c014ecd0(__release_sock +0060) atomic.h:59 (sock.c:705)

    static __inline__ void atomic_dec(volatile atomic_t *v)
    {
    __asm__ __volatile__(
    00000a20 <__release_sock+60> lock decl 0x0

c015d65a(wait_for_tcp_memory +001a) sock.h:717 (tcp.c:719)

    __release_sock(sk); ## EAX=0 EDI=286
    000005b4 <wait_for_tcp_memory+14> pushl %edi
    000005b5 <wait_for_tcp_memory+15> call 000005b6 <wait_for_tcp_memory+16>
    000005ba <wait_for_tcp_memory+1a> addl $0x4,%esp

c01231bf(kmem_cache_grow +00ff) slab.c:504
c0125969(__get_free_pages +0199) page_alloc.c:222
c0125969(__get_free_pages +0199) page_alloc.c:222
c01257d7(__get_free_pages +0007) page_alloc.c:198
c0123880(kmalloc +0020) slab.c:1615
c01231bf(kmem_cache_grow +00ff) slab.c:504
c015d77c(wait_for_tcp_memory +013c) sock.h:702 (tcp.c:738)

    }
    000006dc <wait_for_tcp_memory+13c> addl $0x4,%esp ## ES=18

c0125847(__get_free_pages +0077) page_alloc.c:218
c01231bf(kmem_cache_grow +00ff) slab.c:504
c01231dd(kmem_cache_grow +011d) slab.c:513
c01231bf(kmem_cache_grow +00ff) slab.c:504
c01231bf(kmem_cache_grow +00ff) slab.c:504
--=-=-=__SbUrIiyAsIc+L3wzRmIdBsFLT__=-=-=--