EFAULT in read() after fork() looks like recursion

Michael L. Galbraith (mikeg@mikeg.weiden.de)
Wed, 8 Jan 1997 18:56:21 +0100 (MET)


Hello Kernel-Guru's...

I'm stuck, been stuck in a do-loop for weeks.

When I do tcpspray -e localhost, "double lock on socket at xxx" pops up
followed by EFAULT in read().

In v2.0.x, it fails with "receive:: Bad address" every time. In 2.1.x,
it's more interesting... fails if socket from first run is still there.

This is against 2.1.21pre+kfaultd+etal.

mikeg:# tcpspray -e localhost
Received 102400 bytes in 0.053746 seconds (1860.604 kbytes/s)
Transmitted 102400 bytes in 0.046434 seconds (2153.594 kbytes/s)
mikeg:# tcpspray -e localhost
Received 102400 bytes in 0.086368 seconds (1157.836 kbytes/s)
Transmitted 102400 bytes in 0.045373 seconds (2203.954 kbytes/s)
mikeg:# tcpspray -e localhost
Received 102400 bytes in 0.291235 seconds (343.365 kbytes/s)
Transmitted 102400 bytes in 0.292925 seconds (341.384 kbytes/s)
mikeg:# tcpspray -e localhost

*** about 5 minutes hangtime then......

receive:: Connection timed out
receive:: Bad address
Segmentation fault
mikeg:#

*** meanwhile......goto vt2 and rummage

mikeg:# ps -al
invalid psdatabase
^^^^^^^^^^^^^^^^^^
F UID PID PPID PRI NI SIZE RSS WCHAN STAT TTY TIME COMMAND
140 0 507 1 0 0 756 308 c012d02c S ? 0:00 inetd
100000 0 525 313 0 0 744 300 c010abf0 S 1 0:00 tcpspray -e localhost
40 0 527 507 0 0 756 316 c014bf0f S ? 0:00 -echo [127.0.0.1]
100040 0 526 525 0 0 744 296 c014b41f S 1 0:00 tcpspray -e localhost

*** then, as I'm looking around in /proc...

Oops: 0000
CPU: 0
EIP: 0010:[<c0111f41>]
EFLAGS: 00010206
eax: c3ce576c ebx: c03ce198 ecx: 00000000 edx: 0000000a
esi: c3ce576c edi: c3ce5768 ebp: c500ff3c esp: c500ff2c
ds: 0018 es: 0018 ss: 0018
Process tcpspray (pid: 525, process nr: 7, stackpage=c500f000)
Stack: c3cb1810 c3ce5750 c3ce56c8 00000000 c4286810 c01566ed c3ce576c c0156ac8
c3cb1810 c3ce5750 c3ce56c8 c013762b c3ce5750 00000000 c3ce56c8 c0137a3c
c3ce5750 c4ffe540 c01230b2 c3ce56c8 c4ffe540 00000000 c4ffe540 c0123120
Call Trace: [<c01566ed>] [<c0156ac8>] [<c013762b>] [<c0137a3c>] [<c01230b2>] [<c0123120>] [<c01178f6>]
[<c01179cf>] [<c010abe8>]
Code: 8b 0a 83 f9 01 75 5e 9c 8f 45 fc fa c7 02 00 00 00 00 83 7a

Using `/usr/src/linux/System.map' to map addresses to symbols.

>>EIP: c0111f41 <wake_up_interruptible+37/da>
Trace: c01566ed <def_callback1+17/1c>
Trace: c0156ac8 <inet_release+26/7c>
Trace: c013762b <sock_release+1f/3e>
Trace: c0137a3c <sock_close+32/38>
Trace: c01230b2 <__fput+1c/40>
Trace: c0123120 <close_fp+4a/58>
Trace: c01178f6 <do_exit+104/1ce>
Trace: c01179cf <sys_exit+f/10>
Trace: c010abe8 <system_call+38/40>

Code: c0111f41 <wake_up_interruptible+37/da> movl (%edx),%ecx
Code: c0111f43 <wake_up_interruptible+39/da> cmpl $0x1,%ecx
Code: c0111f46 <wake_up_interruptible+3c/da> jne c0111fa6 <wake_up_interruptible+9c/da>
Code: c0111f48 <wake_up_interruptible+3e/da> pushf
Code: c0111f49 <wake_up_interruptible+3f/da> popl 0xfffffffc(%ebp)
Code: c0111f4c <wake_up_interruptible+42/da> cli
Code: c0111f4d <wake_up_interruptible+43/da> movl $0x0,(%edx)
Code: c0111f53 <wake_up_interruptible+49/da> cmpl $0xffffff90,0x0(%edx)
Code: c0111f57 <wake_up_interruptible+4d/da> nop
Code: c0111f58 <wake_up_interruptible+4e/da> nop

*** after the Oops, the hung processes are gone - but now I have a hung socket

mikeg:# netstat --tcp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 6728 38912 localhost:1036 localhost:echo CLOSE

*** try now to duplicate other known failure mode

mikeg:# tcpspray -e localhost
Received 102400 bytes in 0.217373 seconds (460.039 kbytes/s)
Transmitted 102400 bytes in 0.177995 seconds (561.814 kbytes/s)
mikeg:# tcpspray -e localhost
Exception at [<c013a6c4>] (c01abf69)

*** hanging again... going to vt2

mikeg:# gdb vmlinux -core /proc/kcore
(gdb) x 0xc013a6c4
0xc013a6c4 <memcpy_toiovec+96>: 0xc189a5f3
(gdb) x 0xc01abf69
0xc01abf69 <sprintf+3009>: 0x00884c8d
^^^^^^^ fixup? I think I'll be reading that exception
doc again again again...

*** then, yet again...

Oops: 0000
CPU: 0
EIP: 0010:[<c014b448>]
EFLAGS: 00010013
eax: 00000000 ebx: c3bb0c0c ecx: c500fedc edx: 00000000
esi: c5017ea8 edi: c5017ea8 ebp: 00000202 esp: c5017e98
ds: 0018 es: 0018 ss: 0018
Process tcpspray (pid: 825, process nr: 11, stackpage=c5017000)
Stack: 00000f67 00000000 c3bb0c0c c5017edc c36d3018 c500fedc c014b875 c3bb0c0c
c3bb0c0c ffffffea 00000400 00000400 00000dd8 0804ac20 00000400 c3bb0d48
00000000 c5017f80 c015048d c3bb0c0c 00000000 c5017f80 00000400 00000000
Call Trace: [<c014b875>] [<c015048d>] [<c01572f1>] [<c01376d5>] [<c0137929>] [<c01234ed>] [<c010abe8>]
Code: 8b 42 04 39 f0 74 05 89 c2 eb f5 90 89 4a 04 55 9d ff 43 24
Using `/usr/src/linux/System.map' to map addresses to symbols.

>>EIP: c014b448 <wait_for_tcp_memory+ce/ea>
Trace: c014b875 <tcp_do_sendmsg+329/504>
Trace: c015048d <tcp_v4_sendmsg+79/96>
Trace: c01572f1 <inet_sendmsg+79/88>
Trace: c01376d5 <sock_sendmsg+8b/ae>
Trace: c0137929 <sock_write+ad/b8>
Trace: c01234ed <sys_write+ad/e4>
Trace: c010abe8 <system_call+38/40>

Code: c014b448 <wait_for_tcp_memory+ce/ea> movl 0x4(%edx),%eax
Code: c014b44b <wait_for_tcp_memory+d1/ea> cmpl %esi,%eax
Code: c014b44d <wait_for_tcp_memory+d3/ea> je c014b454 <wait_for_tcp_memory+da/ea>
Code: c014b44f <wait_for_tcp_memory+d5/ea> movl %eax,%edx
Code: c014b451 <wait_for_tcp_memory+d7/ea> jmp c014b448 <wait_for_tcp_memory+ce/ea>
Code: c014b453 <wait_for_tcp_memory+d9/ea> nop
Code: c014b454 <wait_for_tcp_memory+da/ea> movl %ecx,0x4(%edx)
Code: c014b457 <wait_for_tcp_memory+dd/ea> pushl %ebp
Code: c014b458 <wait_for_tcp_memory+de/ea> popf
Code: c014b459 <wait_for_tcp_memory+df/ea> incl 0x24(%ebx)
Code: c014b45c <wait_for_tcp_memory+e2/ea>

*** things go down hill rapidly from here if I continue to poke at THE_BEAST

Unable to handle kernel NULL pointer dereference at virtual address 0000079f
current->tss.cr3 = 00101000, Dr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0111f41>]
EFLAGS: 00010206
eax: c3d9bf2c ebx: 080a2c44 ecx: c3f04dd4 edx: 0000079f
esi: c3d9bf2c edi: c3d9bf28 ebp: c01c6b88 esp: c01c6b78
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c01c4c94)
Stack: c3f04c0c c3f04c0c 00000001 c3f04c0c c01c6bd8 c01566ed c3d9bf2c c014f8d5
c3f04c0c c3f04c0c c3f04d48 00000001 c014fc7d c3f04c0c c3f04c0c 00000002
00002ee0 c014fbd6 c3f04dd4 c01127fc c3f04c0c 00000001 ffffffff 00000001
Call Trace: [<c01566ed>] [<c014f8d5>] [<c014fc7d>] [<c014fbd6>] [<c01127fc>] [<c0118b8f>] [<c010ab8b>]
[<c0109bb1>] [<c0109c03>] [<c010abe8>] [<c01099c8>] [<c0109168>]
Code: 8b 0a 83 f9 01 75 5e 9c 8f 45 fc fa c7 02 00 00 00 00 83 7a
Aiee, killing interrupt handler
kfree of non-kmalloced memory: c01c6cdc, next= 00000000, order=0
kfree of non-kmalloced memory: c01c6ccc, next= 00000000, order=0
kfree of non-kmalloced memory: c01c71e0, next= 00000000, order=0
idle task may not sleep

Using `/usr/src/linux/System.map' to map addresses to symbols.

>>EIP: c0111f41 <wake_up_interruptible+37/da>
Trace: c01566ed <def_callback1+17/1c>
Trace: c014f8d5 <tcp_write_timeout+b7/160>
Trace: c014fc7d <tcp_retransmit_timer+a7/ae>
Trace: c014fc7d <tcp_retransmit_timer+a7/ae>
Trace: c01127fc <timer_bh+13c/15a>
Trace: c0118b8f <do_bottom_half+3b/5c>
Trace: c010ab8b <handle_bottom_half+b/20>
Trace: c0109bb1 <hard_idle+1d/36>
Trace: c0109c03 <sys_idle+39/6c>
Trace: c010abe8 <system_call+38/40>
Trace: c01099c8 <kfaultd>
Trace: c0109168 <start_kernel+1e8/1f4> <--- wonder why we were here?!

Code: c0111f41 <wake_up_interruptible+37/da> movl (%edx),%ecx
Code: c0111f43 <wake_up_interruptible+39/da> cmpl $0x1,%ecx
Code: c0111f46 <wake_up_interruptible+3c/da> jne c0111fa6 <wake_up_interruptible+9c/da>
Code: c0111f48 <wake_up_interruptible+3e/da> pushf
Code: c0111f49 <wake_up_interruptible+3f/da> popl 0xfffffffc(%ebp)
Code: c0111f4c <wake_up_interruptible+42/da> cli
Code: c0111f4d <wake_up_interruptible+43/da> movl $0x0,(%edx)
Code: c0111f53 <wake_up_interruptible+49/da> cmpl $0xffffff90,0x0(%edx)
Code: c0111f57 <wake_up_interruptible+4d/da> nop
Code: c0111f58 <wake_up_interruptible+4e/da> nop
Jan 8 17:08:09 mikeg last message repeated 4 times

Unable to handle kernel paging request at virtual address 0a010e88
current->tss.cr3 = 00101000, Dr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0111f38>]
EFLAGS: 00010203
eax: c3d9bb4c ebx: 0a010e88 ecx: c42fd1e0 edx: 00000000
esi: c3d9bb4c edi: c3d9bb48 ebp: c01c6a34 esp: c01c6a24
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c01c4c94)
Stack: c42fd018 c42fd018 00000001 c42fd018 c01c6a84 c01566ed c3d9bb4c c014f8d5
c42fd018 c42fd018 c42fd154 00000001 c014fc7d c42fd018 c42fd018 00000002
00002ee0 c014fbd6 c42fd1e0 c01127fc c42fd018 00000001 ffffffff 00000001
Call Trace: [<c01566ed>] [<c014f8d5>] [<c014fc7d>] [<c014fbd6>] [<c01127fc>] [<c0118b8f>] [<c010ab8b>]
[<c01177e4>] [<c011798b>] [<c5800000>] [<c010b123>] [<c6000000>] [<c01a0018>] [<c0111772>] [<c010ad10>]
[<c0111f41>] [<c01566ed>] [<c014f8d5>] [<c014fc7d>] [<c014fbd6>] [<c01127fc>] [<c0118b8f>] [<c010ab8b>]
[<c0109bb1>] [<c0109c03>] [<c010abe8>] [<c01099c8>] [<c0109168>]
Code: 8b 13 8b 5b 04 85 d2 74 65 8b 0a 83 f9 01 75 5e 9c 8f 45 fc
Aiee, killing interrupt handler
wait_queue is bad (eip = c01566ed)
q = c3d9bf2c
*q = c5017ea8
wait_queue is bad (eip = c01566ed)
q = c3d9bb4c
*q = c5017ea8
(gdb) disassemble 0xc01566ed
Dump of assembler code for function def_callback1:
0xc01566d6 <def_callback1>: movl 0x4(%esp,1),%edx
0xc01566da <def_callback1+4>: movb 0x2a(%edx),%al
0xc01566dd <def_callback1+7>: testb %al,%al
0xc01566df <def_callback1+9>: jne 0xc01566f0 <def_callback1+26>
0xc01566e1 <def_callback1+11>: movl 0xb8(%edx),%eax
0xc01566e7 <def_callback1+17>: pushl %eax
0xc01566e8 <def_callback1+18>: call 0xc0111f0a <wake_up_interruptible>
0xc01566ed <def_callback1+23>: addl $0x4,%esp
0xc01566f0 <def_callback1+26>: ret
0xc01566f1 <def_callback1+27>: nop
End of assembler dump.

Using `/usr/src/linux/System.map' to map addresses to symbols.

>>EIP: c0111f38 <wake_up_interruptible+2e/da>
Trace: c01566ed <def_callback1+17/1c>
Trace: c014f8d5 <tcp_write_timeout+b7/160>
Trace: c014fc7d <tcp_retransmit_timer+a7/ae>
Trace: c014fc7d <tcp_retransmit_timer+a7/ae>
Trace: c01127fc <timer_bh+13c/15a>
Trace: c0118b8f <do_bottom_half+3b/5c>
Trace: c010ab8b <handle_bottom_half+b/20>
Trace: c01177e4 <exit_notify+1d4/1e2>
Trace: c011798b <do_exit+199/1ce>
Trace: c5800000
Trace: c010b123 <die_if_kernel+2b7/2c0>
Trace: c6000000
Trace: c01a0018 <sr_attach+52/f2>
Trace: c0111772 <do_page_fault+2ca/2dc>
Trace: c010ad10 <error_code+30/40>
Trace: c0111f41 <wake_up_interruptible+37/da>
Trace: c01566ed <def_callback1+17/1c>
Trace: c014f8d5 <tcp_write_timeout+b7/160>
Trace: c014fc7d <tcp_retransmit_timer+a7/ae>
Trace: c014fc7d <tcp_retransmit_timer+a7/ae>
Trace: c01127fc <timer_bh+13c/15a>
Trace: c0118b8f <do_bottom_half+3b/5c>
Trace: c010ab8b <handle_bottom_half+b/20>
Trace: c0109bb1 <hard_idle+1d/36>
Trace: c0109c03 <sys_idle+39/6c>
Trace: c010abe8 <system_call+38/40>
Trace: c01099c8 <kfaultd>
Trace: c0109168 <start_kernel+1e8/1f4>

Code: c0111f38 <wake_up_interruptible+2e/da> movl (%ebx),%edx
Code: c0111f3a <wake_up_interruptible+30/da> movl 0x4(%ebx),%ebx
Code: c0111f3d <wake_up_interruptible+33/da> testl %edx,%edx
Code: c0111f3f <wake_up_interruptible+35/da> je c0111fa6 <wake_up_interruptible+9c/da>
Code: c0111f41 <wake_up_interruptible+37/da> movl (%edx),%ecx
Code: c0111f43 <wake_up_interruptible+39/da> cmpl $0x1,%ecx
Code: c0111f46 <wake_up_interruptible+3c/da> jne c0111fa6 <wake_up_interruptible+9c/da>
Code: c0111f48 <wake_up_interruptible+3e/da> pushf
Code: c0111f49 <wake_up_interruptible+3f/da> popl 0xfffffffc(%ebp)
Code: c0111f4c <wake_up_interruptible+42/da>

*** amazing that the kernel can take this kind of abuse and yet live :)