2.0.32 exit.c bug when CLONE_FILES? [2.1.65 also]

Ingo Molnar (mingo@pc7537.hil.siemens.at)
Tue, 25 Nov 1997 12:40:31 +0100 (MET)


Dan Hollis has sent me some very interesting oopses. There still seems to
be a bug/race in sys_close() / close_fp() <==> sys_exit() / close_files()
/ close_fp(), when two processes share their file table via CLONE_FILES.
(the bug is present in 2.1.65 also, i think)

so far this only seems to happen in the Roxan webserver, which uses
the CLONE_FILES clone() flag.

at first glance, the exit.c:close_files() code does not seem to be safe
when we sleep between two close_fp()'s: we carry a copy of
files->open_fds.fds_bits[j] (the local variable 'set') over the blocking
point, and 'set' might not be valid anymore after we wake up, possibly
resulting in two parallel close_fp()'s for the same file pointer.
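
to make the suspected pattern concrete, here is a small user-space model of
it. This is only a sketch and not the kernel code: struct files_sim,
close_fp_sim(), the sleep hook and both loop variants are hypothetical names
invented for the illustration, and the "other process" is simulated by a
callback that runs at the point where the real close_fp() could sleep. The
second loop is just one way to avoid acting on a stale copy in this model,
not a proposed patch.

```c
#include <assert.h>

#define NFILES 8   /* hypothetical table size for this sketch */

/* Hypothetical stand-in for a file table shared via CLONE_FILES. */
struct files_sim {
    unsigned long open_fds;    /* one bit per open descriptor */
    int close_count[NFILES];   /* how often each file got "closed" */
};

/* Hook that runs where the real close_fp() could sleep. */
typedef void (*sleep_hook)(struct files_sim *f);

static void close_fp_sim(struct files_sim *f, int fd, sleep_hook hook)
{
    assert(fd >= 0 && fd < NFILES);
    if (hook)
        hook(f);               /* blocking point: another task may run */
    f->close_count[fd]++;
}

/* The "other process": does a close() of fd 1 while we are asleep. */
static void other_task_closes_fd1(struct files_sim *f)
{
    if (f->open_fds & (1UL << 1)) {
        f->open_fds &= ~(1UL << 1);
        f->close_count[1]++;
    }
}

/* Suspected pattern: the bitmap is copied once and the copy ('set')
 * is carried over every blocking point. */
static void close_files_stale(struct files_sim *f, sleep_hook hook)
{
    unsigned long set = f->open_fds;
    for (int fd = 0; set; fd++, set >>= 1)
        if (set & 1)
            close_fp_sim(f, fd, hook);
}

/* Variant for comparison: re-read the shared bitmap for every fd and
 * clear the bit before the call that may sleep. */
static void close_files_reread(struct files_sim *f, sleep_hook hook)
{
    for (int fd = 0; fd < NFILES; fd++) {
        unsigned long bit = 1UL << fd;
        if (f->open_fds & bit) {
            f->open_fds &= ~bit;
            close_fp_sim(f, fd, hook);
        }
    }
}
```

starting from open_fds = 0x7 and passing other_task_closes_fd1 as the hook,
close_files_stale() ends up closing fd 1 a second time, because its copy
still has bit 1 set after the other task cleared it in the shared bitmap.
close_files_reread() skips fd 1 in the same scenario.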

the Roxan oopses i've analysed mostly seem to crash when we sleep in
locks_remove_locks(). This also explains why there is no 'VFS: Close:
file count is 0' message, i think. Here are those two oopses:

----- OOPS #1 --------->
general protection: 0000
CPU: 0
EIP: 0010:[locks_remove_locks+12/56]
EFLAGS: 00010286
eax: f000ef6f ebx: 01878018 ecx: 00000000 edx: 00000000
esi: 00000000 edi: f000ef6f ebp: 00d80810 esp: 00feff64
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process Count.cgi (pid: 21687, process nr: 57, stackpage=00fef000)
Stack: 0000d501 00122103 01878018 00000000 0000d501 0000000a 00000001 0011674a
00000000 0000000b 00feffbc fffffdff 00000200 0010a4bb 0000000b 00000200
080193d8 00003ae8 080158f0 0010a6c2 00000200 00feffbc 40090c88 00008c28
Call Trace: [close_fp+55/92] [do_exit+274/492] [do_signal+547/632] [signal_return+18/64]
Code: 8b 50 50 85 d2 74 22 f6 42 1c 01 74 0f 53 83 c0 50 50 e8 15

----- OOPS #2 --------->
general protection: 0000
CPU: 0
EIP: 0010:[locks_remove_locks+12/56]
EFLAGS: 00010286
eax: f000ef6f ebx: 011a4414 ecx: 00000000 edx: 00000000
esi: 00000000 edi: f000ef6f ebp: 00ec9810 esp: 00e37f7c
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process cgi (pid: 21686, process nr: 61, stackpage=00e37000)
Stack: 0000d501 00122103 011a4414 00000000 0000d501 0000000a 00000001 0011674a
00000000 011a4414 ffffffff fffffffc 00000000 00116832 00000000 0010a5f5
00000000 00000000 4008f42c ffffffff fffffffc 00000000 ffffffda 0000002b
Call Trace: [close_fp+55/92] [do_exit+274/492] [sys_exit+14/16] [system_call+85/128]
Code: 8b 50 50 85 d2 74 22 f6 42 1c 01 74 0f 53 83 c0 50 50 e8 15
<----------------------------------------------------------------

-- mingo