Protection faults in kernel

Camm Maguire (camm@enhanced.com)
Wed, 24 Sep 1997 10:30:53 -0400 (EDT)


Greetings! We're running/developing a quasi-real-time quote server
for Linux, kernel 2.0.30. One program listens on a socket for
incoming data and writes it to a shared memory buffer/queue. The
other reads the queue at one-second intervals, and
serves/processes/saves data in the interim. The second program
handles its task switching via signals and hopefully judicious use of
sigprocmask. The signal handlers are initialized with SA_RESTART
flags, and all I/O reads and writes are set to non-blocking.
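
For concreteness, here is a minimal sketch of the kind of setup described
above, assuming a SIGALRM-driven design. The names alarm_pending,
install_alarm_handler, set_nonblocking, block_alarm and restore_mask are
hypothetical and chosen for illustration; this is not the actual server
code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag: the handler only records that the alarm fired. */
static volatile sig_atomic_t alarm_pending = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_pending = 1;
}

/* Install the SIGALRM handler with SA_RESTART so that interrupted
 * restartable system calls are resumed. Returns 0 on success. */
int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Put a descriptor into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Block SIGALRM around a critical section (for example, while touching
 * the shared queue), saving the previous mask in *old. */
int block_alarm(sigset_t *old)
{
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    return sigprocmask(SIG_BLOCK, &block, old);
}

/* Restore the mask saved by block_alarm; a SIGALRM that arrived in the
 * meantime is delivered at this point. */
int restore_mask(const sigset_t *old)
{
    return sigprocmask(SIG_SETMASK, old, NULL);
}
```

The intent is that block_alarm/restore_mask bracket any code that must
not be interrupted, while everything else may be preempted by the
handler.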

Every three days or so I get kernel panics with the following logs,
among others:

Sep 22 15:13:02 trading kernel: general protection: 0000
Sep 22 15:13:02 trading kernel: CPU: 0
Sep 22 15:13:02 trading kernel: EIP: 0010:[shrink_mmap+116/464]
Sep 22 15:13:02 trading kernel: EFLAGS: 00010213
Sep 22 15:13:02 trading kernel: eax: 00000000 ebx: 00222510 ecx: 00000006 edx: f000ef6f
Sep 22 15:13:02 trading kernel: esi: 000001fe edi: 00000ffe ebp: 00004000 esp: 01b75ef0
Sep 22 15:13:02 trading kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Sep 22 15:13:02 trading kernel: Process tar (pid: 17162, process nr: 35, stackpage=01b75000)
Sep 22 15:13:02 trading kernel: Stack: 00000006 00000000 00000000 00000001 018e7998 0011f4db 00000006 00000000
Sep 22 15:13:02 trading kernel: 00000003 00000080 00001000 00000000 0011fec2 00000003 00000000 00000001
Sep 22 15:13:02 trading kernel: 00199fa8 007779d8 00001000 00000000 0805e800 00001000 00000293 0011c862
Sep 22 15:13:02 trading arc19[310]: Cannot make archive /home/camm/ticdat/19970922.tgz, No such file or directory

This tar program was run via a system call from our server program,
arc19. We've set this up so that the call could be interrupted by
SIGALRM, and hopefully restarted successfully. We don't seem to have
any problem restarting read and write calls interrupted in this way.
Also, it was my understanding that "normal" C programming could of
course introduce bugs, but not bring down the kernel, which makes me
suspect the shared memory stuff.
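
As an illustration of running an external program in a way that
tolerates signal interruption, here is a rough sketch that uses
fork/execvp and retries waitpid on EINTR. This is an assumption about
one possible approach, not a description of how arc19 actually invokes
tar; the function name run_command is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments and wait for it to finish.
 * Returns the child's exit status (0-255), 127 if the program could
 * not be executed, or -1 on fork/wait failure or abnormal termination. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image; _exit if exec fails. */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: a signal such as SIGALRM may interrupt waitpid when the
     * handler was installed without SA_RESTART, so retry on EINTR. */
    int status;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);

    if (w == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1; /* killed by a signal or otherwise abnormal */
}
```

A caller would pass a NULL-terminated argument vector, for example
{"tar", "czf", archive, directory, NULL}, and treat any nonzero return
as a failed archive run.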

Unfortunately, I really have little experience with signals or shared
memory. The shared memory code was mostly supplied by our data vendor
for their unix clients, and I was hoping not to have to rewrite it.
My question is as follows: Is the problem more likely to lie with
shared memory, or with interrupted system calls? This tar system call
runs successfully every two minutes for three days before exhibiting
this kind of behavior.

Any suggestions are most welcome!

-- 
cmaguire@enhanced.com				      Camm Maguire
==================================================================
"The earth is one country, and mankind its citizens."  Baha'u'llah