kfree oops question

From: Don (don@research-cistw.saic.com)
Date: Sun Jul 16 2000 - 01:28:45 EST


I've been trying to track down a kernel paging oops that occurs during the
execution of kfree. The kernel in question has a lot of modifications relative
to 2.2.16 (the oops doesn't happen in the unmodified kernel), but the way it
oopses is not consistent with the kinds of changes made. I'm hoping someone
will be able to give me an idea of how I could possibly be disrupting kernel
paging.

First, there are a lot of printk's. Mysteriously, it appears that simply by
taking out the printk's, the kernel is definitely more stable. (My method of
provoking the oops is to overflow a buffer, and my rate of success went down.
And the oops usually happened a couple of process exits later.)

Secondly, I have a lot of kmalloc's with GFP_KERNEL (there's only one kfree,
and it's matched with a specific kmalloc), but the problem does not appear to
be related to how that memory is used. So could this be a factor? And would
more physical memory help? Other than the space taken up by the format strings
passed to printk resulting in more paging, I'm trying to think of how this
could cause such an effect.

Third, there's basically no change to the way things fundamentally work, e.g.
no changes to memory mapping, etc, no changes to kmalloc or kfree, etc,
except for this - the task_struct has two ints added to the *end* of it
(I know all about the asm routines expecting things in certain locations).
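
To illustrate what I mean by adding to the end, here is a minimal user-space
sketch. The structs are toy stand-ins, not the real 2.2.16 task_struct, and
the names extra_a and extra_b are made up. It only shows that appending fields
leaves the offsets of the existing fields alone while the overall size grows,
which matters to anything that depends on the size of the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the original struct; NOT the real 2.2.16 task_struct. */
struct toy_task {
    long state;
    unsigned long flags;
    int sigpending;
};

/* The same fields with two ints appended at the very end.
 * extra_a and extra_b are hypothetical names. */
struct toy_task_ext {
    long state;
    unsigned long flags;
    int sigpending;
    int extra_a;
    int extra_b;
};

/* Returns 1 if every pre-existing field keeps its offset, else 0. */
int offsets_preserved(void)
{
    return offsetof(struct toy_task, state) == offsetof(struct toy_task_ext, state)
        && offsetof(struct toy_task, flags) == offsetof(struct toy_task_ext, flags)
        && offsetof(struct toy_task, sigpending) == offsetof(struct toy_task_ext, sigpending);
}

/* Returns how many bytes the struct grew by (trailing padding included). */
size_t size_growth(void)
{
    return sizeof(struct toy_task_ext) - sizeof(struct toy_task);
}
```

Calling offsets_preserved() and size_growth() from a small main() is enough to
check both properties on a given compiler.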

I've got three ksymoops outputs that all point to the same spot in kfree but
got there in different ways:

>>EIP; c0132f14 <kfree+84/1d0> <=====
Trace; c011a29a <do_exit+216/2d0>
Trace; c011a363 <sys_exit+f/10>
Trace; c01092a0 <system_call+34/38>

>>EIP; c013304c <kfree+84/1d0> <=====
Trace; c0142912 <sys_select+542/550>
Trace; c01580ed <ext2_sync_file+85/a4>
Trace; c01092a0 <system_call+34/38>

>>EIP; c0133050 <kfree+84/1d0> <=====
Trace; c011a29a <do_exit+216/2d0>
Trace; c0109161 <do_signal+255/308>
Trace; c0111980 <force_sig_info+a8/b4>
Trace; c0111bc9 <force_sig+11/18>
Trace; c0110d54 <do_page_fault+1a0/3c0>
Trace; c01093b1 <error_code+2d/34>
Trace; c01092e8 <signal_return+14/18>

all with this code:

Code; c0133050 <kfree+84/1d0>
00000000 <_EIP>:
Code; c0133050 <kfree+84/1d0> <=====
   0: 8b 69 08 movl 0x8(%ecx),%ebp <=====
Code; c0133053 <kfree+87/1d0>
   3: 81 fd 2b 2f c3 a5 cmpl $0xa5c32f2b,%ebp
Code; c0133059 <kfree+8d/1d0>
   9: 0f 85 e5 00 00 00 jne f4 <_EIP+0xf4> c0133144 <kfree+178/1d0>
Code; c013305f <kfree+93/1d0>
   f: 8b 69 0c movl 0xc(%ecx),%ebp
Code; c0133062 <kfree+96/1d0>
  12: 85 ed testl %ebp,%ebp
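
For reference, here is a rough C reading of those instructions. This is only a
sketch: the struct layout and all the names (hdr, check_hdr, MAGIC_EXPECTED)
are hypothetical, and only the loads at offsets 0x8 and 0xc and the constant
0xa5c32f2b are taken from the disassembly. What happens after the testl is not
shown in the dump, so the sketch just reports which case it hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_EXPECTED 0xa5c32f2bu   /* constant from the cmpl above */

/* Hypothetical layout: only offsets 0x8 and 0xc appear in the dump. */
struct hdr {
    uint32_t word0;   /* offset 0x0: unknown */
    uint32_t word4;   /* offset 0x4: unknown */
    uint32_t magic;   /* offset 0x8: compared with MAGIC_EXPECTED */
    uint32_t wordc;   /* offset 0xc: tested against zero */
};

enum check_result { HDR_BAD_MAGIC, HDR_ZERO_FIELD, HDR_OK };

enum check_result check_hdr(const struct hdr *h)
{
    /* movl 0x8(%ecx),%ebp: the marked instruction. If h (the value in
     * %ecx) does not point at mapped memory, the fault is raised by this
     * load, before any comparison is made. */
    uint32_t v = h->magic;

    /* cmpl $0xa5c32f2b,%ebp ; jne ... */
    if (v != MAGIC_EXPECTED)
        return HDR_BAD_MAGIC;

    /* movl 0xc(%ecx),%ebp ; testl %ebp,%ebp */
    if (h->wordc == 0)
        return HDR_ZERO_FIELD;

    return HDR_OK;
}
```

Read this way, a paging oops on the first load would mean the pointer in %ecx
is already bad by the time the check runs, rather than the check itself
failing.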

Can anyone shed any light on whether anything I'm doing above (printk, kmalloc,
task_struct) could be causing this sort of behavior, or think of anything
in particular that would cause this sort of result?

Thanks



