Re: Qn: Queer "Unable to handle kernel NULL pointer dereference at ..." error in kernel

From: Jan Harkes (jaharkes@cs.cmu.edu)
Date: Wed Apr 23 2003 - 13:15:30 EST


On Wed, Apr 23, 2003 at 06:53:33PM +0100, Yours Lovingly wrote:
> i am sorry for that useless information. i am
> attaching a ksymoops output of that problem (focus on
> the register operation (that ksymoops identifies as
> the fault triggering instruction) in the "code"
> section at the end of the ksymoops output)
...
> inode_parent = parent->d_inode;
> inode = d->d_inode;
> p = (void *)inode_parent;
> me = (void *)inode;
> // Till here things work just fine. I am DEAD SURE of that as i put printk()
> // followed by return here and there and checked.
>
> if( p - me != 0 ) {
> //nfs_print_path(parent);
> printk("return 3\n");
> return;
> }

What I typically do in these cases is,

- remove the object file where the oops occurs
- rerun make
- copy the gcc line that is responsible for compiling the object
- run the same gcc line again, but add a '-g' flag

Now we have an object with debugging symbols, and can use 'objdump
--source <file>.o | less' and get something that has both the source
lines and the related assembly code. In this case the faulting
instruction is about 0x2e bytes past the beginning of nfs_print_path.

But I can get pretty far just from reading the oops and it is probably
the test you added. Ok, it looks like -O2 is actually optimizing away
some of those intermediate variables for you.

> Code; c88b6956 <[nfs]nfs_print_path+2e/58> <=====
> 0: 39 42 08 cmp %eax,0x8(%edx) <=====
> 3: 74 0d je 12 <_EIP+0x12>

Ok, so we're comparing the contents of %eax to something that is 8 bytes
offset from the address stored in %edx. and then jump out if they are
equal.

So this is something like and if (eax != edx->bar) { } construct. And in
fact the test you added was

    if ( p - me != 0 )
    if ( p != me )
    if ( inode_parent != inode )
    if ( parent->d_inode != d->d_inode )

So the contents of either parent->d_inode or d->d_inode was stored in
register eax, and we're dereferencing the other pointer during the test
operation.

Why does it crash, because the pointer we are dereferencing is 0x7, not
really a valid address, in fact it already looks like an inode number.

> eax: 00000006 ebx: c6675024 ecx: 00000001 edx: 00000007

And eax looks pretty suspicious as well. These are not pointers to inode
structures, but possibly the i_ino numbers.

Jan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Apr 23 2003 - 22:00:37 EST