Re: How to translate an oops

Richard B. Johnson (root@chaos.analogic.com)
Fri, 16 Oct 1998 08:48:29 -0400 (EDT)


On Thu, 15 Oct 1998, Linux Lists wrote:

>
> On Wed, 14 Oct 1998, David Woodhouse wrote:
> >
> > page fault from irq handler: 0000
> > CPU: 0
>
> <snip>
>
> > Call Trace: [<c0165996>] [<c0169a4b>] [<c0169eab>] [<c0169ef8>] [<c015f318>] [<c01179d5>] [<c0110a5b>]
> > [<c0108041>] [<c0108078>] [<c0109810>] [<c0106084>] [<c0106073>] [<c0106000>] [<c0100176>]
> > Code: 13 46 a4 13 46 a8 13 46 ac 13 46 b0 13 46 b4 13 46 b8 13 46
> > Aiee, killing interrupt handler
> > Kernel panic: Attempted to kill the idle task!
> > In swapper task - not syncing
> > Using `/boot/System.map-test' to map addresses to symbols.
> >
> > >>EIP: c01da663 <csum_partial+77/e8>
> > Trace: c0165996 <ip_fw_demasquerade+156/49c>
> > Trace: c0169a4b <ip_local_deliver+1f/22c>
> > Trace: c0169eab <ip_rcv+253/2d4>
> > Trace: c0169ef8 <ip_rcv+2a0/2d4>
>
> How do you do this translation ??? This is something I've always wanted to
> know but I never knew how to ask ... :)
>
> If you can clarify this issue to me, I would really appreciate it.

This information is __sometimes__ enough for experienced programmers to
use to find the problem. However, most of us only look at the source if
there is a problem, so we don't know where to start.
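
As for the translation itself: the "Trace:" lines quoted above pair each
address with the nearest preceding symbol in System.map, then print the
offset into that symbol and its apparent size. Below is a rough sketch of
that lookup, not the actual tool that produced the quoted output. The two
sample map entries are assumptions: the csum_partial address is worked
backwards from the quoted "csum_partial+77/e8", and the second symbol name
is a made-up placeholder.

```python
import bisect


def load_system_map(lines):
    """Parse System.map-style lines of the form '<hex address> <type> <symbol>'."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+offset/size' for the symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr lies below the first symbol in the map
    start, name = symbols[i]
    if i + 1 < len(symbols):
        # Size is estimated as the distance to the next symbol.
        size = symbols[i + 1][0] - start
        return "%s+%x/%x" % (name, addr - start, size)
    return "%s+%x" % (name, addr - start)


# Assumed sample data, reconstructed from the offsets in the quoted oops;
# 'hypothetical_next_symbol' is a placeholder, not a real kernel symbol.
SAMPLE_MAP = """\
c01da5ec T csum_partial
c01da6d4 T hypothetical_next_symbol
"""

if __name__ == "__main__":
    symbols = load_system_map(SAMPLE_MAP.splitlines())
    print(resolve(symbols, 0xC01DA663))
```

With a real System.map you would pass the lines of that file to
load_system_map() instead of the sample string.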

When I get an oops, I just write down the EIP address, nothing else.
Then I rebuild the kernel with the '-g' option so gdb can show me the
way. You do this by adding -g to the CFLAGS in the top-level Makefile.
You must now do a 'make clean' before rebuilding.

Then I boot with the new kernel and hope to get the oops again.
Once I get the EIP address I do....

Script started on Tue Sep 29 15:08:31 1998

# gdb vmlinux
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15 (i586-unknown-linux), Copyright 1995 Free Software Foundation, Inc...
(gdb) list *0xc018c409
            |______________ This was the EIP address.
0xc018c409 is in fib_convert_rtentry (fib_semantics.c:719).
714
715         rtm->rtm_dst_len = plen;
716         rta->rta_dst = ptr;
717
718         if (r->rt_metric) {
719                 *(u32*)r->rt_pad3 = r->rt_metric - 1;
720                 rta->rta_priority = (u32*)r->rt_pad3;
721         }
722         if (r->rt_flags&RTF_REJECT) {
723                 rtm->rtm_scope = RT_SCOPE_HOST;
(gdb) quit
# exit
exit

Script done on Tue Sep 29 15:10:19 1998

So gdb now shows me the actual line(s) of code that produced the
problem. It is really very easy. I usually communicate directly
with the particular code authors when (if) I find the problem.
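
If you do this often, the same lookup can be scripted. The sketch below
is only an illustration and assumes a gdb recent enough to accept the
-batch and -ex options (the GDB 4.15 shown in the session above may not);
the helper names are made up for this example.

```python
import subprocess


def gdb_list_command(vmlinux, eip):
    """Build a gdb command line that lists the source around address eip."""
    return ["gdb", "-batch", "-ex", "list *0x%x" % eip, vmlinux]


def list_source_at(vmlinux, eip):
    """Run gdb non-interactively against a vmlinux built with -g."""
    result = subprocess.run(
        gdb_list_command(vmlinux, eip),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling list_source_at("vmlinux", 0xc018c409) in the kernel build
directory would run the same "list" command as the session above.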

Sometimes the problem is not as easy to find because the code is
correct but there may be a race-condition elsewhere. In that case
I try to demonstrate the problem on the kernel list so everybody
can take a hack at it.

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.123 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
