Frequent 2.2.14 kernel oopses, usually w/ eventual crash

From: Karsten M. Self (karsten@opensales.com)
Date: Fri Apr 21 2000 - 06:08:08 EST


This is a re-post of my earlier message (Subject: Kernel Oops, unknown
area, NULL pointer dereference) in the standard format.

Apologies for additional traffic.

[1.] One line summary of the problem:

    Frequent 2.2.14 kernel oopses, usually w/ eventual crash

[2.] Full description of the problem/report:

    Kernel oopses, generally followed by a system crash within several
    hours (varies, usually 1-12). This has persisted since mid-January
    (several system changes were made then; not sure which, if any,
    triggered it), typically with 3-14 days between crashes. The process
    named in the oops is often rxvt, but has varied.

    I've suspected hardware most of the way through this. The system
    has been completely swapped out, except for one IDE HD, the keyboard,
    mouse, and monitor from the original system.

    Crash can happen while system is attended or not. Often happens
    while idle (overnight). Occasionally seems to be linked to cron
    script activity.

    I've run memory tests on the old memory (found a fault but couldn't
    duplicate it on another system). Two nights ago I stress tested the
    system with a looping memory test running simultaneously with kernel
    builds using the "make -j3" option. This ran for ~12 hours w/o
    problems. The following day, the system crashed while idle.

    It has also been suggested that the compiler version (gcc 2.95.2)
    may be the problem.

[3.] Keywords (i.e., modules, networking, kernel):

    kernel 2.2.14 gcc 2.95.2 debian woody crash oops

[4.] Kernel version (from /proc/version):

    Linux version 2.2.14 (root@angel) (gcc version 2.95.2 20000220
    (Debian GNU/Linux)) #3 Thu Mar 2 22:30:22 PST 2000

[5.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

    ksymoops 2.3.4 on i686 2.2.14. Options used
         -V (default)
         -k /proc/ksyms (default)
         -l /proc/modules (default)
         -o /lib/modules/2.2.14/ (default)
         -m /boot/System.map-2.2.14 (default)

    Warning: You did not tell me where to find symbol information. I will
    assume that the log matches the kernel and modules that are running
    right now and I'll use the default options above for symbol resolution.
    If the current kernel and/or modules do not match the log, you can get
    more accurate output by telling me the kernel version and where to find
    map, modules, ksyms etc. ksymoops -h explains the options.

    Apr 20 23:47:18 angel kernel: *pde = 00000000
    Apr 20 23:47:18 angel kernel: Oops: 0002
    Apr 20 23:47:18 angel kernel: CPU: 0
    Apr 20 23:47:18 angel kernel: EIP: 0010:[schedule+132/624]
    Apr 20 23:47:18 angel kernel: EFLAGS: 00010082
    Apr 20 23:47:18 angel kernel: eax: 00000000 ebx: c01e4c00 ecx: c125a000 edx: 00000000
    Apr 20 23:47:18 angel kernel: esi: c125a000 edi: 00000100 ebp: c125b578 esp: c125b570
    Apr 20 23:47:18 angel kernel: ds: 0018 es: 0018 ss: 0018
    Apr 20 23:47:18 angel kernel: Process rxvt (pid: 1482, process nr: 104, stackpage=c125b000)
    Apr 20 23:47:18 angel kernel: Stack: 00000100 c01e4c00 c125a000 c01165a1 c125b5e4 00000040 00000100 c125a000
    Apr 20 23:47:18 angel kernel: 00000100 c125a000 c010926c 0000000b c125b5e4 c01a2dd8 c01a44ce 00000002
    Apr 20 23:47:18 angel kernel: 00000000 c010e1f8 c01a44ce c125b5e4 00000002 c125a000 c125a000 00000100
    Apr 20 23:47:18 angel kernel: Call Trace: [do_exit+633/640] [die_if_no_fixup+0/64] [stext_lock+5548/11700] [stext_lock+11426/11700] [do_page_fault+680/880] [stext_lock+11426/11700] [error_code+45/52]
    Apr 20 23:47:18 angel kernel: Code: 89 42 40 89 50 3c c7 46 3c 00 00 00 00 c7 46 40 00 00 00 00
    Using defaults from ksymoops -t elf32-i386 -a i386

    Code; 00000000 Before first symbol
    00000000 <_EIP>:
    Code; 00000000 Before first symbol
       0: 89 42 40 mov %eax,0x40(%edx)
    Code; 00000003 Before first symbol
       3: 89 50 3c mov %edx,0x3c(%eax)
    Code; 00000006 Before first symbol
       6: c7 46 3c 00 00 00 00 movl $0x0,0x3c(%esi)
    Code; 0000000d Before first symbol
       d: c7 46 40 00 00 00 00 movl $0x0,0x40(%esi)

    Apr 20 23:47:18 angel kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000040
    Apr 20 23:47:18 angel kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
    Apr 20 23:47:18 angel kernel: *pde = 00000000
    Apr 20 23:47:18 angel kernel: Oops: 0002
    Apr 20 23:47:18 angel kernel: CPU: 0
    Apr 20 23:47:18 angel kernel: EIP: 0010:[schedule+132/624]
    Apr 20 23:47:18 angel kernel: EFLAGS: 00010086
    Apr 20 23:47:18 angel kernel: eax: 00000000 ebx: c01e4c00 ecx: c125a000 edx: 00000000
    Apr 20 23:47:18 angel kernel: esi: c125a000 edi: 00000100 ebp: c125b4c8 esp: c125b4c0
    Apr 20 23:47:18 angel kernel: ds: 0018 es: 0018 ss: 0018
    Apr 20 23:47:18 angel kernel: Process rxvt (pid: 1482, process nr: 104, stackpage=c125b000)
    Apr 20 23:47:18 angel kernel: Stack: 00000100 c01e4c00 c125a000 c01165a1 c125b534 00000040 00000100 c125a000
    Apr 20 23:47:18 angel kernel: 00000100 c125a000 c010926c 0000000b c125b534 c01a2dd8 c01a44ce 00000002
    Apr 20 23:47:18 angel kernel: 00000000 c010e1f8 c01a44ce c125b534 00000002 c125a000 c125a000 00000100
    Apr 20 23:47:18 angel kernel: Call Trace: [do_exit+633/640] [die_if_no_fixup+0/64] [stext_lock+5548/11700] [stext_lock+11426/11700] [do_page_fault+680/880] [stext_lock+11426/11700] [error_code+45/52]
    Apr 20 23:47:18 angel kernel: Code: 89 42 40 89 50 3c c7 46 3c 00 00 00 00 c7 46 40 00 00 00 00

    Code; 00000000 Before first symbol
    00000000 <_EIP>:
    Code; 00000000 Before first symbol
       0: 89 42 40 mov %eax,0x40(%edx)
    Code; 00000003 Before first symbol
       3: 89 50 3c mov %edx,0x3c(%eax)
    Code; 00000006 Before first symbol
       6: c7 46 3c 00 00 00 00 movl $0x0,0x3c(%esi)
    Code; 0000000d Before first symbol
       d: c7 46 40 00 00 00 00 movl $0x0,0x40(%esi)

    Apr 20 23:47:18 angel kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000040
    Apr 20 23:47:18 angel kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
    Apr 20 23:47:18 angel kernel: *pde = 00000000
    Apr 20 23:47:18 angel kernel: Oops: 0002
    Apr 20 23:47:18 angel kernel: CPU: 0
    Apr 20 23:47:18 angel kernel: EIP: 0010:[schedule+132/624]
    Apr 20 23:47:18 angel kernel: EFLAGS: 00010082
    Apr 20 23:47:18 angel kernel: eax: 00000000 ebx: c01e4c00 ecx: c125a000 edx: 00000000
    Apr 20 23:47:18 angel kernel: esi: c125a000 edi: 00000100 ebp: c125b418 esp: c125b410
    Apr 20 23:47:18 angel kernel: ds: 0018 es: 0018 ss: 0018
    Apr 20 23:47:18 angel kernel: Process rxvt (pid: 1482, process nr: 104, stackpage=c125b000)
    Apr 20 23:47:18 angel kernel: Stack: 00000100 c01e4c00 c125a000 c01165a1 c125b484 00000040 00000100 c125a000
    Apr 20 23:47:18 angel kernel: 00000100 c125a000 c010926c 0000000b c125b484 c01a2dd8 c01a44ce 00000002
    Apr 20 23:47:18 angel kernel: 00000000 c010e1f8 c01a44ce c125b484 00000002 c125a000 c125a000 00000100
    Apr 20 23:47:18 angel kernel: Call Trace: [do_exit+633/640] [die_if_no_fixup+0/64] [stext_lock+5548/11700] [stext_lock+11426/11700] [do_page_fault+680/880] [stext_lock+11426/11700] [error_code+45/52]
    Apr 20 23:47:18 angel kernel: Code: 89 42 40 89 50 3c c7 46 3c 00 00 00 00 c7 46 40 00 00 00 00

    Code; 00000000 Before first symbol
    00000000 <_EIP>:
    Code; 00000000 Before first symbol
       0: 89 42 40 mov %eax,0x40(%edx)
    Code; 00000003 Before first symbol
       3: 89 50 3c mov %edx,0x3c(%eax)
    Code; 00000006 Before first symbol
       6: c7 46 3c 00 00 00 00 movl $0x0,0x3c(%esi)
    Code; 0000000d Before first symbol
       d: c7 46 40 00 00 00 00 movl $0x0,0x40(%esi)

    Apr 20 23:47:18 angel kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000040
    Apr 20 23:47:18 angel kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
    Apr 20 23:47:18 angel kernel: *pde = 00000000
    Apr 20 23:47:18 angel kernel: Oops: 0002
    Apr 20 23:47:18 angel kernel: CPU: 0
    Apr 20 23:47:18 angel kernel: EIP: 0010:[schedule+132/624]
    Apr 20 23:47:18 angel kernel: EFLAGS: 00010082
    Apr 20 23:47:18 angel kernel: eax: 00000000 ebx: c01e4c00 ecx: c125a000 edx: 00000000
    Apr 20 23:47:18 angel kernel: esi: c125a000 edi: 00000100 ebp: c125b368 esp: c125b360
    Apr 20 23:47:18 angel kernel: ds: 0018 es: 0018 ss: 0018
    Apr 20 23:47:18 angel kernel: Process rxvt (pid: 1482, process nr: 104, stackpage=c125b000)
    Apr 20 23:47:18 angel kernel: Stack: 00000100 c01e4c00 c125a000 c01165a1 c125b3d4 00000040 00000100 c125a000
    Apr 20 23:47:18 angel kernel: 00000100 c125a000 c010926c 0000000b c125b3d4 c01a2dd8 c01a44ce 00000002
    Apr 20 23:47:18 angel kernel: 00000000 c010e1f8 c01a44ce c125b3d4 00000002 c125a000 c125a000 00000100
    Apr 20 23:47:18 angel kernel: Call Trace: [do_exit+633/640] [die_if_no_fixup+0/64] [stext_lock+5548/11700] [stext_lock+11426/11700] [do_page_fault+680/880] [stext_lock+11426/11700] [error_code+45/52]
    Apr 20 23:47:18 angel kernel: Code: 89 42 40 89 50 3c c7 46 3c 00 00 00 00 c7 46 40 00 00 00 00

    Code; 00000000 Before first symbol
    00000000 <_EIP>:
    Code; 00000000 Before first symbol
       0: 89 42 40 mov %eax,0x40(%edx)
    Code; 00000003 Before first symbol
       3: 89 50 3c mov %edx,0x3c(%eax)
    Code; 00000006 Before first symbol
       6: c7 46 3c 00 00 00 00 movl $0x0,0x3c(%esi)
    Code; 0000000d Before first symbol
       d: c7 46 40 00 00 00 00 movl $0x0,0x40(%esi)

    Apr 20 23:47:18 angel kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000040
    Apr 20 23:47:18 angel kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
    Apr 20 23:47:18 angel kernel: *pde = 00000000
    Apr 20 23:47:18 angel kernel: Oops: 0002
    Apr 20 23:47:18 angel kernel: CPU: 0
    Apr 20 23:47:18 angel kernel: EIP: 0010:[schedule+132/624]
    Apr 20 23:47:18 angel kernel: EFLAGS: 00010086
    Apr 20 23:47:18 angel kernel: eax: 00000000 ebx: c01e4c00 ecx: c125a000 edx: 00000000
    Apr 20 23:47:18 angel kernel: esi: c125a000 edi: 00000100 ebp: c125b2b8 esp: c125b2b0
    Apr 20 23:47:18 angel kernel: ds: 0018 es: 0018 ss: 0018
    Apr 20 23:47:18 angel kernel: Process rxvt (pid: 1482, process nr: 104, stackpage=c125b000)
    Apr 20 23:47:18 angel kernel: Stack: 00000100 c01e4c00 c125a000 c01165a1 c125b324 00000040 00000100 c125a000
    Apr 20 23:47:18 angel kernel: 00000100 c125a000 c010926c 0000000b c125b324 c01a2dd8 c01a44ce 00000002
    Apr 20 23:47:18 angel kernel: 00000000 c010e1f8 c01a44ce c125b324 00000002 c125a000 c125a000 00000100
    Apr 20 23:47:18 angel kernel: Call Trace: [do_exit+633/640] [die_if_no_fixup+0/64] [stext_lock+5548/11700] [stext_lock+11426/11700] [do_page_fault+680/880] [stext_lock+11426/11700] [error_code+45/52]
    Apr 20 23:47:18 angel kernel: Code: 89 42 40 89 50 3c c7 46 3c 00 00 00 00 c7 46 40 00 00 00 00

    Code; 00000000 Before first symbol
    00000000 <_EIP>:
    Code; 00000000 Before first symbol
       0: 89 42 40 mov %eax,0x40(%edx)
    Code; 00000003 Before first symbol
       3: 89 50 3c mov %edx,0x3c(%eax)
    Code; 00000006 Before first symbol
       6: c7 46 3c 00 00 00 00 movl $0x0,0x3c(%esi)
    Code; 0000000d Before first symbol
       d: c7 46 40 00 00 00 00 movl $0x0,0x40(%esi)

    Apr 20 23:47:18 angel kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000040
    Apr 20 23:47:18 angel kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
    Apr 20 23:47:18 angel kernel: *pde = 00000000
    Apr 20 23:47:18 angel kernel: Oops: 0002
    Apr 20 23:47:18 angel kernel: CPU: 0
    Apr 20 23:47:18 angel kernel: EIP: 0010:[schedule+132/624]
    Apr 20 23:47:18 angel kernel: EFLAGS: 00010092
    Apr 20 23:47:18 angel kernel: eax: 00000000 ebx: c01e4c00 ecx: c125a000 edx: 00000000
    Apr 20 23:47:18 angel kernel: esi: c125a000 edi: 00000100 ebp: c125b208 esp: c125b200
    Apr 20 23:47:18 angel kernel: ds: 0018 es: 0018 ss: 0018
    Apr 20 23:47:18 angel kernel: Process rxvt (pid: 1482, process nr: 104, stackpage=c125b000)
    Apr 20 23:47:18 angel kernel: Stack: 00000100 c01e4c00 c125a000 c01165a1 c125b274 00000040 00000100 c125a000
    Apr 20 23:47:18 angel kernel: 00000100 c125a000 c010926c 0000000b c125b274 c01a2dd8 c01a44ce 00000002
    Apr 20 23:47:18 angel kernel: 00000000 c010e1f8 c01a44ce c125b274 00000002 c125a000 c125a000 00000100
    Apr 20 23:47:18 angel kernel: Call Trace: [do_exit+633/640] [die_if_no_fixup+0/64] [stext_lock+5548/11700] [stext_lock+11426/11700] [do_page_fault+680/880] [stext_lock+11426/11700] [error_code+45/52]
    Apr 20 23:47:18 angel kernel: Code: 89 42 40 89 50 3c c7 46 3c 00 00 00 00 c7 46 40 00 00 00 00

    Code; 00000000 Before first symbol
    00000000 <_EIP>:
    Code; 00000000 Before first symbol
       0: 89 42 40 mov %eax,0x40(%edx)
    Code; 00000003 Before first symbol
       3: 89 50 3c mov %edx,0x3c(%eax)
    Code; 00000006 Before first symbol
       6: c7 46 3c 00 00 00 00 movl $0x0,0x3c(%esi)
    Code; 0000000d Before first symbol
       d: c7 46 40 00 00 00 00 movl $0x0,0x40(%esi)

    1 warning issued. Results may not be reliable.
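
    As a reading aid, a few fields of the log above can be cross-checked
    mechanically. The sketch below is not part of the ksymoops output. It
    assumes the standard i386 page-fault error-code layout (bit 0:
    protection violation vs. page not present, bit 1: write vs. read,
    bit 2: user vs. kernel mode), and its helper names are made up for
    illustration. It decodes "Oops: 0002", recomputes the address touched
    by the first decoded instruction, and measures the spacing between
    the esp values of the six reports.

```python
# Sketch only: helper names are illustrative, values are copied from the log.

def decode_page_fault_code(code):
    """Split the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }

def effective_address(base, displacement):
    """Address touched by an operand of the form disp(%reg), 32-bit wraparound."""
    return (base + displacement) & 0xFFFFFFFF

OOPS_CODE = 0x0002          # "Oops: 0002"
EDX = 0x00000000            # "edx: 00000000"
ESP_VALUES = [              # esp from each of the six reports, in log order
    0xc125b570, 0xc125b4c0, 0xc125b410,
    0xc125b360, 0xc125b2b0, 0xc125b200,
]

fault = decode_page_fault_code(OOPS_CODE)
address = effective_address(EDX, 0x40)      # mov %eax,0x40(%edx)
strides = [a - b for a, b in zip(ESP_VALUES, ESP_VALUES[1:])]

print(fault)
print(hex(address))
print([hex(s) for s in strides])
```

    Under those assumptions this reports a kernel-mode write to a
    not-present page at 0x40, which agrees with the "NULL pointer
    dereference at virtual address 00000040" lines, and a constant 0xb0
    step between successive esp values. The constant step would be
    consistent with each oops being raised while the previous one was
    still being handled on the same kernel stack (the call trace runs
    from do_page_fault through do_exit back into schedule), but that is
    an interpretation, not something the log states.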

[6.] A small shell script or example program which triggers the
     problem (if possible)

    n/a

[7.] Environment

    BASH=/bin/bash
    BASH_ENV='~/.bashrc'
    BASH_VERSINFO=([0]="2" [1]="04" [2]="0" [3]="1" [4]="release" [5]="i386-pc-linux-gnu")
    BASH_VERSION='2.04.0(1)-release'
    DIRSTACK=()
    DISPLAY=:1
    EUID=0
    GROUPS=()
    HOME=/root
    HOSTNAME=angel
    HOSTTYPE=i386
    HZ=100
    IFS='
    '
    LANG=C
    LOGNAME=root
    LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jpg=01;35:*.png=01;35:*.gif=01;35:*.bmp=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.png=01;35:*.mpg=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:'
    MACHTYPE=i386-pc-linux-gnu
    MAIL=/var/spool/mail/root
    OPTERR=1
    OPTIND=1
    OSTYPE=linux-gnu
    PATH=/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/usr/local/bin/X11:/usr/bin/X11
    PIPESTATUS=([0]="0")
    PPID=4044
    PS1='\[\033]0;\u@\h:\w\007\][\[\033[7m\]\u\[\033[0m\]@\h:\W]$ '
    PS2='> '
    PS4='+ '
    PWD=/usr/src/linux
    SHELL=/bin/bash
    SHELLOPTS=braceexpand:hashall:interactive-comments
    SHLVL=2
    TERM=rxvt
    UID=0
    USER=root
    VIM=
    VIMRUNTIME=
    _=LANG

[7.1.] Software (add the output of the ver_linux script here)

    -- Versions installed: (if some fields are empty or looks
    -- unusual then possibly you have very old versions)
    Linux angel 2.2.14 #3 Thu Mar 2 22:30:22 PST 2000 i686 unknown
    Kernel modules 2.3.10
    Gnu C 2.95.2
    Binutils 2.9.5.0.31
    Linux C Library 2.1.3
    Dynamic linker ldd: version 1.9.11
    Procps .
    Mount 2.10f
    Net-tools 2.05
    Console-tools 0.2.3
    Sh-utils 2.0g
    Modules Loaded lockd sunrpc autofs smbfs sb uart401 sound soundcore aic7xxx sd_mod scsi_mod

[7.2.] Processor information (from /proc/cpuinfo):
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 7
    model name : Pentium III (Katmai)
    stepping : 3
    cpu MHz : 451.030220
    cache size : 512 KB
    fdiv_bug : no
    hlt_bug : no
    sep_bug : no
    f00f_bug : no
    coma_bug : no
    fpu : yes
    fpu_exception : yes
    cpuid level : 2
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
    bogomips : 450.56

[7.3.] Module information (from /proc/modules):
    lockd 30696 1 (autoclean)
    sunrpc 52292 1 (autoclean) [lockd]
    autofs 8928 1 (autoclean)
    smbfs 25360 2 (autoclean)
    sb 32948 0
    uart401 6128 0 [sb]
    sound 56300 0 [sb uart401]
    soundcore 2564 6 [sb sound]
    aic7xxx 105764 2
    sd_mod 15516 2 (autoclean)
    scsi_mod 50456 2 (autoclean) [aic7xxx sd_mod]

[7.4.] SCSI information (from /proc/scsi/scsi)
    Attached devices:
    Host: scsi0 Channel: 00 Id: 00 Lun: 00
      Vendor: TOSHIBA Model: CD-ROM XM-6401TA Rev: 1009
      Type: CD-ROM ANSI SCSI revision: 02
    Host: scsi1 Channel: 00 Id: 00 Lun: 00
      Vendor: SEAGATE Model: ST39175LW Rev: 0001
      Type: Direct-Access ANSI SCSI revision: 02

[7.5.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

    n/a

[X.] Other notes, patches, fixes, workarounds:

Thank you

-- 
Karsten M. Self (karsten@opensales.com)
    Director of Evangelism, OpenSales, Inc.
        What part of "Gestalt" don't you understand?



