Abit BP6 Lockup Kernel Disassembly

From: Phillip K. Hornung (falcon@pknet.com)
Date: Mon Apr 03 2000 - 01:43:50 EST


I recently got an Abit BP6 running dual 466MHz Celerons and have been
experiencing the infamous lockup problem.

After it had plagued me for a few days, I decided to look into the issue.
I don't have much kernel hacking under my belt yet, but I did come up
with something.

The lockups seemed fairly random at first, but after living with them
for a while I am not so sure that they are. I will give a brief account
of my research as well as the usual system specs. :)

The machine is as stated above, with 128MB of ECC memory, an 8.7GB IBM
Ultrastar Ultra SCSI drive, and a cheap FAST-10 NCR host adapter without
a BIOS. Because the adapter has no BIOS, I have to complicate things and
boot from an IDE drive, a 3.4GB UDMA33 Samsung; a spare 2.1GB Seagate is
also on IDE. The IDE side is rounded out by a 36x UDMA33 CD-ROM drive.
The video card is a PCI ATI 3D Xpression+ with 4MB SGRAM. The sound card
is a non-PnP ESS 1688 ISA card, and the modem is a standard ISA 33.6
Texas Instruments faxmodem. I also have a Hauppauge TV capture card with
a Bt878. I haven't had any issues with this hardware in the past; it was
all part of a system running on a K6/233. The case is an Inwin Q500 full
tower with about three case fans, so it stays relatively cool. I have no
PCI card in slot 3 (shared with the HPT366 ATA66 controller), and only
one card in slot 4, since slots 4 and 5 share busmastering. I also do
not use the HPT366. I have tried the machine in several configurations,
including changing memory timings and even removing a processor. The
BIOS is the QQ-2 beta, but I have the original factory BIOS on a disk.

During the lockup, I was in X typing this very report, with XFree86
compiling in an xterm. More on that in a second. I have locked this
machine up many times: twice over telnet only, a few times at the
console, and many times in X. The strange thing is that it always locks
solid at the same point in the XFree86 4.0 compilation. I was logging
the "make World" output to a file, and this was always the last compile
attempted (with some crazy optimization flags, too):

gcc -c -O6 -fforce-mem -fforce-addr -finline-functions \
    -fkeep-inline-functions -ffast-math -fstrength-reduce -fthread-jumps \
    -fcse-follow-jumps -fcse-skip-blocks -frerun-cse-after-loop \
    -frerun-loop-opt -fexpensive-optimizations -fschedule-insns2 \
    -fcaller-saves -funroll-loops -fmove-all-movables -fomit-frame-pointer \
    -mpentiumpro -mcpu=pentiumpro -march=pentiumpro -malign-loops=2 \
    -malign-jumps=2 -malign-functions=2 -ansi -pedantic -I. -I../include \
    -I../../../../../programs/Xserver/include \
    -I../../../../../exports/include/X11 -I../../include \
    -I../../glx -I../../../../.. -I../../../../../exports/include -Dlinux \
    -D__i386__ -D_POSIX_C_SOURCE=199309L -D_POSIX_SOURCE -D_XOPEN_SOURCE \
    -D_BSD_SOURCE -D_SVID_SOURCE -D_GNU_SOURCE -DSHAPE -DXINPUT -DXKB \
    -DLBX -DXAPPGROUP -DXCSECURITY -DTOGCUP -DXF86BIGFONT -DDPMSExtension \
    -DPIXPRIV -DPANORAMIX -DGCCUSESGAS -DAVOID_GLYPHBLT -DPIXPRIV \
    -DSINGLEDEPTH -DXFreeXDGA -DXvExtension -DXFree86LOADER -DXFree86Server \
    -DXF86VIDMODE -DX_BYTE_ORDER=X_LITTLE_ENDIAN -DSMART_SCHEDULE \
    -DNDEBUG -DFUNCPROTO=15 -DNARROWPROTO -DIN_MODULE -DGLXEXT -DXF86DRI \
    -DGLX_DIRECT_RENDERING -DGLX_USE_DLOPEN -DGLX_USE_MESA glapinoop.c
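Because the lockup is a hard one, the log itself deserves a caveat: output redirected to a file sits in the page cache until the kernel writes it back, so after a hang the last line on disk may trail the compile that was actually running. Below is a minimal sketch (not part of the original setup; the use of Python and the name `run_logged` are assumptions) that forces each line of build output to disk with `fsync()` before reading the next one. It can only help if the build tool itself flushes its output line by line.

```python
import os
import subprocess


def run_logged(cmd, log_path):
    """Run cmd and append its combined stdout/stderr to log_path,
    forcing every line to disk so the log survives a hard hang."""
    with open(log_path, "a") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so errors are logged too
            text=True,
        )
        for line in proc.stdout:
            log.write(line)
            log.flush()             # hand Python's buffer to the kernel
            os.fsync(log.fileno())  # ask the kernel to write it to disk now
        proc.stdout.close()
        return proc.wait()          # exit status of the build
```

It would be called as, for example, `run_logged(["make", "World"], "world.log")`. Calling `fsync()` after every line can slow the build considerably; that is the price of a log whose last line is trustworthy after the machine locks.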

Whether this is related to the BP6 lockups or is some other odd bug(?),
I can reproduce it basically every time. It happened with a single
processor as well as with two, on both 2.2.14 UP and SMP kernels, and on
2.3.99-pre1, -pre2 and -pre3, UP and SMP. It also locked solid with the
stock Mandrake 7.0 kernels. The compiler is gcc version 2.95.2 19991024
(release), as supplied with Mandrake 7.0.

My first step was to use the %eip logging mechanism. I am not sure how
much information it will provide, but at least it is something. I took
the patch from Andrea Arcangeli's IKD(?) patch for 2.2.12, applied it by
hand to 2.2.14, and wrote the columns down on paper after the machine
locked. I saw CPU0 grind to a halt, and then CPU1 about a second later.
I ran objdump on the vmlinux of the IKD-enabled 2.2.14 kernel and matched
the first and third columns of output (as sorted by the second and
fourth ones) against the (26MB!) disassembled vmlinux output. Attached
is the result; the lines marked with a "*" are what was shown on screen
at the time of the lockup.

If anyone needs more info, feel free to request it. I have also attached
dmesg output from my current 2.3.99-pre3 kernel for hardware reference.
Again, I am not sure that this is what causes the lockups on the BP6, but
it is my best observation at this point. I do not have an oops from the
NMI oopser; the machine was never able to generate one after locking up.
If anyone has any ideas or patches, please respond. I am subscribed to
both lists, or you can email me directly.
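Matching logged %eip values against a 26MB disassembly by hand is tedious; the same lookup can be done against the kernel's System.map, which lists the start address of every symbol. The sketch below assumes the logged values are plain hex addresses and that the System.map comes from the same IKD-enabled 2.2.14 build that locked up (a map from a different build, such as the 2.3.99-pre3 one named in the dmesg below, would give wrong answers). The helper names and the sample symbols are hypothetical.

```python
import bisect


def load_system_map(lines):
    """Parse System.map lines ("c0100000 T _stext") into sorted parallel
    lists of start addresses and names, keeping only text (code) symbols."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr, sym_type, name = parts[0], parts[1], parts[2]
        if sym_type not in ("T", "t"):
            continue  # data/bss symbols cannot contain an %eip
        entries.append((int(addr, 16), name))
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def resolve(eip, addrs, names):
    """Return "symbol+0xoffset" for the last text symbol at or below eip,
    or None if eip lies below every known symbol."""
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None
    return f"{names[i]}+0x{eip - addrs[i]:x}"


# Hypothetical example: made-up symbols and addresses, not from the report.
sample_map = [
    "c0100000 T func_a",
    "c0100040 t func_b",
    "c0100100 D some_data",
    "c0100200 T func_c",
]
addrs, names = load_system_map(sample_map)
print(resolve(0xc0100044, addrs, names))  # func_b+0x4
```

In practice the lines would come from opening the System.map of the kernel that locked up, and a result such as `func_b+0x4` can then be found in the `objdump -d vmlinux` listing by searching for the `<func_b>:` label.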

I sure hope this helps, and if not, then thanks for your time and
bandwidth. It could just be my imagination. ;)

-Phil

[gzipped disassembly attached]
[Dmesg output attached]


klogd 1.3-3, log source = /proc/kmsg started.
Inspecting /boot/System.map-2.3.99-pre3-smp
Loaded 14610 symbols from /boot/System.map-2.3.99-pre3-smp.
Symbols match kernel version 2.3.99.
Loaded 72 symbols from 5 modules.
Linux version 2.3.99-pre3-smp (root@pknet.com) (gcc version 2.95.2 19991024 (release)) #10 SMP Sat Apr 1 08:23:35 PST 2000
e820: 0009fc00 @ 00000000 (usable)
e820: 00000400 @ 0009fc00 (reserved)
e820: 00010000 @ 000f0000 (reserved)
e820: 00001000 @ fec00000 (reserved)
e820: 00001000 @ fee00000 (reserved)
e820: 00010000 @ ffff0000 (reserved)
e820: 07f00000 @ 00100000 (usable)
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
found SMP MP-table at 000f5cf0
hm, page 000f5000 reserved twice.
hm, page 000f6000 reserved twice.
hm, page 000f1000 reserved twice.
hm, page 000f2000 reserved twice.
On node 0 totalpages: 32512
zone(0): 4096 pages.
zone(1): 28416 pages.
zone(2): 0 pages.
Intel MultiProcessor Specification v1.1
Virtual Wire compatibility mode.
OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 Pentium(tm) Pro APIC version 17
Floating point unit present.
Machine Exception supported.
64 bit compare & exchange supported.
Internal APIC present.
Bootup CPU
Processor #1 Pentium(tm) Pro APIC version 17
Floating point unit present.
Machine Exception supported.
64 bit compare & exchange supported.
Internal APIC present.
Bus #0 is PCI
Bus #1 is PCI
Bus #2 is ISA
I/O APIC #2 Version 17 at 0xFEC00000.
Processors: 2
mapped APIC to ffffe000 (fee00000)
mapped IOAPIC to ffffd000 (fec00000)
Initializing CPU#0
Detected 467732862 Hz processor.
Console: colour VGA+ 80x50
Calibrating delay loop... 933.89 BogoMIPS
Memory: 124308k/130048k available (1954k kernel code, 5352k reserved, 146k data, 200k init, 0k highmem)
Buffer-cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
VFS: Diskquotas version dquot_6.4.0 initialized
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.36 (20000221) Richard Gooch (rgooch@atnf.csiro.au)
CPU0: Intel Celeron (Mendocino) stepping 05
per-CPU timeslice cutoff: 357.65 usecs.
Getting VERSION: 40011
Getting VERSION: 40011
Getting LVT0: 700
Getting LVT1: 400
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
CPU present map: 3
Booting processor 1 eip 2000
Setting warm reset code and vector.
1.
2.
3.
Asserting INIT.
Deasserting INIT.
#startup loops: 2.
Sending STARTUP #1.
After apic_write.
Startup point 1.
Waiting for send to finish...
+Sending STARTUP #2.
After apic_write.
Startup point 1.
Waiting for send to finish...
+After Startup.
Before Callout 1.
After Callout 1.
Initializing CPU#1
CPU#1 (phys ID: 1) waiting for CALLOUT
CALLIN, before setup_local_APIC().
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 933.89 BogoMIPS
Stack at about c127ffbc
OK.
CPU1: Intel Celeron (Mendocino) stepping 05
CPU has booted.
Before bogomips.
Total of 2 processors activated (1867.78 BogoMIPS).
Before bogocount - setting activated=1.
Boot done.
ENABLING IO-APIC IRQs
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-5, 2-10, 2-11, 2-16, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=81 pin1=2 pin2=0
activating NMI Watchdog ... done.
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 00170011
....... : max redirection entries: 0017
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 0FF 0F  0    0    0   0   0    1    1    59
 02 0FF 0F  0    0    0   0   0    1    1    51
 03 0FF 0F  0    0    0   0   0    1    1    61
 04 0FF 0F  0    0    0   0   0    1    1    69
 05 000 00  1    0    0   0   0    0    0    00
 06 0FF 0F  0    0    0   0   0    1    1    71
 07 0FF 0F  0    0    0   0   0    1    1    79
 08 0FF 0F  0    0    0   0   0    1    1    81
 09 0FF 0F  0    0    0   0   0    1    1    89
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 0FF 0F  0    0    0   0   0    1    1    91
 0d 000 00  1    0    0   0   0    0    0    00
 0e 0FF 0F  0    0    0   0   0    1    1    99
 0f 0FF 0F  0    0    0   0   0    1    1    A1
 10 000 00  1    0    0   0   0    0    0    00
 11 0FF 0F  1    1    0   1   0    1    1    A9
 12 0FF 0F  1    1    0   1   0    1    1    B1
 13 0FF 0F  1    1    0   1   0    1    1    B9
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 2
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ5 -> 19
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ9 -> 9
IRQ10 -> 17
IRQ11 -> 18
IRQ12 -> 12
IRQ13 -> 13
IRQ14 -> 14
IRQ15 -> 15
.................................... done.
calibrating APIC timer ...
..... CPU clock speed is 467.6725 MHz.
..... host bus clock speed is 66.8102 MHz.
cpu: 0, clocks: 668102, slice: 222700
CPU0<C0:668096,C:445392,D:4,S:222700,C:668102>
cpu: 1, clocks: 668102, slice: 222700
CPU1<C0:668096,C:222688,D:8,S:222700,C:668102>
checking TSC synchronization across CPUs: passed.
Setting commenced=1, go go go
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs
PCI: PCI BIOS revision 2.10 entry at 0xfb6e0
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Interrupt Routing Table found at 0xc00fd900 [router type 8086/7000]
Limiting direct PCI/PCI transfers.
isapnp: Scanning for Pnp cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.3
Based upon Swansea University Computer Society NET3.039
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 256 buckets, 4Kbytes
TCP: Hash tables configured (established 4096 bind 5461)
Initializing RT netlink socket
P6 Microcode Update Driver v1.03 registered
Starting kswapd v1.6
pty: 256 Unix98 ptys configured
Software Watchdog Timer: 0.05, timer margin: 60 sec
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: registered device at major 7
loop: enabling 8 loop devices
Uniform Multi-Platform E-IDE driver Revision: 6.30
ide: Assuming 40MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev 39
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
HPT366: onboard version of chipset, pin1=1 pin2=2
HPT366: IDE controller on PCI bus 00 dev 98
HPT366: not 100% native mode: will probe irqs later
    ide2: BM-DMA at 0xd400-0xd407, BIOS settings: hde:pio, hdf:pio
HPT366: IDE controller on PCI bus 00 dev 99
HPT366: not 100% native mode: will probe irqs later
    ide3: BM-DMA at 0xe000-0xe007, BIOS settings: hdg:pio, hdh:pio
hda: SAMSUNG VG33402A (3.40GB), ATA DISK drive
hdb: CD-ROM 36X/AKU, ATAPI CDROM drive
hdc: ST32132A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: SAMSUNG VG33402A (3.40GB), 3244MB w/112kB Cache, CHS=823/128/63, UDMA(33)
hdc: ST32132A, 2015MB w/120kB Cache, CHS=4095/16/63, DMA
hdb: ATAPI 32X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.07
Partition check:
 /dev/ide/host0/bus0/target0/lun0: p1 p2
 /dev/ide/host0/bus1/target0/lun0: [PTBL] [1023/64/63] p1
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
ncr53c8xx: at PCI bus 0, device 9, function 0
ncr53c8xx: 53c810 detected
ncr53c810-0: rev 0x1 on pci bus 0 device 9 function 0 irq 5
ncr53c810-0: ID 7, Fast-10, Parity Checking
ncr53c810-0: restart (scsi reset).
scsi0 : ncr53c8xx - version 3.2g
scsi : 1 host.
  Vendor: IBM Model: DNES-309170 Rev: SA30
  Type: Direct-Access ANSI SCSI revision: 03
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
ncr53c810-0-<0,0>: tagged command queue depth set to 8
scsi : detected 1 SCSI disk total.
ncr53c810-0-<0,*>: FAST-10 SCSI 10.0 MB/s (100 ns, offset 8)
SCSI device sda: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
 /dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 p4
udf: registering filesystem
Serial driver version 4.93 (2000-03-20) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
ttyS02 at 0x03e8 (irq = 4) is a 16550A
Real Time Clock Driver v1.10b
Non-volatile memory driver v1.0
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 94M
agpgart: Detected Intel 440BX chipset
agpgart: AGP aperture is 64M @ 0xd0000000
devfs: v0.93 (20000306) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x2
Coda Kernel/Venus communications, v4.6.0, braam@cs.cmu.edu
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 200k freed
Adding Swap: 131064k swap-space (priority -1)
CSLIP: code copyright 1989 Regents of the University of California
PPP generic driver version 2.4.1
PPP BSD Compression module registered
PPP Deflate Compression module registered
i2c-core.o: i2c core module
i2c-isa.o version 2.5.0 (20000312)
i2c-core.o: adapter ISA main adapter registered as adapter 0.
i2c-isa.o: ISA bus access for i2c modules initialized.
sensors.o version 2.5.0 (20000312)
w83781d.o version 2.5.0 (20000312)
i2c-core.o: driver W83781D sensor driver registered.
i2c-core.o: client [W83782D chip] registered to adapter [ISA main adapter](pos. 0).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


