Re: KMSGDUMP: dump kernel messages to a diskette

Matt Robinson (yakker@cthulhu.engr.sgi.com)
Thu, 8 Jul 1999 16:26:03 -0700 (PDT)


On Thu, 8 Jul 1999, Dan Hollis wrote:
|>On Thu, 8 Jul 1999, Willy Tarreau wrote:
|>Here is my scenario:
|>
|>- PC locks up HARD due to an interaction between PCI cards and buggy
|> drivers. No oops is possible since the hardware is locked up.
|>- Press PC "red reset button" for hardware reset.
|>- PC enters minimal dump code using the '286 return to real mode' trick.
|>- Dump code would verify its own MD5. If the MD5 is not valid, it asks
|> the user if they really want to try to continue dumping, but warns
|> them that it might corrupt data. If the MD5 is valid, it initializes
|> the hardware minimally if needed and loads lilo.
|>- lilo dumps memory to the swap partition, then completely reboots the PC.
|>- lilo loads Linux.
|>- 'swapon' sees a memory dump in the swap partition and writes it to
|> /var/crash/crashdump-bla-bla (of course, this dir and all files are
|> mode 600 and owned by root).
|>
|>-Dan
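The last step of this scenario, where a boot-time tool notices a dump in swap and copies it out with root-only permissions, might look roughly like the following minimal Python sketch. The on-disk header (an 8-byte magic b"DUMPHDR1" followed by a little-endian 64-bit payload length) and the function name extract_dump are hypothetical, invented for illustration; nothing in this thread defines a dump format.

```python
import os
import struct

# Hypothetical header, invented for this sketch:
# 8-byte magic followed by a little-endian u64 payload length.
DUMP_MAGIC = b"DUMPHDR1"
HEADER = struct.Struct("<8sQ")


def extract_dump(swap_path, crash_dir, name="crashdump"):
    """Copy a dump found at the start of swap_path into crash_dir.

    Returns the path of the new file, or None if no dump header is present.
    """
    with open(swap_path, "rb") as swap:
        raw = swap.read(HEADER.size)
        if len(raw) < HEADER.size:
            return None
        magic, length = HEADER.unpack(raw)
        if magic != DUMP_MAGIC:
            return None  # ordinary swap contents, nothing to save

        # Root-only directory and file (the file gets mode 600, as in the scenario).
        os.makedirs(crash_dir, mode=0o700, exist_ok=True)
        out_path = os.path.join(crash_dir, name)
        fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as out:
            remaining = length
            while remaining > 0:
                chunk = swap.read(min(remaining, 1 << 20))
                if not chunk:
                    break  # truncated dump: keep whatever was recovered
                out.write(chunk)
                remaining -= len(chunk)
    return out_path
```

The directory is created with mode 0700 rather than 600 because a directory without the execute bit cannot be entered. A real tool would also have to invalidate the header after copying, so the same dump is not saved again on every boot, before the area is reused as swap.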

Hi, Dan.

I'm currently working on a kernel patch that dumps kernel virtual
memory to disk (for example, to the swap partition). We are also
working on a utility called 'lcrash', similar to 'icrash' (IRIX Crash),
that allows you to perform post-mortem analysis of a system failure.
It would also be included in the patch.
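Purely as an illustration of what writing a dump "header" and "pages" to disk involves, here is a hypothetical layout: a header (magic, version, page size, page count) followed by one record per page (the page's address, then its contents). This is not the format used by the patch described here, which this message does not specify; the magic value and field layout are invented for the sketch.

```python
import io
import struct

# Hypothetical on-disk layout, invented for this sketch:
#   header: magic (8 bytes), version (u32), page_size (u32), npages (u32)
#   then npages records of: page address (u64) followed by page_size bytes
MAGIC = b"VMDUMP\x00\x01"
HEADER = struct.Struct("<8sIII")
PAGE_ADDR = struct.Struct("<Q")


def write_dump(out, pages, page_size=4096, version=1):
    """Write a {address: bytes} mapping of pages to the binary stream `out`."""
    out.write(HEADER.pack(MAGIC, version, page_size, len(pages)))
    for addr in sorted(pages):
        data = pages[addr]
        if len(data) != page_size:
            raise ValueError("page at %#x has the wrong size" % addr)
        out.write(PAGE_ADDR.pack(addr))
        out.write(data)


def read_dump(inp):
    """Read a dump back as (page_size, {address: bytes})."""
    magic, version, page_size, npages = HEADER.unpack(inp.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a dump file")
    pages = {}
    for _ in range(npages):
        (addr,) = PAGE_ADDR.unpack(inp.read(PAGE_ADDR.size))
        pages[addr] = inp.read(page_size)
    return page_size, pages


if __name__ == "__main__":
    buf = io.BytesIO()
    write_dump(buf, {0xC0284000: b"\x00" * 4096})
    buf.seek(0)
    page_size, pages = read_dump(buf)
    print(page_size, ["%08x" % a for a in pages])
```

Recording each page's address alongside its data lets an analysis tool translate a kernel virtual address into a file offset without assuming the dumped pages are contiguous.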

If you want more information on it, send me some E-mail. I'm working
on web pages that show what 'lcrash' reports for a given virtual memory
dump, as well as information on our project. I've also included, at the
bottom of this E-mail, some example output from a working version we
have. It shows the set of tasks on the system (with 'task'), stack
traces (with 'trace'), full output for a given task, a dump of memory
(with 'od', or 'dump'), the 'dis'assemble functionality, and the
log_buf[] (with the 'stat' command). There are lots more commands
available.

I'd be very interested in hearing what you'd like to see in terms of
features and functionality both on the kernel dumping end as well as
the post-dump analysis end.

--Matt

-----------------------------------------------------------------------------
# lcrash map.1 vmcore.1
corefile = vmcore.1, namelist = map.1, outfile = stdout

Please wait..............

>> task
ACTIVE TASKS:

ADDR      UID  PID  PPID  STATE  PRI  FLAGS  MM        NAME
==============================================================================
c0284000  0    0    0     0      0    0      c025e0e0  swapper
c009c000  0    1    0     1      20   100    c028d060  init
c000e000  0    2    1     1      20   40     c025e0e0  kflushd
c000c000  0    3    1     1      20   840    c025e0e0  kpiod
c000a000  0    4    1     1      20   840    c025e0e0  kswapd
c7970000  1    232  1     1      20   140    c028d2e0  portmap
c77b4000  0    255  1     1      20   140    c028d4e0  syslogd
c778e000  0    266  1     1      20   140    c028d460  klogd
c770a000  0    280  1     1      20   40     c028d260  atd
c76ca000  0    294  1     1      20   40     c028d3e0  crond
c7784000  0    308  1     1      20   140    c028d360  inetd
c75f2000  0    329  1     1      20   140    c028d560  sendmail
c7458000  100  350  1     1      20   40     c028d6e0  xfs
c7d36000  0    389  1     1      20   100    c028d160  login
c7c5e000  0    390  1     1      20   100    c028d5e0  mingetty
c7df8000  0    391  1     1      20   100    c028d660  mingetty
c744e000  0    392  1     1      20   100    c028d1e0  mingetty
c75c2000  0    393  1     1      20   100    c028d760  mingetty
c7578000  0    394  1     1      20   100    c028d7e0  mingetty
c7448000  0    397  1     1      20   140    c028d960  update
c672c000  0    542  1     1      20   40     c028d9e0  named
c55a6000  0    556  1     1      20   140    c028d0e0  routed
c6902000  0    573  1     1      20   140    c028dc60  rpc.statd
c5080000  0    584  1     1      20   40     c028dbe0  rpc.rquotad
c5916000  0    595  1     1      20   40     c028dce0  rpc.mountd
c5a7e000  0    619  1     1      20   40     c028db60  gpm
c6184000  0    656  389   1      20   100    c028dee0  tcsh
c6c9e000  0    668  656   0      20   0      c028d860  crashdump
==============================================================================
28 active task structs found
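A listing like the 'task' output above can be produced by following the circular list that 2.2 kernels thread through every task_struct via its next_task field, anchored at init_task (the swapper entry at c0284000 appears to be init_task). The sketch below assumes a read_ptr(addr) accessor over the dump and a next_task offset; both are hypothetical placeholders, since the real offset depends on the kernel version and configuration and would have to come from the kernel headers or debug information.

```python
def walk_tasks(read_ptr, init_task, next_task_offset, max_tasks=32768):
    """Return task_struct addresses on the circular next_task list.

    read_ptr(addr) must return the pointer-sized value stored at addr in the
    dump. The list is circular, so the walk stops when it returns to init_task;
    a repeated address or an implausibly long list is reported as corruption.
    """
    tasks = [init_task]
    seen = {init_task}
    addr = read_ptr(init_task + next_task_offset)
    while addr != init_task:
        if addr in seen or len(tasks) >= max_tasks:
            raise ValueError("task list looks corrupted near %#x" % addr)
        seen.add(addr)
        tasks.append(addr)
        addr = read_ptr(addr + next_task_offset)
    return tasks


if __name__ == "__main__":
    NEXT_TASK_OFFSET = 0x48  # hypothetical offset, for illustration only
    memory = {
        0xC0284000 + NEXT_TASK_OFFSET: 0xC009C000,  # swapper -> init
        0xC009C000 + NEXT_TASK_OFFSET: 0xC000E000,  # init -> kflushd
        0xC000E000 + NEXT_TASK_OFFSET: 0xC0284000,  # kflushd -> back to swapper
    }
    for t in walk_tasks(memory.__getitem__, 0xC0284000, NEXT_TASK_OFFSET):
        print("%08x" % t)
```

The corruption checks matter more in a crash dump than on a live system: the failure being analyzed may itself have damaged the task list.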

>> trace 0xc7c5e000
================================================================
STACK TRACE FOR TASK: 0xc7c5e000 (mingetty)

0 schedule+845 [0xc0111e75]
1 schedule_timeout+18 [0xc0111a76]
2 read_chan+821 [0xc01a1249]
3 tty_read+172 [0xc019d5c0]
4 sys_read+198 [0xc012698a]
5 system_call+45 [0xc0107aa9]
================================================================

>> trace -f 0xc7c5e000
================================================================
STACK TRACE FOR TASK: 0xc7c5e000 (mingetty)

0 schedule+845 [0xc0111e75]

RA=0xc0111a7b, SP=0xc7c5feb8, FP=0xc7c5fef0, SIZE=60

c7c5feb8: c7c5feec 72756569 000a5a79 c7c5e000
c7c5fec8: 7fffffff c741f968 c741f000 00000001
c7c5fed8: c0284000 00000037 c7c5e000 00000001
c7c5fee8: c0293020 c7c5ff10 c0111a7b

1 schedule_timeout+18 [0xc0111a76]

RA=0xc01a124e, SP=0xc7c5fef4, FP=0xc7c5ff14, SIZE=36

c7c5fef4: 00000008 c741f968 c741f000 0000000a
c7c5ff04: 00000207 00000000 c7c5ff70 c7c5ff70
c7c5ff14: c01a124e

2 read_chan+821 [0xc01a1249]

RA=0xc019d5c2, SP=0xc7c5ff18, FP=0xc7c5ff74, SIZE=96

c7c5ff18: c741f000 c02912c0 c741d210 c01a16d8
c7c5ff28: 00000246 00000008 08049897 c741f000
c7c5ff38: c7c5ff4c c7c5ff68 c7c5e000 c741fbd4
c7c5ff48: c7c5e000 c7c5ff68 7fffffff 00000000
c7c5ff58: 00000000 00000000 c7c5e000 bffffe0b
c7c5ff68: c7c5e000 c741f968 c7c5ff94 c019d5c2

3 tty_read+172 [0xc019d5c0]

RA=0xc012698c, SP=0xc7c5ff78, FP=0xc7c5ff98, SIZE=36

c7c5ff78: c741f000 c02912c0 bffffe0b 00000001
c7c5ff88: c02912c0 ffffffea 00000000 c7c5ffbc
c7c5ff98: c012698c

4 sys_read+198 [0xc012698a]

RA=0xc0107ab0, SP=0xc7c5ff9c, FP=0xc7c5ffc0, SIZE=40

c7c5ff9c: c02912c0 bffffe0b 00000001 c02912d4
c7c5ffac: c7c5e000 0804ad80 bffffe0b c7c5e000
c7c5ffbc: bffffe0c c0107ab0

5 system_call+45 [0xc0107aa9]

RA=0x400bc534, SP=0xc7c5ffc4, FP=0xc7c5ffec, SIZE=44

c7c5ffc4: 00000000 bffffe0b 00000001 0804ad80
c7c5ffd4: bffffe0b bffffe0c 00000003 0000002b
c7c5ffe4: 0000002b 00000003 400bc534

================================================================
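Judging from the words printed above, each frame follows the usual i386 frame-pointer layout: the word at %ebp holds the caller's saved %ebp, and the word at %ebp+4 holds the return address. lcrash's FP column appears to be the address of that return-address slot (%ebp+4), and SIZE equals FP + 4 - SP. The minimal sketch below walks such a chain; it illustrates the technique and is not lcrash's code. The read32 accessor and the stack bounds are supplied by the caller, and the sample words are copied from the mingetty trace above.

```python
def walk_frames(read32, ebp, sp, stack_lo, stack_hi):
    """Walk an i386 %ebp chain and return (ra, sp, fp, size) for each frame.

    fp is %ebp + 4 (the return-address slot, like the FP column above) and
    size = fp + 4 - sp (like the SIZE column). The walk stops once %ebp leaves
    [stack_lo, stack_hi), e.g. when it points back into user space.
    """
    frames = []
    while stack_lo <= ebp < stack_hi:
        fp = ebp + 4
        ra = read32(fp)
        frames.append((ra, sp, fp, fp + 4 - sp))
        sp = fp + 4          # the caller's frame starts just above the return address
        ebp = read32(ebp)    # follow the saved %ebp to the caller
    return frames


if __name__ == "__main__":
    # Words copied from the 'trace -f 0xc7c5e000' output above.
    words = {
        0xC7C5FEEC: 0xC7C5FF10, 0xC7C5FEF0: 0xC0111A7B,
        0xC7C5FF10: 0xC7C5FF70, 0xC7C5FF14: 0xC01A124E,
        0xC7C5FF70: 0xC7C5FF94, 0xC7C5FF74: 0xC019D5C2,
        0xC7C5FF94: 0xC7C5FFBC, 0xC7C5FF98: 0xC012698C,
        0xC7C5FFBC: 0xBFFFFE0C, 0xC7C5FFC0: 0xC0107AB0,
    }
    # The task's 8 KB kernel stack ends at ESP0 (0xc7c60000, from the TSS).
    for ra, sp, fp, size in walk_frames(words.__getitem__, 0xC7C5FEEC,
                                        0xC7C5FEB8, 0xC7C5E000, 0xC7C60000):
        print("RA=%#x, SP=%#x, FP=%#x, SIZE=%d" % (ra, sp, fp, size))
```

Starting from %ebp = 0xc7c5feec and SP = 0xc7c5feb8, this reproduces the RA, FP and SIZE values of frames 0 through 4. The chain then leaves the kernel stack (the saved %ebp 0xbffffe0c is a user address), so the final system_call frame is not reached this way; its RA, 0x400bc534, appears to be the user-mode EIP saved in the registers pushed at system-call entry.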

>> task -n -f 0xc7c5e000
ADDR      UID  PID  PPID  STATE  PRI  FLAGS  MM        NAME
==============================================================================
c7c5e000  0    390  1     1      20   100    c028d5e0  mingetty

TSS:
ESP0:0xc7c60000, ESP:0xc7c5feb8, EIP:0xc0111e75, EBP:0x0

ADDR      START_CODE  END_CODE  START_DATA  END_DATA  START_STACK  TOTAL_VM
-------------------------------------------------------------------------
c028d5e0  8048000     80499b5   0           804ab54   bffffe60     265

ADDR      VM_START  VM_END    VM_OFFSET  VM_FILE   VM_PTE  VM_FLAGS  VM_NEXT
------------------------------------------------------------------------
c028ba60  8048000   804a000   0          c7920920  0       1875      c028bfa0
c028bfa0  804a000   804b000   4096       c7920920  0       1873      c787d540
c787d540  804b000   804c000   0          0         0       77        c028bca0
c028bca0  40000000  40012000  0          c7d52f00  0       875       c028bf60
c028bf60  40012000  40013000  69632      c7d52f00  0       873       c028bf20
c028bf20  40013000  40014000  0          0         0       77        c787db80
c787db80  40015000  40016000  4096       0         0       73        c787d980
c787d980  4001a000  40100000  0          c7920c80  0       75        c787da40
c787da40  40100000  40105000  937984     c7920c80  0       73        c787d440
c787d440  40105000  40108000  0          0         0       73        c028bda0
c028bda0  bfffe000  c0000000  -4096      0         0       177       0
------------------------------------------------------------------------

==============================================================================
1 active task struct found
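The VM_NEXT column shows that the vm_area_struct entries form a singly linked list terminated by 0. Walking that list and adding up (vm_end - vm_start) / 4096 for each area gives 265 pages, which matches the TOTAL_VM column for this mm. The sketch below repeats that consistency check with the values copied from the table; keeping each area as a (vm_start, vm_end, vm_next) tuple keyed by address is a simplification for illustration, not how lcrash reads the dump.

```python
PAGE_SIZE = 4096  # i386 page size

# (vm_start, vm_end, vm_next) for each vm_area_struct, copied from the table above.
VMAS = {
    0xC028BA60: (0x08048000, 0x0804A000, 0xC028BFA0),
    0xC028BFA0: (0x0804A000, 0x0804B000, 0xC787D540),
    0xC787D540: (0x0804B000, 0x0804C000, 0xC028BCA0),
    0xC028BCA0: (0x40000000, 0x40012000, 0xC028BF60),
    0xC028BF60: (0x40012000, 0x40013000, 0xC028BF20),
    0xC028BF20: (0x40013000, 0x40014000, 0xC787DB80),
    0xC787DB80: (0x40015000, 0x40016000, 0xC787D980),
    0xC787D980: (0x4001A000, 0x40100000, 0xC787DA40),
    0xC787DA40: (0x40100000, 0x40105000, 0xC787D440),
    0xC787D440: (0x40105000, 0x40108000, 0xC028BDA0),
    0xC028BDA0: (0xBFFFE000, 0xC0000000, 0),
}


def total_vm_pages(vmas, first):
    """Follow vm_next from `first` until 0, summing each area's size in pages."""
    pages = 0
    addr = first
    while addr:
        start, end, addr = vmas[addr]
        pages += (end - start) // PAGE_SIZE
    return pages


if __name__ == "__main__":
    print(total_vm_pages(VMAS, 0xC028BA60))  # 265, same as TOTAL_VM
```

A mismatch between such a sum and TOTAL_VM would be a cheap hint that the VMA list or the mm_struct in the dump is damaged.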

>> od c7920920 20
0xc7920920: c0291320 c7d52f00 c7cb6020 c026b3c0 : .)../.. `....&.
0xc7920930: 00000001 00000000 00000000 00000002 : ................
0xc7920940: 00000000 00000000 00000000 00000000 : ................
0xc7920950: 00000000 00000000 00000000 00000000 : ................
0xc7920960: 00000000 00000000 00000000 00000000 : ................
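The 'od' output above prints each 16-byte line as four 32-bit words, which on i386 are little-endian, followed by the same bytes as text in memory order, with non-printable bytes shown as '.'. For example, the word c0291320 is stored as the bytes 20 13 29 c0, which is why the text column begins with a space, '.', ')' and '.'. A minimal formatter in that style might look like this; the exact column spacing of lcrash's output is not reproduced.

```python
import struct


def od_lines(data, base):
    """Format bytes as 'addr: w0 w1 w2 w3 : text' lines, 16 bytes per line.

    Words are decoded as little-endian 32-bit values (i386 byte order);
    the text column shows the bytes in memory order, '.' for non-printables.
    """
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        nwords = len(chunk) // 4
        words = struct.unpack("<%dI" % nwords, chunk[:nwords * 4])
        hexpart = " ".join("%08x" % w for w in words)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append("0x%08x: %s : %s" % (base + off, hexpart, text))
    return lines


if __name__ == "__main__":
    # The first 16 bytes at 0xc7920920, as they sit in memory.
    raw = bytes.fromhex("201329c0002fd5c72060cbc7c0b326c0")
    for line in od_lines(raw, 0xC7920920):
        print(line)
```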

>> dis dump_execute 10
0xc011b0a0 <dump_execute>: pushl %ebp
0xc011b0a1 <dump_execute+1>: movl %esp,%ebp
0xc011b0a3 <dump_execute+3>: cmpl $0x0,0xc0269328
0xc011b0aa <dump_execute+10>: jne 0xc011b0b8 <dump_execute+24>
0xc011b0ac <dump_execute+12>: pushl $0xc01f5720
0xc011b0b1 <dump_execute+17>: call 0xc0114c80 <printk>
0xc011b0b6 <dump_execute+22>: jmp 0xc011b0f0 <dump_execute+80>
0xc011b0b8 <dump_execute+24>: xorl %eax,%eax
0xc011b0ba <dump_execute+26>: movw 0xc02ac890,%ax
0xc011b0c0 <dump_execute+32>: pushl %eax
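Both the trace output (schedule+845) and the 'dis' listing (<dump_execute+24>) print addresses as symbol+offset, resolved against the namelist given on the lcrash command line (map.1, which by its name looks like a System.map file). A common way to do that lookup is a binary search over the sorted symbol addresses. The sketch below assumes the usual three-column System.map format ("address type name"); the symbol addresses in the example are back-computed from the offsets shown above, not taken from a real map file.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into sorted parallel lists."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def symbolize(addrs, names, addr):
    """Return 'name+offset' (decimal offset) for the closest symbol at or below addr."""
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return "%#x" % addr
    off = addr - addrs[i]
    return names[i] if off == 0 else "%s+%d" % (names[i], off)


if __name__ == "__main__":
    sample = [
        "c0111a64 T schedule_timeout",
        "c0111b28 T schedule",
        "c0114c80 T printk",
        "c011b0a0 T dump_execute",
    ]
    addrs, names = load_system_map(sample)
    print(symbolize(addrs, names, 0xC0111E75))  # schedule+845
    print(symbolize(addrs, names, 0xC011B0B8))  # dump_execute+24
```

This version attributes any address past the last symbol to that symbol; a real tool would also bound the search (for example at _etext) so that stray addresses are not reported as huge offsets into the last function.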

>> stat
sysname : Linux
nodename : isdn-vivarin.corp.sgi.com
release : 2.2.4
version : #104 SMP Tue Jul 6 15:59:53 PDT 1999
machine : i686
domainname :

LOG_BUF:

<4>Linux version 2.2.4 (root@isdn-vivarin.corp.sgi.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #104 SMP Tue Jul 6 15:59:53 PDT 1999
<4>Intel MultiProcessor Specification v1.4
<4> Virtual Wire compatibility mode.
<4>OEM ID: HP Product ID: XU/XW APIC at: 0xFEE00000
<4>Processor #1 Pentium(tm) Pro APIC version 17
<4>I/O APIC #0 Version 17 at 0xFEC00000.
<4>Processors: 1
<4>mapped APIC to ffffe000 (fee00000)
<4>mapped IOAPIC to ffffd000 (fec00000)
<4>Detected 300688466 Hz processor.
<4>Console: colour VGA+ 80x25
<4>Calibrating delay loop... 299.83 BogoMIPS
<4>Memory: 127484k/131072k available (1196k kernel code, 420k reserved, 1920k data, 52k init)
<6>Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
<6>Checking 'hlt' instruction... OK.
<4>POSIX conformance testing by UNIFIX
<4>per-CPU timeslice cutoff: 100.22 usecs.
<4>CPU1: Intel Pentium II (Klamath) stepping 03
<4>calibrating APIC timer ...
<4>..... CPU clock speed is 300.6945 MHz.
<4>..... system bus clock speed is 66.8208 MHz.
<3>Error: only one processor found.
<4>enabling symmetric IO mode... ...done.
<4>ENABLING IO-APIC IRQs
<4>init IO_APIC IRQs
<4> IO-APIC pin 0, 9, 10, 17, 20, 21, 22, 23 not connected.
<4>number of MP IRQ sources: 20.
<4>number of IO-APIC registers: 24.
<4>testing the IO APIC.......................
<4>.... register #00: 00000000
<4>....... : physical APIC id: 00
<4>.... register #01: 00170011
<4>....... : max redirection entries: 0017
<4>....... : IO APIC version: 0011
<4>.... register #02: 00000000
<4>....... : arbitration: 00
<4>.... IRQ redirection table:
<4> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
<4> 00 000 00 1 0 0 0 0 0 0 00
<4> 01 000 00 0 0 0 0 0 1 1 59
<4> 02 0FF 0F 0 0 0 0 0 1 1 51
<4> 03 000 00 0 0 0 0 0 1 1 61
<4> 04 000 00 0 0 0 0 0 1 1 69
<4> 05 000 00 0 0 0 0 0 1 1 71
<4> 06 000 00 0 0 0 0 0 1 1 79
<4> 07 000 00 0 0 0 0 0 1 1 81
<4> 08 000 00 0 0 0 0 0 1 1 89
<4> 09 000 00 1 0 0 0 0 0 0 00
<4> 0a 000 00 1 0 0 0 0 0 0 00
<4> 0b 000 00 0 0 0 0 0 1 1 91
<4> 0c 000 00 0 0 0 0 0 1 1 99
<4> 0d 000 00 1 0 0 0 0 0 0 00
<4> 0e 000 00 0 0 0 0 0 1 1 A1
<4> 0f 000 00 0 0 0 0 0 1 1 A9
<4> 10 0FF 0F 1 1 0 1 0 1 1 B1
<4> 11 000 00 1 0 0 0 0 0 0 00
<4> 12 0FF 0F 1 1 0 1 0 1 1 B9
<4> 13 0FF 0F 1 1 0 1 0 1 1 C1
<4> 14 000 00 1 0 0 0 0 0 0 00
<4> 15 000 00 1 0 0 0 0 0 0 00
<4> 16 000 00 1 0 0 0 0 0 0 00
<4> 17 000 00 1 0 0 0 0 0 0 00
<7>IRQ to pin mappings:
<7>IRQ0 -> 2
<7>IRQ1 -> 1
<7>IRQ3 -> 3
<7>IRQ4 -> 4
<7>IRQ5 -> 5
<7>IRQ6 -> 6
<7>IRQ7 -> 7
<7>IRQ8 -> 8
<7>IRQ11 -> 11
<7>IRQ12 -> 12
<7>IRQ13 -> 13
<7>IRQ14 -> 14
<7>IRQ15 -> 15
<7>IRQ16 -> 16
<7>IRQ18 -> 18
<7>IRQ19 -> 19
<4>.................................... done.
<4>PCI: PCI BIOS revision 2.10 entry at 0xfd997
<4>PCI: Using configuration type 1
<4>PCI: Probing PCI hardware
<4>PCI->APIC IRQ transform: (B0,I7,P3) -> 19
<4>PCI->APIC IRQ transform: (B0,I8,P0) -> 16
<4>PCI->APIC IRQ transform: (B0,I9,P0) -> 19
<4>PCI->APIC IRQ transform: (B0,I17,P0) -> 18
<4>PCI->APIC IRQ transform: (B0,I18,P0) -> 16
<4>PCI->APIC IRQ transform: (B1,I0,P0) -> 18
<6>Linux NET4.0 for Linux 2.2
<6>Based upon Swansea University Computer Society NET3.039
<6>NET4: Unix domain sockets 1.0 for Linux NET4.0.
<6>NET4: Linux TCP/IP 1.0 for NET4.0
<6>IP Protocols: ICMP, UDP, TCP, IGMP
<6>Linux IP multicast router 0.06 plus PIM-SM
<4>Starting kswapd v 1.3
<6>Detected PS/2 Mouse Port.
<6>Serial driver version 4.27 with no serial options enabled
<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
<6>ttyS01 at 0x02f8 (irq = 3) is a 16550A
<4>pty: 256 Unix98 ptys configured
<4>PIIX4: IDE controller on PCI bus 00 dev 39
<4>PIIX4: not 100% native mode: will probe irqs later
<4> ide0: BM-DMA at 0xfcb0-0xfcb7, BIOS settings: hda:pio, hdb:pio
<4>hda: MATSHITA CR-585, ATAPI CDROM drive
<4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
<4>hda: ATAPI 24X CD-ROM drive, 128kB Cache
<6>Uniform CDROM driver Revision: 2.54
<6>Floppy drive(s): fd0 is 1.44M
<6>FDC 0 is a National Semiconductor PC87306
<6>(scsi0) <Adaptec AHA-398X Ultra SCSI host adapter> found at PCI 8/0
<6>(scsi0) Wide Channel B, SCSI ID=15, 16/255 SCBs
<6>(scsi0) Warning - detected auto-termination
<6>(scsi0) Please verify driver detected settings are correct.
<6>(scsi0) If not, then please properly set the device termination
<6>(scsi0) in the Adaptec SCSI BIOS by hitting CTRL-A when prompted
<6>(scsi0) during machine bootup.
<6>(scsi0) Cables present (Int-50 NO, Int-68 NO, Ext-68 NO)
<6>(scsi0) Downloading sequencer code... 419 instructions downloaded
<6>(scsi1) <Adaptec AIC-7860 Ultra SCSI host adapter> found at PCI 9/0
<6>(scsi1) Narrow Channel, SCSI ID=7, 3/255 SCBs
<6>(scsi1) Downloading sequencer code... 419 instructions downloaded
<4>NCR53c406a: no available ports found
<4>scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.10/3.2.4
<4> <Adaptec AHA-398X Ultra SCSI host adapter>
<4>scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.10/3.2.4
<4> <Adaptec AIC-7860 Ultra SCSI host adapter>
<4>scsi : 2 hosts.
<4> Vendor: SEAGATE Model: ST34501W Rev: 8301
<4> Type: Direct-Access ANSI SCSI revision: 02
<4>Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
<6>(scsi0:0:0:0) Synchronous at 7.2 Mbyte/sec, offset 8.
<4>scsi : detected 1 SCSI disk total.
<4>SCSI device sda: hdwr sector= 512 bytes. Sectors= 8887200 [4339 MB] [4.3 GB]
<6>3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
<4>pcnet32.c: PCI bios is present, checking for devices...
<4>Found PCnet/PCI at 0xfcc0, irq 18.
<4>eth0: PCnet/FAST 79C971 at 0xfcc0, 00 60 b0 a2 f6 28 assigned IRQ 18.
<4>pcnet32.c:v1.11 17.1.99 tsbogend@alpha.franken.de
<4>Partition check:
<4> sda: sda1 sda2 < sda5 sda6 >
<4>VFS: Mounted root (ext2 filesystem) readonly.
<4>Dump device initialized: 0x805
<4>Freeing unused kernel memory: 52k freed
<4>Dumping to device 0x805 (sd(8,5)) ...
<4>Initializing dump process ...
<4>Writing dump header ...
<4>Writing dump pages ...
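Each log_buf line begins with its printk level in angle brackets. The standard levels run from <0> (KERN_EMERG) to <7> (KERN_DEBUG), so <3> is KERN_ERR, <4> KERN_WARNING and <6> KERN_INFO. When reading log_buf out of a dump, it is often useful to filter it the way a console log level would, keeping only messages at or above a given severity (a numerically lower level). A minimal sketch of that filtering follows; treating lines without a prefix as level 4 is an assumption made for the sketch.

```python
import re

LEVEL_NAMES = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG"]
PREFIX = re.compile(r"^<([0-7])>")
DEFAULT_LEVEL = 4  # assumption: unprefixed lines are treated as KERN_WARNING


def parse_log(text):
    """Split log_buf text into (level, message) pairs, skipping empty lines."""
    records = []
    for line in text.splitlines():
        m = PREFIX.match(line)
        if m:
            records.append((int(m.group(1)), line[m.end():]))
        elif line:
            records.append((DEFAULT_LEVEL, line))
    return records


def at_least(records, max_level):
    """Keep messages whose level is max_level or more severe (numerically lower)."""
    return [(lvl, msg) for lvl, msg in records if lvl <= max_level]


if __name__ == "__main__":
    sample = ("<4>Processors: 1\n"
              "<3>Error: only one processor found.\n"
              "<6>Checking 'hlt' instruction... OK.\n")
    for lvl, msg in at_least(parse_log(sample), 3):
        print("KERN_%s: %s" % (LEVEL_NAMES[lvl], msg))
```

Applied to the log above with a threshold of 3, only the "Error: only one processor found." message would remain.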

>> quit
-----------------------------------------------------------------------------
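The last lines of the log identify the dump device as 0x805, which the kernel also prints as sd(8,5). In 2.2-era kernels a device number packs the major number into the high byte and the minor number into the low byte, so 0x805 is major 8 (SCSI disk) and minor 5. SCSI disks get 16 minors each, so minor 5 is partition 5 of the first disk, sda5, which the partition check lists inside the extended partition. A tiny decoding sketch:

```python
MINORBITS = 8  # 2.2-era device numbers: 8-bit major (high byte), 8-bit minor (low byte)


def decode_kdev(dev):
    """Split an old 16-bit device number into (major, minor)."""
    return dev >> MINORBITS, dev & ((1 << MINORBITS) - 1)


def sd_name(minor):
    """Name a SCSI disk device for major 8 (sda..sdp): 16 minors per disk.

    Minor 0 of each group is the whole disk; minors 1-15 are its partitions.
    """
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else name + str(part)


if __name__ == "__main__":
    major, minor = decode_kdev(0x805)
    print(major, minor, sd_name(minor))  # 8 5 sda5
```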
