I just experienced a long hang and a lot of unpleasant messages in dmesg
while building randconfig kernels in a loop.
Here's the beginning of the messages (full dmesg output attached):
...
[ 9997.057000] scsi0: Unexpected busfree while idle
[ 9997.057000] SEQADDR == 0x18
[10030.588000] sr 0:0:5:0: Attempting to queue an ABORT message
[10030.588000] CDB: 0x0 0x0 0x0 0x0 0x0 0x0
[10030.588000] scsi0: At time of recovery, card was not paused
[10030.588000] >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
[10030.588000] scsi0: Dumping Card State while idle, at SEQADDR 0x18
[10030.588000] Card was paused
[10030.588000] ACCUM = 0x4, SINDEX = 0x48, DINDEX = 0xe4, ARG_2 = 0x1
[10030.588000] HCNT = 0x0 SCBPTR = 0xf
[10030.588000] SCSIPHASE[0x0] SCSISIGI[0x0] ERROR[0x0]
[10030.588000] SCSIBUSL[0x0] LASTPHASE[0x1] SCSISEQ[0x1a]
[10030.588000] SBLKCTL[0xa] SCSIRATE[0x0] SEQCTL[0x10]
[10030.588000] SEQ_FLAGS[0xc0] SSTAT0[0x0] SSTAT1[0x0]
[10030.588000] SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8]
[10030.588000] SIMODE1[0xa4] SXFRCTL0[0x80] DFCNTRL[0x0]
[10030.588000] DFSTATUS[0x89]
[10030.588000] STACK: 0x0 0x164 0x179 0x17
[10030.588000] SCB count = 40
[10030.588000] Kernel NEXTQSCB = 31
[10030.588000] Card NEXTQSCB = 0
[10030.588000] QINFIFO entries: 0 9 28 30 13 19 2 21 25 20 16 17 18 32 34 11 22 7
[10030.588000] Waiting Queue entries: 15:4
[10030.588000] Disconnected Queue entries:
[10030.588000] QOUTFIFO entries:
[10030.588000] Sequencer Free SCB List: 3 30 19 23 11 16 25 17 18 10 4 6 26 24 7 28 20 27 12 0 2 8 21 14 22 5 1 31 13 29 9
[10030.588000] Sequencer SCB Info:
[10030.588000] 0 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 1 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 2 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 3 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 4 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 5 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 6 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 7 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 8 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 9 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 10 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 11 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 12 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 13 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 14 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 15 SCB_CONTROL[0x40] SCB_SCSIID[0x57] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0x4]
[10030.588000] 16 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 17 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 18 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 19 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 20 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 21 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 22 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 23 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 24 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 25 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 26 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 27 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 28 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 29 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 30 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] 31 SCB_CONTROL[0xe0] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] SCB_TAG[0xff]
[10030.588000] Pending list:
[10030.588000] 7 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 22 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 11 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 34 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 32 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 18 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 17 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 16 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 20 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 25 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 21 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 2 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 19 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 13 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 30 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 28 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 9 SCB_CONTROL[0x60] SCB_SCSIID[0x67] SCB_LUN[0x0]
[10030.588000] 0 SCB_CONTROL[0x40] SCB_SCSIID[0x47] SCB_LUN[0x0]
[10030.588000] 4 SCB_CONTROL[0x40] SCB_SCSIID[0x57] SCB_LUN[0x0]
[10030.588000] Kernel Free SCB list: 24 10 23 5 3 14 33 1 26 15 35 6 29 12 39 27 8 38 37 36
[10030.588000] Untagged Q(4): 0
[10030.588000] Untagged Q(5): 4
[10030.588000]
[10030.588000] <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
[10030.588000] scsi0:0:5:0: Cmd aborted from QINFIFO
[10030.588000] aic7xxx_abort returns 0x2002
[10040.588000] sr 0:0:5:0: Attempting to queue an ABORT message
[10040.588000] CDB: 0x0 0x0 0x0 0x0 0x0 0x0
[10040.588000] scsi0: At time of recovery, card was not paused
...
(see attached file for remaining (long) output)
Here's output from scripts/ver_linux :
juhl@dragon:~/kernel/linux-2.6$ scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
Linux dragon 2.6.22-rc7-g4e99325b #1 SMP PREEMPT Sun Jul 8 21:24:48 CEST 2007 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ AuthenticAMD GNU/Linux
Gnu C 4.1.2
Gnu make 3.81
binutils Binutils
util-linux 2.12r
mount 2.12r
module-init-tools 3.2.2
e2fsprogs 1.39
jfsutils 1.1.11
reiserfsprogs 3.6.19
xfsprogs 2.8.16
pcmciautils 014
quota-tools 3.13.
PPP 2.4.4
Linux C Library > libc.2.5
Dynamic linker (ldd) 2.5
Linux C++ Library 6.0.8
Procps 3.2.7
Net-tools 1.60
Kbd 1.12
oprofile 0.9.2
Sh-utils 6.9
udev 111
Modules Loaded dm_mod snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss agpgart lp snd_emu10k1 snd_rawmidi firmware_class snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem via_rhine snd_hwdep ehci_hcd sg evdev k8temp
and here's /proc/scsi/scsi :
juhl@dragon:~/kernel/linux-2.6$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: PIONEER Model: DVD-ROM DVD-305 Rev: 1.03
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 05 Lun: 00
Vendor: PLEXTOR Model: CD-R PX-W1210S Rev: 1.01
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 06 Lun: 00
Vendor: IBM Model: DDYS-T36950N Rev: S96H
Type: Direct-Access ANSI SCSI revision: 03
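A note for reading the dump above: the SCB_SCSIID values (0x47, 0x57, 0x67)
appear to line up with the three devices listed here. The small sketch below
decodes them under the assumption that the upper nibble is the target ID and
the lower nibble is the adapter's own ID. That layout is my reading of the
aic7xxx register definitions, not something the dump itself states, and the
helper name decode_scsiid is made up for illustration.

```shell
#!/bin/sh
# Decode an aic7xxx SCB_SCSIID byte.
# ASSUMPTION: bits 7-4 hold the target ID, bits 3-0 the initiator (adapter) ID.
# decode_scsiid is a hypothetical helper, not part of any kernel tool.
decode_scsiid() {
    val=$(( $1 ))                      # accepts hex such as 0x67
    echo "target=$(( (val >> 4) & 0xf )) initiator=$(( val & 0xf ))"
}

# The three distinct SCB_SCSIID values seen in the dump.
for id in 0x47 0x57 0x67; do
    printf '%s -> %s\n' "$id" "$(decode_scsiid "$id")"
done
```

If that assumption holds, the 0x67 entries that fill the pending list belong
to the IBM disk at Id 06, while 0x47 and 0x57 are the two optical drives.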
uname -a is :
juhl@dragon:~/kernel/linux-2.6$ uname -a
Linux dragon 2.6.22-rc7-g4e99325b #1 SMP PREEMPT Sun Jul 8 21:24:48 CEST 2007 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ AuthenticAMD GNU/Linux
I have a feeling this may be related to my old problem with machine hangs when
building kernels in a loop, which I reported back in the 2.6.17-2.6.19 days:
http://www.webservertalk.com/archive242-2006-11-1692290.html
I can still reliably reproduce the problem mentioned in the thread above. All
I need to do is use my script to build kernels in a loop and I'll usually get a
hang within 10 kernels (with recent versions) or 100 kernels (with pre-2.6.20
kernels).
One difference between the hangs I've seen previously and this one is that
previously I've always used "nice make -j 3" when building the kernels, but
this time I edited my script to use just "nice make", thus only using a single
core. I suspect this is why my box stayed reasonably alive this time - enough
for me to be able to capture some dmesg output - instead of just hanging
completely like it usually does.
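For anyone who wants to try reproducing this, a loop along the following lines
should be close. It is only a minimal sketch, not the actual script: the
function name build_loop and the MAKE override are invented for illustration,
and it assumes it is run from the top of a kernel source tree.

```shell
#!/bin/sh
# Build randconfig kernels in a loop (sketch only, not the original script).
# build_loop and the MAKE variable are hypothetical; MAKE defaults to "make".
build_loop() {
    count=${1:-100}                    # number of kernels to build
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== build $i of $count ==="
        # Generate a fresh random configuration; stop if that fails.
        ${MAKE:-make} randconfig || return 1
        # Single-job build at reduced priority; use "-j 3" here for the
        # parallel variant. A broken randconfig build is reported, not fatal.
        nice ${MAKE:-make} || echo "build $i failed"
        i=$((i + 1))
    done
}
```

Calling "build_loop 10" from the source tree runs ten configure-and-build
rounds with the single-job "nice make" described above.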
If any further information is required, just let me know and I'll be happy to
provide it.
Any patches you want me to test, just send them my way :-)
Getting this finally resolved would be *very* nice :-)
(please note that I'm not subscribed to linux-scsi, so please keep me on Cc)
Attachment:
dmesg-2.6.22-g1db6178c.txt.gz
Description: GNU Zip compressed data