My system became unusable at 2.1.85. Prior to that kernel
(I am currently running 2.1.84) I have witnessed processes stuck in
the "D" state, but they were rare. With the 2.1.85 kernel, just
running my nightly dump would leave processes stuck. The problem
appears to be exacerbated by the performance increases provided by
Ingo's IO-APIC code starting in 2.1.85. Note: the first kernel that
I saw a stuck "D" process in was 2.1.36 during an mkisofs run.
(Apparently, it got stuck while incorporating a large file,
approximately 30MB if memory serves. When I removed the file,
I was able to get the mkisofs run to complete.)
The stuck processes appear to occur most frequently when there is
significant SCSI I/O within the process. I am able to readily
recreate the problem by performing a level 0 dump to SCSI tape of a
partition and doing an fsck on another unmounted partition. Either
the fsck or the dump will hang. Note that over time, especially with the
2.1.85 and 2.1.86 kernels, I have seen other processes hang. It is
especially bad when update (bdflush) hangs. The system will go into
a slow spiral and eventually crash.
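
For anyone trying to confirm the same symptom, here is a minimal sketch for listing processes in the "D" state. It assumes a /proc filesystem where each /proc/<pid>/stat holds the pid, the command name in parentheses, and then the state letter. The function names parse_stat_line and stuck_processes are made up for illustration, and it only takes a single snapshot.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' instead of whitespace.
    left, right = line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_text), comm, state


def stuck_processes(proc_root="/proc"):
    """List (pid, comm) for every process seen in the "D" state right now."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were looking, or the line was odd.
            continue
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling stuck_processes() returns a list of (pid, command) pairs. A process can sit in "D" briefly during ordinary I/O, so a single hit proves nothing; a process that shows up in snapshots taken minutes apart is the one worth reporting.
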
I tried the 2.1.91 kernel on Saturday, hoping the new SCSI spinlock
code would help, and still experienced the problem. I tried to
rebuild the kernel with the __asm__ "cpuid" hack but, on reboot, the
system hung hard. It took me over an hour to fsck my partitions.
I'm skating on the razor's edge when the system crashes since, even with
the 2.1.84 kernel, there is a very good likelihood that the boot fsck
on an unclean partition will hang. It took 4 reboots to make it
through last time! I didn't have the time (or heart ;) to try to
reboot the modified kernel again.
My system is on the internet and I am more than willing to let
someone more knowledgeable than I am onto the box to investigate. BTW, my
system has the tape drive on a BusLogic 946 and two Micropolis 8.7G
drives in 10 md striped partitions on a second BusLogic 958.
Some other possibly relevant information: I have received messages from
others that a) upgrading the Adaptec driver helped their problem (for
a change, the Adaptec appears to be the better card ;) and b)
turning off read-ahead on the drives helped. I have not tried either
as a) I have BusLogic cards, and b) I'm not really sure how to turn
off read-ahead.
If there is anything I can do to help, please let me know. I _really_
want to start using Ingo's code.