Re: 2.4.0-test2-ac2 with RAID lockup

From: Russell Coker (russell@coker.com.au)
Date: Tue Jul 04 2000 - 00:28:14 EST


I have just repeated this with the Ext2 file system. This time the following
processes were in D state:
  PID TTY      STAT   TIME COMMAND
    1 ?        D      0:09 init [2]
 1220 ?        DN     6:04 /usr/sbin/bonnie++ -q -s 1024 -n 60
 1557 ?        DN     0:00 /usr/sbin/crack_packer /var/cache/cracklib/cracklib_dict

The commands "sync", "df", and "ls -l /mnt" all went into D state too.
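
For anyone who wants to watch for this on their own machine, here is a rough sketch of listing processes in D state (uninterruptible sleep) by reading /proc/<pid>/stat directly instead of going through ps. The function names are made up for illustration. On a machine that is already hung, reading /proc may itself block, as happened with "ls -al /proc/247" in the report quoted below, so this is more useful run periodically before the lockup than after it.

```python
import glob


def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def d_state_processes(proc_root="/proc"):
    """Yield (pid, command) for every process in uninterruptible sleep."""
    for path in glob.glob(f"{proc_root}/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the line was unparseable
        if state == "D":
            yield pid, comm


for pid, comm in d_state_processes():
    print(pid, comm)
```

The "N" that ps shows in "DN" is a low-priority flag that ps adds itself; the state field in /proc/<pid>/stat is the single letter, so comparing against "D" is enough.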

So it seems to be very much a RAID issue, not a ReiserFS issue.

Russell Coker

On Sun, 02 Jul 2000, Russell Coker wrote:
>I am using RAID-1 with two 46G IBM ATA-66 hard drives. The RAID-1 is on
>/dev/hda4 and /dev/hdc4 and is 43510604 blocks in size.
>
>Here's the fdisk output from /dev/hdc; /dev/hda is exactly the same (I used
>dd to copy the partition table).
>
>Disk /dev/hdc: 255 heads, 63 sectors, 5606 cylinders
>Units = cylinders of 16065 * 512 bytes
>
>   Device Boot    Start       End    Blocks   Id  System
>/dev/hdc1             1        33    265041   82  Linux swap
>/dev/hdc2            34        36     24097+  83  Linux
>/dev/hdc3            37       189   1228972+  83  Linux
>/dev/hdc4           190      5606  43512052+  83  Linux
>
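
As a sanity check on the table above, the Blocks column can be recomputed from the cylinder ranges and the geometry fdisk reports (255 heads, 63 sectors, 512-byte sectors). This is only a sketch: the helper name is made up, and the 63 sectors skipped in front of the first partition are an assumption (the usual one-track offset on DOS-style partition tables), not something fdisk prints here.

```python
SECTORS_PER_CYLINDER = 255 * 63  # heads * sectors per track = 16065
SECTOR_BYTES = 512
BLOCK_BYTES = 1024               # fdisk counts 1k blocks


def blocks(start_cyl, end_cyl, skipped_sectors=0):
    """Return the fdisk-style block count for an inclusive cylinder range."""
    sectors = (end_cyl - start_cyl + 1) * SECTORS_PER_CYLINDER - skipped_sectors
    whole, rem = divmod(sectors * SECTOR_BYTES, BLOCK_BYTES)
    # fdisk appends '+' when the partition has an odd number of sectors,
    # i.e. half a block is left over.
    return f"{whole}+" if rem else str(whole)


print("hdc1", blocks(1, 33, skipped_sectors=63))  # assumed one-track offset
print("hdc2", blocks(34, 36))
print("hdc3", blocks(37, 189))
print("hdc4", blocks(190, 5606))
```

Under those assumptions all four results agree with the Blocks column in the table.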
>Here is my current /etc/raidtab:
>raiddev /dev/md0
> raid-level 1
> nr-raid-disks 2
> persistent-superblock 1
> chunk-size 4k
> device /dev/ide/host0/bus0/target0/lun0/part4
> raid-disk 0
> device /dev/ide/host0/bus1/target0/lun0/part4
> raid-disk 1
>
>Here is my df output:
>Filesystem           1k-blocks      Used Available Use% Mounted on
>/dev/ide/host0/bus0/target0/lun0/part3
>                       1228916    952728    276188  78% /
>/dev/ide/host0/bus0/target0/lun0/part2
>                         23333      4172     17957  19% /boot
>/dev/md/0             43510604    557652  42952952   1% /mnt
>
>I am migrating to devfs, as you will notice. I started playing with devfs
>AFTER the first time RAID locked up on me, so I know it's not caused by devfs.
>I have had RAID lock up with and without devfs.
>
>The first time it locked up I could run ps/top/etc but any attempt to sync
>etc would hang (D state).
>The second time it was more serious. The command:
>ls -al /proc/247
>would go into D state, so ps, top, etc. would all hang. Pid 247 was the
>named daemon.
>
>At the time of the hang the only process using the RAID device was a bonnie++
>test run on a ReiserFS file system. I doubt that ReiserFS or the drive
>hardware was at fault because I spent an entire week running various bonnie++
>tests on /dev/hda4 with ReiserFS and ext2 without any problems at all. All
>the evidence I have suggests that ReiserFS is not at fault, however I have
>CC'd the ReiserFS list (just in case) and I am continuing to run tests.
>
>The bonnie++ command was:
>bonnie++ -s 512
>
>It seems that running the bonnie++ command on its own does not cause a hang;
>it's only when I redirect stdout and stderr to files on the same file system
>that problems occur. I've just run another test and had it hang badly enough
>that I couldn't login via telnet or ssh (the machine has no monitor).
>
>This time it got through the first run but on the second run (with parameters
>"-s 1024 -n 60") it hung.
>
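
For anyone trying to reproduce this, a minimal sketch of the failing setup as described above: bonnie++ running on the RAID file system with stdout and stderr redirected to files on that same file system. The output file names and the helper functions are invented for illustration, and it assumes bonnie++ tests the current directory when no directory is given. The script only prints the planned command; run() has to be called explicitly, and only on a machine you can afford to hang.

```python
import os
import subprocess


def plan_run(mount_point, bonnie_args=("-s", "512")):
    """Return the command and the two log paths, all on the same file system."""
    cmd = ["bonnie++", *bonnie_args]
    out_path = os.path.join(mount_point, "bonnie.out")  # hypothetical name
    err_path = os.path.join(mount_point, "bonnie.err")  # hypothetical name
    return cmd, out_path, err_path


def run(mount_point, bonnie_args=("-s", "512")):
    """Run bonnie++ in mount_point with both output streams redirected there."""
    cmd, out_path, err_path = plan_run(mount_point, bonnie_args)
    with open(out_path, "w") as out, open(err_path, "w") as err:
        # cwd puts the benchmark files next to the log files.
        return subprocess.run(cmd, cwd=mount_point, stdout=out, stderr=err).returncode


# Show what would be executed for the two runs mentioned above.
print(plan_run("/mnt"))
print(plan_run("/mnt", ("-s", "1024", "-n", "60")))
```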
>
>The machine in question is a test machine. So if anyone has any experimental
>patches etc they want me to run then please let me know.
>
>
>
>Russell Coker

-- 
My current location - X marks the spot.
X
X
X
