I am using RAID-1 with two 46G IBM ATA-66 hard drives, the RAID-1 is on
/dev/hda4 and /dev/hda4 and is 43510604 blocks in size.
Here's the fdisk output from /dev/hdc, /dev/hda is exactly the same (I used
dd to copy the partition table).
Disk /dev/hdc: 255 heads, 63 sectors, 5606 cylinders
Units = cylinders of 16065 * 512 bytes
Device Boot Start End Blocks Id System
/dev/hdc1 1 33 265041 82 Linux swap
/dev/hdc2 34 36 24097+ 83 Linux
/dev/hdc3 37 189 1228972+ 83 Linux
/dev/hdc4 190 5606 43512052+ 83 Linux
Here is my current /etc/raidtab:
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
persistent-superblock 1
chunk-size 4k
device /dev/ide/host0/bus0/target0/lun0/part4
raid-disk 0
device /dev/ide/host0/bus1/target0/lun0/part4
raid-disk 1
Here is my df output:
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part3
1228916 952728 276188 78% /
/dev/ide/host0/bus0/target0/lun0/part2
23333 4172 17957 19% /boot
/dev/md/0 43510604 557652 42952952 1% /mnt
I am migrating to devfs as you will notice. I started playing with devfs
AFTER the first time RAID locked up on me so I know it's not caused by devfs.
I have had RAID lockup with and without devfs.
The first time it locked up I could run ps/top/etc but any attempt to sync
etc would hang (D state).
The second time it was more serious. The command:
ls -al /proc/247
would go D state, so ps, top, etc would all hang. Pid 247 was named.
At the time of the hang the only process using the RAID device was a bonnie++
test run on a ReiserFS file system. I doubt that ReiserFS or the drive
hardware was at fault because I spent an entire week running various bonnie++
tests on /dev/hda4 with ReiserFS and ext2 without any problems at all. All
the evidence I have suggests that ReiserFS is not at fault, however I have
CC'd the ReiserFS list (just in case) and I am continuing to run tests.
The bonnie++ command was:
bonnie++ -s 512
It seems that running the bonnie++ command on it's own does not cause a hang,
it's only when I redirect stdout and stderr to files on the same file system
that problems occur. I've just run another test and had it hang badly enough
that I couldn't login via telnet or ssh (the machine has no monitor).
This time it got through the first run but on the second run (with parameters
"-s 1024 -n 60") it hung.
The machine in question is a test machine. So if anyone has any experimental
patches etc they want me to run then please let me know.
Russell Coker
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Fri Jul 07 2000 - 21:00:11 EST