Software raid in 2.2.16 broken?

From: nate@firetrail.com
Date: Tue Jul 18 2000 - 17:43:01 EST


I sent this email to debian-user and debian-devel@lists.debian.org, and one
other user reported the same problem. It is quite severe.

Below is the original email about the kernel puking when running e2fsck on
a software RAID (level 1) device:

-- begin original email --
 To: Debian user mailinglist <debian-user@lists.debian.org>
 Subject: e2fsck+raid+linux 2.2.16 = crash?
 From: "nate@firetrail.com" <nate@firetrail.com>
 Date: Fri, 14 Jul 2000 15:40:34 -0700 (PDT)
 cc: debian-devel@lists.debian.org

Today...

One of my systems crashed due to SCSI disk errors.

When it came back up it crashed again, so I had to go to the site (2 hour
drive!) and take a look at it.

It seems that when the system tries to run e2fsck on the raid set, it
crashes with the message:

Jul 14 13:36:30 galactica kernel: VFS: grow_buffers: size = 16384
Jul 14 13:36:31 galactica last message repeated 665 times
Jul 14 13:36:31 galactica kernel: 384
Jul 14 13:36:31 galactica kernel: VFS: grow_buffers: size = 16384
Jul 14 13:36:31 galactica last message repeated 814 times
Jul 14 13:36:31 galactica kernel: 384
Jul 14 13:36:31 galactica kernel: VFS: grow_buffers: size = 16384
Jul 14 13:36:31 galactica last message repeated 468 times

It goes on and on and on... (the people on site left it in that state for a
good hour and a half until I got there)
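
To get a rough sense of the message volume, here is a minimal Python sketch
that expands syslog's "last message repeated N times" lines and counts the
grow_buffers messages in the excerpt above. It is only an illustration: the
function name count_grow_buffers and the SAMPLE_LOG list are hypothetical
and not part of any tool used on the system.

```python
import re

# syslog compresses runs of identical messages into a line of this form.
REPEAT_RE = re.compile(r"last message repeated (\d+) times")

def count_grow_buffers(lines):
    """Count 'VFS: grow_buffers' messages, expanding syslog repeat lines."""
    total = 0
    prev_matched = False  # was the previous real message a grow_buffers line?
    for line in lines:
        m = REPEAT_RE.search(line)
        if m:
            # A repeat line refers to the message logged just before it.
            if prev_matched:
                total += int(m.group(1))
            continue
        prev_matched = "VFS: grow_buffers" in line
        if prev_matched:
            total += 1
    return total

# The excerpt from the report, copied verbatim.
SAMPLE_LOG = [
    "Jul 14 13:36:30 galactica kernel: VFS: grow_buffers: size = 16384",
    "Jul 14 13:36:31 galactica last message repeated 665 times",
    "Jul 14 13:36:31 galactica kernel: 384",
    "Jul 14 13:36:31 galactica kernel: VFS: grow_buffers: size = 16384",
    "Jul 14 13:36:31 galactica last message repeated 814 times",
    "Jul 14 13:36:31 galactica kernel: 384",
    "Jul 14 13:36:31 galactica kernel: VFS: grow_buffers: size = 16384",
    "Jul 14 13:36:31 galactica last message repeated 468 times",
]

if __name__ == "__main__":
    print(count_grow_buffers(SAMPLE_LOG))
```

On the excerpt above this counts 1950 messages, all stamped 13:36:30 or
13:36:31.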

This did not happen with 2.2.15. This is also the first time I have run
e2fsck on the raid drive since upgrading to 2.2.16. As a result I have
been forced to mount the raid array unclean just so I could get the system
up and running and the rest of the staff at the site could go home (it's
co-located). I didn't think to reboot to 2.2.15 and run e2fsck on it
before I left; I was in a real rush.

System configuration:

Asus P2L97-DS (this board does not have scsi despite the "DS" model code)
Dual P2-233
128MB PC100 SDRAM x 3 (384MB total)
Matrox G100 AGP Video
Adaptec AHA 2940UW PCI
3Com 3C905C w/drivers from www.3com.com
Generic ATAPI 32x CDROM
Root device: IBM DDRS-34560D 4.3GB
Raid system: IBM-DNES-30917OW 9.1GB (x2) Software raid mode 1
Additional Storage: QUANTUM VIKING 4.5WSE 4.5GB (<- the cause of the
initial crash)

Linux Distribution: Debian GNU/Linux 2.1r4
Installed: Mid-march
Installed with kernel: 2.2.14+ow1 (www.openwall.com/linux)
Currently running: 2.2.16+ow1

Raid version: 0.36.4

This happened again and again, and I finally traced it down to e2fsck by
disabling automounting of /dev/md0 in /etc/fstab, after which the system
booted fine. ckraid was run multiple times with no crash (it ran every time
the system crashed due to other reasons). Once I tried to run e2fsck on
/dev/md0, the errors (above) flooded the screen until I hit ctrl-alt-del.

I'm not a kernel hacker, but I was curious whether this is a known issue. I
do read Kernel Traffic every week but I haven't seen anything specifically
related to raid mentioned (that I can remember). I think what I will do is
download the data off the raid set and reformat it so it can be
"clean" again, or reboot to 2.2.15 and run e2fsck on it.

I can try to provide more info if needed, let me know. I just don't want to
have to drive up there again if I can avoid it :)

help!

nate

:::
http://www.aphroland.org/
http://www.linuxpowered.net/
aphro@aphroland.org
3:24pm up 1:30, 1 user, load average: 0.00, 0.01, 0.00

-- end original email --

One user on debian-user pointed me to a patch:

http://www.linux.org.uk/VERSION/relnotes.2216.html

but it appears this code is already in 2.2.16. The problem does not happen
in 2.2.15.

Any ideas? BTW, I'm not on the list, so please cc: me when responding.

thanks!

nate

:::
http://www.aphroland.org/
http://www.linuxpowered.net/
aphro@aphroland.org
3:36pm up 23:04, 2 users, load average: 0.04, 0.01, 0.00

