Re: FYI: RAID5 unusably unstable through 2.6.14

From: Martin Drab
Date: Thu Feb 02 2006 - 19:56:17 EST


On Thu, 2 Feb 2006, Bill Davidsen wrote:

Just to state clearly in the first place. I've allready solved the problem
by low-level formatting the entire disk that this inconsistent array in
question was part of.

So now everything is back to normal. So unforunatelly I would not be able
to do any more tests on the device in the non-working state.

I mentioned this problem here now just to let you konw that there is such
a problematic Linux behviour (and IMO flawed) in such circumstances, and
that perhaps it may let you think of such situations when doing further
improvements and development in the design of the block device layer (or
wherever the problem may possibly come from).

And I also hope you would understand, that I wouldn't try to create that
state again deliberatelly, since my main system is running on that array
and I wouldn't risk loosing some more data because of this.

However maybe someone perhaps in Adaptec or smewhere else may have
some simillar system at the disposal on which he could allow to experiment
on demand without any serious risk of loosing anything important.

So what I may say is that it is an Adaptec 2410SA with 8205 firmware and
without a battery backup system (which is probably the crutial thing).
And the inconsistency was caused by a MB protection of CPU overheat
shutdown, because I've started the system and booted Linux from the array
in question (which consisted by just one part of one disk), while I've
forgotten to turn on the water cooling of the CPU and northbridge. So
after about 3 minutes the system automatically shut down and Linux was
probably doing some writing in that very moment, which wasn't able to
complete fully (most probably due to the lack of the battery backup system
on the RAID controller). So my guess is that this may be artificially
reproduced when you suddenly switch off a power source of the system while
Linux is doing some writing to the array.

My arrays in particular are:

HDD1 (160 GB): 120 GB Array 1, 40 GB Array 2
HDD2 (120 GB): 120 GB Array 1
HDD3 (120 GB): 120 GB Array 1
HDD4 (120 GB): 120 GB Array 1

Where Array 1 is a RAID 5 array /dev/sdb (labeled as "Data 1"), which
contains just one 330 GB partition /dev/sdb1, and Array 2 is a bootable
(in Adaptec BIOS setup so called) Volume array (i.e. no RAID) /dev/sda
(labeled as "SYSTEM"), which contains /dev/sda1 (NTFS Windows), /dev/sda2
(ext3 Linux), /dev/sda3 (Linux swap). Problem was accessing the whole Array 2.
Array 1 from Linux worked well.

Then, when I tried, the array checking function within the BIOS of the
Adaptec controller found an inconsistency on the position somewhere in the
middle of the /dev/sda, so somewhere within the /dev/sda2 in particular.
So I low-level formatted the entire HDD1, resynced the Array 1 (which is
RAID 5, so no problem) and reinstalled both systems in Array 2, and now it
is all back to normal again.

> Martin Drab wrote:
>
> > Well, I had a similar experience lately with the Adaptec AAC-2410SA RAID 5
> > array. Due to the CPU overheating the whole box was suddenly shot down by
> > the CPU damage protection mechanism. While there is no battery backup on
> > this particular RAID controller, the sudden poweroff caused some very
> > localized inconsistency of one disk in the RAID. The configuration was 1x160
> > GB and 3x120GB, with the 160 GB being split into 120 GB part within the RAID
> > 5 and a 40 GB part as a separate volume. The inconsistency happend in the 40
> > GB part of the 160 GB HDD (as reported by the Adaptec BIOS media check). In
> > particular the problem was in the /dev/sda2 (with /dev/sda being the 40 GB
> > Volume, /dev/sda1 being an NTFS Windows system, and /dev/sda2 being ext3
> > Linux system).
> >
> > Now, what is interesting, is that Linux completely refused any possible
> > access to every byte within /dev/sda, not even dd(1) reading from any
> > position within /dev/sda, not even "fdisk /dev/sda", nothing. Everything
^^^^^^^^^^^^^^
> > ended up with lots of following messages:
> >
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Hardware Error
> > Additional sense: Internal target failure
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector <some sector number>
>
> But /dev/sda is not a Linux filesystem, running fsck on it makes no sense. You
> wanted to run on /dev/sda2.

But I was talking about fdisk(1). This wasn't a problematic behaviour of a
filesystem, but of the entire block device.

> > I've consulted this with Mark Salyzyn, because I thought it was a problem of
> > the AACRAID driver. But I was told, that there is nothing that AACRAID can
> > possibly do about it, and that it is a problem of the upper Linux layers
> > (block device layer?) that are strictly fault intollerant, and thouth the
> > problem was just an inconsistency of one particular localized region inside
> > /dev/sda2, Linux was COMPLETELY UNABLE (!!!!!) to read a single byte from
> > the ENTIRE VOLUME (/dev/sda)!
>
> The obvious test of this "it's not us" statement is to connect that one drive
> to another type controller and see if the upper level code recovers. I'm
> assuming that "sda" is a real drive and not some pseudo-drive which exists
> only in the firmware of the RAID controller.

/dev/sda is a 40 GB RAID array consisting of just one 40 GB part of one
160 GB drive. But it is in fact a virtual device supplied by the
controller. I.e. this 40 GB part of that disc behaves as an entire
harddisk (with it's own MBR etc.). And it is at the end of the drive, so
it may be a little tricky to find the exact position of the partitions
there, but it may be possible.

> That message is curious, did you
> cat /proc/scsi/scsi to see what the system thought was there? Use the infamous
> "cdrecord -scanbus" command?

----------
$ cdrecord -scanbus
Cdrecord-Clone 2.01.01a03-dvd (i686-pc-linux-gnu) Copyright (C) 1995-2005 Jïg Schilling
Note: This version is an unofficial (modified) version with DVD support
Note: and therefore may have bugs that are not present in the original.
Note: Please send bug reports or support requests to warly at mandriva.com.
Note: The author of cdrecord should not be bothered with problems in this
version.
Linux sg driver version: 3.5.33
Using libscg version 'schily-0.8'.
scsibus0:
0,0,0 0) 'Adaptec ' 'SYSTEM ' 'V1.0' Disk
0,1,0 1) 'Adaptec ' 'Data 1 ' 'V1.0' Disk
0,2,0 2) *
0,3,0 3) *
0,4,0 4) *
0,5,0 5) *
0,6,0 6) *
0,7,0 7) *

$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Adaptec Model: SYSTEM Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: Adaptec Model: Data 1 Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
-----------

The 0,0,0 is the /dev/sda. And even though this is now, after low-level
formatting of the previously inconsistent disc, the indications back then
were just the same. Which means every indication behaved as usual. Both
arrays were properly identified. But when I was accessing the inconsistent
one, i.e. /dev/sda, in any way (even just bytes, this has nothing to do
with any filesystems) the error messages mentioned above appeared. I'm not
sure what exactly was generating them, but I've CC'd Mark Salyzyn, maybe
he can explain more to it.

> > And now for the best part: From Windows, I was able to access the ENTIRE
> > VOLUME without the slightest problem. Not only did Windows boot entirely
> > from the /dev/sda1, but using Total Commander's ext3 plugin I was also able
> > to access the ENTIRE /dev/sda2 and at least extract the most important data
> > and configurations, before I did the complete low-level formatting of the
> > drive, which fixed the inconsistency problem.
> >
> > I call this "AN IRONY" to be forced to use Windows to extract information
> > from Linux partition, wouldn't you? ;)
> >
> > (Besides, even GRUB (using BIOS) accessed the /dev/sda without complications
> > - as it was the bootable volume. Only Linux failed here a 100%. :()
>
> From the way you say sda when you presumably mean sda1 or sda2 it's not clear
> if you don't understand the difference between drive and partition access or
> are just so pissed off you are not taking the time to state the distinction
> clearly.

No, I understand the differences very clearly. But maybe I was just
unclear in my expressions (for which I appologize). What I mean is that
the problem was with the entire RAID array /dev/sda. So whenever ANY
access to ANY part of /dev/sda, which of course also includes accesses to
all of /dev/sda1, /dev/sda2, and /dev/sda3, the error messages appeared
and no access was performed. That includes even accesses like this
"dd if=/dev/sda of=/dev/null bs=512 count=1" and any other possible
accesses. So the problem was with the entire device /dev/sda.

> There was a problem with recovery from errors in RAID-5 which is addressed by
> recent changes to fail a sector, try rewriting it, etc.

Maybe this was again my bad explanation, but this wasn't a problem of a
RAID 5 array, and much less of a software array. Adaptec 2410SA is a
4-channel HW SATA-I RAID controller.

> I would have to read linux-raid archives to explain it, so I'll stop
> with the overview. I don't think that's the issue here, you're using a
> RAID controller rather than the software RAID, so it should not apply.

Yes, exactly. And again, I've solved it by lowlevel formatting.

> I assume that the problem is gone, so we can't do any more analysis after the
> fact.

Unfortunatelly, yes. But I've described above how did it happen, so maybe
someone in Adaptec would be able to reproduce, Mark?

Martin