Re: AACRAID failure with 2.6.13-rc1

From: Andrew Morton
Date: Fri Jul 29 2005 - 13:09:46 EST


Martin Drab <drab@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > > [ 278.732829] scsi0 (0:0): rejecting I/O to offline device
> > > [ 278.735954] Buffer I/O error on device sda2, logical block 491840
> > > [ 278.739147] lost page write due to I/O error on sda2
> > > [ 278.742389] scsi0 (0:0): rejecting I/O to offline device
> > > [ 278.745618] Buffer I/O error on device sda2, logical block 950284
> > > [ 278.748911] lost page write due to I/O error on sda2
> > > [ 278.752238] Buffer I/O error on device sda2, logical block 950285
> > > [ 278.755614] lost page write due to I/O error on sda2
> > > [ 278.759009] scsi0 (0:0): rejecting I/O to offline device
> > > [ 278.762408] Buffer I/O error on device sda2, logical block 950287
> > > [ 278.765855] lost page write due to I/O error on sda2
> > > [ 278.769318] scsi0 (0:0): rejecting I/O to offline device
> > > ... last message repeats about 45-times ...
> > > [ 347.564676] EXT3-fs error (device sda2): ext3_find_entry: reading directory #544057 offset 0
> > > ... here the log ends, nothing else happens then, since nothing is working when / is inaccessible ... :(
> >
> > Martin, is this problem still present in 2.6.13-rc4?
> >
> > If so, please cc linux-kernel on the reply, thanks.
> >
> > It would also be useful if you could try reverting that aacraid patch, see
> > if that helps.
>
> Hi, Andrew!
>
> The thing is still not fixed in 2.6.13-rc4. I had a long discussion about
> this with Mark Salyczyn and Mark Haverkamp. We came out with a temporary
> workaround, which was to set the AAC_MAX_32BIT_SGBCOUNT in aacraid.h to
> 512 instead of 8192. The patch for this was presented 8.7.2005 on
> linux-scsi list by Mark Haverkamp. However this constant solution may (?)
> have some performance impact on the configurations which are capable of
> delivering a better performance (with respect to this constant at hand).
>
> IMHO the real solution to this problem is the new Adaptec variant of
> aacraid driver which uses the 'new comm' technology to negotiate all these
> essential parameters directly with the hardware instead of relying on some
> preset constants. Mark Salyzyn has the patches prepared in his patch
> queue, and I vote for pushing it into the mainline ASAP.

ah, thanks.

A temporary workaround which might affct performance sounds better than a
dead box though.

Mark, do you think that many systems are likely to be affected this way?
Do you think we should do something temporary for 2.6.13?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/