Re: [GIT PATCH] SCSI bug fixes for 2.6.33-rc5

From: James Bottomley
Date: Wed Jan 27 2010 - 18:09:32 EST


On Wed, 2010-01-27 at 22:46 +0000, Alan Cox wrote:
> On Wed, 27 Jan 2010 16:33:29 -0600
> James Bottomley <James.Bottomley@xxxxxxx> wrote:
>
> > On Wed, 2010-01-27 at 22:24 +0000, Alan Cox wrote:
> > > > Penchala Narasimha Reddy Chilakala, ERS-HCLTech (1):
> > > > aacraid: fix File System going into read-only mode
> > >
> > > If aacraid is actually getting patches then see
> > > also http://bugzilla.kernel.org/show_bug.cgi?id=11120 which I found
> > > while tidying bugzilla.
> > >
> > > Contains a patch and test confirmations
> >
> > So the patch it contains is almost certainly wrong in general; Mark was
> > just suggesting it as a trial ... it might work for specific adapter
> > versions but reducing the queue depth by half globally will impact
> > performance noticeably. The bug report does rather sound like cabling
> > issues are leading to a firmware related problem.
>
> Odd then that they worked reliably until the numbers were increased.
> Sorry but having worked on the aacraid for a long time in the past I
> don't buy that explanation. Cabling issues would get logged by the driver
> and the controller. Secondly I don't buy it because the reporter was
> Matthias Ulrichs, who, to borrow a Hitchhiker's term, "really knows where his
> towel is".
>
> The patch isn't halving the queue size - it's returning to the known
> working state from an (unfixed) regression.

What regression? The 32 bit queue depth has always been 256 since 2005
(when it was reduced from 512) ... it's never been 127.

> The story is pretty simple
>
> Worked until the kernel changed
> Didn't work with kernel change
> Worked after the kernel changed back.
>
> Kernels don't go in and fix your cables (much as I wish they did) and
> there are two folks who've actually found the bug report specifically
> confirming it.

But we have two bug reports for all of the aacraids over the last five
years ... the patch would reduce the maximum transfer length from 128k
to 63.5k.
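
For reference, the two transfer lengths line up with the 256 and 127 figures above if those are read as counts of 512-byte sectors. That reading, and the 512-byte sector size, are assumptions made for this illustration only; a minimal sketch of the arithmetic:

```python
# Assumption for illustration: the limits are counted in 512-byte sectors.
SECTOR_BYTES = 512


def sectors_to_kib(sectors):
    """Convert a sector count into a transfer length in KiB."""
    return sectors * SECTOR_BYTES / 1024


print(sectors_to_kib(256))  # 128.0
print(sectors_to_kib(127))  # 63.5
```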

Linux tends to send down the largest transfer size it can, suggesting
that most of the aacraids in the field are happy with 128k.

The maximum transfer length critically impacts I/O throughput and
performance ... I can't just penalise everyone for the sake of two bug
reports.

This value can already be altered on the fly using the sysfs attribute

/sys/block/<dev>/queue/max_sectors_kb

Setting that should work for the two reporters without impacting anyone
else.
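
As a rough sketch of what that could look like from a script: the helper names and the `sysfs_root` parameter below are hypothetical, introduced only for this example. The one further assumption is that the neighbouring `max_hw_sectors_kb` attribute is readable and gives the upper bound the block layer will accept. Writing the real attribute needs root.

```python
import os


def queue_attr_path(dev, attr, sysfs_root="/sys"):
    """Build the path to a block queue attribute, e.g. /sys/block/sda/queue/<attr>."""
    return os.path.join(sysfs_root, "block", dev, "queue", attr)


def read_queue_kb(dev, attr="max_sectors_kb", sysfs_root="/sys"):
    """Read an integer queue attribute expressed in kilobytes."""
    with open(queue_attr_path(dev, attr, sysfs_root)) as f:
        return int(f.read().strip())


def set_max_sectors_kb(dev, kb, sysfs_root="/sys"):
    """Set max_sectors_kb for one device and return the value read back."""
    # Refuse values above the hardware limit rather than letting the write fail.
    hw_limit = read_queue_kb(dev, "max_hw_sectors_kb", sysfs_root)
    if kb > hw_limit:
        raise ValueError(f"{kb} exceeds max_hw_sectors_kb ({hw_limit})")
    with open(queue_attr_path(dev, "max_sectors_kb", sysfs_root), "w") as f:
        f.write(str(kb))
    return read_queue_kb(dev, "max_sectors_kb", sysfs_root)
```

On an affected machine one would call something like `set_max_sectors_kb("sdb", 63)`, where `sdb` stands in for whichever device sits behind the aacraid. The attribute takes whole kilobytes, so 63 is the nearest value at or below 63.5k. The setting does not persist across reboots.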

> When you have a cable fault on the aacraid you can get hangs on crappier
> firmware sets (normally in the BIOS boot though) but it's not dependent
> on queue size - it either works or it doesn't. On good firmware you get
> nice logged errors and it recovers if possible (or multipaths if you've
> got the right bits).

James

