Re: Kernel Panic in MPT SAS on 2.6.24 (and 2.6.23.14, 2.6.23.9)(fwd)

From: Krzysztof Oledzki
Date: Sun Feb 10 2008 - 12:40:30 EST


Hello,

Eric Moore is on vacation, adding some CCs and TOs.

---------- Forwarded message ----------
Date: Sun, 10 Feb 2008 18:25:39 +0100 (CET)
From: Krzysztof Oledzki <olel@xxxxxx>
To: Maximilian Wilhelm <max@xxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx, Eric.Moore@xxxxxxx
Subject: Re: Kernel Panic in MPT SAS on 2.6.24 (and 2.6.23.14, 2.6.23.9)



On Sun, 10 Feb 2008, Maximilian Wilhelm wrote:

Am Friday, den 8 February hub Maximilian Wilhelm folgendes in die Tasten:

Just noticed that Eric's address was wrong, so resend with corrected Cc.

Eric, my intial report was http://lkml.org/lkml/2008/2/6/300

Am Thursday, den 7 February hub Krzysztof Oledzki folgendes in die Tasten:

Hi!

While installing my new firewall I got the following kernel panic in
the MPT SAS driver which I need for the disks.

The first kernel I bootet was 2.6.23.14 which did panic so I tried a
2.6.24 which panics, too. Our usual FAI kernel (2.6.23.9) is also
affected.

Could you please try 2.6.22-stable?

Yes it works :-/

I've put some things which on the web which might be helpful:

dmesg http://files.rfc2324.org/mptsas_panic/2.6.22-dmesg
lspci -v http://files.rfc2324.org/mptsas_panic/2.6.22-lspci-v
.config http://files.rfc2324.org/mptsas_panic/2.6.22-config

I'll search for the last working kernel and try to break it down to a
commit tommorow when I can get a serial console or direct access.
The Java driven console redirection is everything else than fulfilling :-(

It looks *very* similar to my problem:

http://bugzilla.kernel.org/show_bug.cgi?id=9909

It seems to be the same controller:

01:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
Subsystem: Dell Unknown device 1f10
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at ec00 [size=256]
Memory at fc8fc000 (64-bit, non-prefetchable) [size=16K]
Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at fc900000 [disabled] [size=1M]
Capabilities: [50] Power Management version 2
Capabilities: [68] Express Endpoint IRQ 0
Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1

I did a git bisect between v2.6.22 v2.6.23 and it seems that
6cb8f91320d3e720351c21741da795fed580b21b
introduced some badness.

Thanks! This was *really* useful!

Now, how about attached patch? Should work with both 2.6.23 and 2.6.24.

Best regards,

Krzysztof Olędzki[SCSI] mpt fusion: Don't oops if NumPhys==0

Don't oops if NumPhys==0, instead return -ENODEV.
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=9909

Signed-off-by: Krzysztof Piotr Oledzki <ole@xxxxxx>

diff -Nur a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
--- a/drivers/message/fusion/mptsas.c 2007-10-09 22:31:38.000000000 +0200
+++ b/drivers/message/fusion/mptsas.c 2008-02-10 17:38:51.000000000 +0100
@@ -1772,6 +1772,11 @@
if (error)
goto out_free_consistent;

+ if (!buffer->NumPhys) {
+ error = -ENODEV;
+ goto out_free_consistent;
+ }
+
/* save config data */
port_info->num_phys = buffer->NumPhys;
port_info->phy_info = kcalloc(port_info->num_phys,