Re: [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas

From: Nicholas A. Bellinger
Date: Fri Sep 04 2015 - 14:48:10 EST


Hi Sreekanth,

(Adding MKP to CC)

On Sun, 2015-08-30 at 07:54 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
>
> Hi all,
>
> This series is an mpt3sas LLD forward port of Calvin Owens' for-next
> reference counting bugfix series for the mpt2sas LLD code.
>
> His latest patch series can be found here:
>
> [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
> http://marc.info/?l=linux-scsi&m=143951695904115&w=2
>
> The differences between mpt2sas + mpt3sas in this area are very
> small, and the changes are required to address a class of
> long-standing NULL pointer dereference bugs throughout both LLDs.
>
> With Calvin's changes in place for mpt3sas, the original NULL
> pointer dereference OOPsen during probe with a failed HDD appear
> to be resolved, and so far no new regressions have been reported
> with -v1 series code.
>
> The -v1 series code for mpt3sas has been tested on v3.14.47 with
> 40x SAS3008 HBAs, with three preceding upstream mpt3sas patches:
>
> 4dc06fd mpt3sas: delay scsi_add_host call to work with scsi-mq
> 35b6236 mpt3sas: combine fw_event_work and its event_data
> 62c4da4 mpt3sas: correct scsi_{target,device} hostdata allocation
>
> Please review.
>
> --nab
>
> -v2 changes:
> - Fix _scsih_block_io_device() v4.3-rc0 breakage
> - Fix mpt3sas_transport_port_add() v4.3-rc0 breakage
>
> Nicholas Bellinger (2):
>   mpt3sas: Refcount sas_device objects and fix unsafe list usage
>   mpt3sas: Refcount fw_events and fix unsafe list usage
>
>  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 595 ++++++++++++++++++++++---------
>  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
>  3 files changed, 458 insertions(+), 180 deletions(-)
>

Have you been able to verify this port of Calvin's changes to mpt3sas
code..?

On my end, we're up to 60x machines with SAS3008 HBAs w/ 12 HDDs each
running the original -v1 series on v3.14.47 code.

They have been running continuous I/O stress tests and node reboot
tests, and after two weeks we've still not run into any of the original
NULL pointer dereference OOPsen, or any other new regressions.

So at this point I think the patches look safe to merge for -rc1.

Can you please give your Reviewed-by or Acked-by?

Thank you,

--nab
