RE: mptsas hangs caused by ATA pass-through explained

From: Desai, Kashyap
Date: Thu Jun 03 2010 - 12:49:58 EST




> -----Original Message-----
> From: ClÃudio Martins [mailto:ctpm@xxxxxxxxxx]
> Sent: Wednesday, June 02, 2010 1:14 AM
> To: Ryan Kuester
> Cc: Moore, Eric; Desai, Kashyap; DL-MPT Fusion Linux; Support,
> Software; linux-scsi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: mptsas hangs caused by ATA pass-through explained
>
>
> On Mon, 26 Apr 2010 18:11:54 -0500 Ryan Kuester <rkuester@xxxxxxxxxx>
> wrote:
> > I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
> > pass-through commands, in particular by smartctl.
> >
> > First, my version of the symptoms. On an LSI SAS1068E B3 HBA running
> > 01.29.00.00 firmware, with SATA disks, and with smartd running, I'm
> seeing
> > occasional task, bus, and host resets, some of which lead to hard
> faults of
> > the HBA requiring a reboot. Abusively looping the smartctl command,
> >
> > # while true; do smartctl -a /dev/sdb > /dev/null; done
> >
> > dramatically increases the frequency of these failures to nearly one
> per
> > minute. A high IO load through the HBA while looping smartctl seems
> to
> > improve the chance of a full scsi host reset or a non-recoverable
> hang.
> >
> > I reduced what smartctl was doing down to a simple test case which
> > causes the hang with a single IO when pointed at the sd interface.
> See
> > the code at the bottom of this e-mail. It uses an SG_IO ioctl to
> issue
> > a single pass-through ATA identify device command. If the buffer
> > userspace gives for the read data has certain alignments, the task is
> > issued to the HBA but the HBA fails to respond. If run against the
> sg
> > interface, neither the test code nor smartctl causes a hang.
> >
> > sd and sg handle the SG_IO ioctl slightly differently. Unless you
> > specifically set a flag to do direct IO, sg passes a buffer of its
> own,
> > which is page-aligned, to the block layer and later copies the result
> > into the userspace buffer regardless of its alignment. sd, on the
> other
> > hand, always does direct IO unless the userspace buffer fails an
> > alignment test at block/blk-map.c line 57, in which case a page-
> aligned
> > buffer is created and used for the transfer.
> >
> > The alignment test currently checks for word-alignment, the default
> > setup by scsi_lib.c; therefore, userspace buffers of almost any
> > alignment are given directly to the HBA as DMA targets. The LSI 1068
> > hardware doesn't seem to like at least a couple of the alignments
> which
> > cross a page boundary (see the test code below). Curiously, many
> > page-boundary-crossing alignments do work just fine.
> >
> > So, either the hardware has an bug handling certain alignments or the
> > hardware has a stricter alignment requirement than the driver is
> > advertising. If stricter alignment is required, then in no case
> should
> > misaligned buffers from userspace be allowed through without being
> > bounced or at least causing an error to be returned.
> >
> > It seems the mptsas driver could use blk_queue_dma_alignment() to
> advertise
> > a stricter alignment requirement. If it does, sd does the right
> thing and
> > bounces misaligned buffers (see block/blk-map.c line 57). The
> following
> > patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the
> wrong
> > place for this code, but it gets my idea across.
> >
> > diff --git a/drivers/message/fusion/mptscsih.c
> b/drivers/message/fusion/mptscsih.c
> > index 6796597..1e034ad 100644
> > --- a/drivers/message/fusion/mptscsih.c
> > +++ b/drivers/message/fusion/mptscsih.c
> > @@ -2450,6 +2450,8 @@ mptscsih_slave_configure(struct scsi_device
> *sdev)
> > ioc->name,sdev->tagged_supported, sdev->simple_tags,
> > sdev->ordered_tags));
> >
> > + blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
> > +
> > return 0;
> > }
> >

I have also verified this patch + also done code review with our developers including Eric Moore.
Please consider this patch as an ACKed patch and schedule it for next upstream release.

Thanks,
Kashyap


>
> Hello,
>
> I have tested v2.6.34 on a box with 16 SATA disks attached to a
> LSISAS1068E (through a port expander), with and without this patch:
>
> With vanilla 2.6.34 I can reliably reproduce controller timeouts both
> with the example code provided by Ryan and with a simple loop like:
>
> while : ; do for d in `ls /sys/block/ | grep sd` ; do smartctl -a
> /dev/$d ; done ; done
>
> The result are controller timeouts with the following kind of kernel
> messages:
>
> mptscsih: ioc0: attempting task abort! (sc=ffff8802be18dc00)
> sd 4:0:2:0: [sdc] CDB: ATA command pass through(16): 85 08 0e 00 00 00
> 01 00 00 00 00 00 00 00 ec 00
> mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO
> Executed}, SubCode(0x0000)
> mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be18dc00)
> mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset},
> SubCode(0x2000)
> mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort},
> SubCode(0x0101)
> mptscsih: ioc0: attempting task abort! (sc=ffff8802be18dc00)
> sd 4:0:2:0: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00
> mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset},
> SubCode(0x2000)
> mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be18dc00)
> mptscsih: ioc0: attempting target reset! (sc=ffff8802be18dc00)
> sd 4:0:2:0: [sdc] CDB: ATA command pass through(16): 85 08 0e 00 00 00
> 01 00 00 00 00 00 00 00 ec 00
> mptscsih: ioc0: target reset: SUCCESS (sc=ffff8802be18dc00)
> mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset},
> SubCode(0x2000)
> mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort},
> SubCode(0x0101)
> mptscsih: ioc0: attempting task abort! (sc=ffff8802be18dc00)
> sd 4:0:2:0: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00
> mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset},
> SubCode(0x2000)
> mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be18dc00)
> mptscsih: ioc0: attempting bus reset! (sc=ffff8802be18dc00)
> sd 4:0:2:0: [sdc] CDB: ATA command pass through(16): 85 08 0e 00 00 00
> 01 00 00 00 00 00 00 00 ec 00
> mptscsih: ioc0: bus reset: SUCCESS (sc=ffff8802be18dc00)
> mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset},
> SubCode(0x2000)
> mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort},
> SubCode(0x0101)
> mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort},
> SubCode(0x0101)
> mptscsih: ioc0: attempting host reset! (sc=ffff8802be18dc00)
> mptscsih: ioc0: host reset: SUCCESS (sc=ffff8802be18dc00)
>
>
> As described in
>
> https://bugzilla.kernel.org/show_bug.cgi?id=13594
>
> this can result in nasty side effects, like multiple drives getting
> kicked out of an MD array.
>
> With Ryan's patch applied on top of v2.6.34 I cannot reproduce the
> above problem with my simple script nor with Ryan's example.
>
> So, IMHO, this patch should be strongly considered for inclusion, or
> else the root cause investigated further.
>
> So, as far as I can tell:
>
> Tested-by: ClÃudio Martins <ctpm@xxxxxxxxxx>
>
> I'm also glad to test any further patches, if it turns out that the
> above is not the most correct fix for the issue.
>
> Thanks in advance.
>
> Best regards
>
> ClÃudio
>
>
> > I look forward to hearing from you guys who know this hardware and
> code
> > better than I do. Is the hardware at fault, or should the driver be
> > shielding the hardware better? Where's the right place to add this
> code, if
> > it's the right fix?
> >
> > Does this `fix' the problem for anyone besides me?
> >
> > Regards,
> > -- Ryan Kuester
> >
> >
> > Here is a minimal bit of test code which causes the error. BEWARE:
> this
> > will hose the HBA at which you point it. If that's controlling your
> > root disk, you may hang your machine.
> >
> > /*
> > * sg_bomb -- send SG_IO ioctl which causes LSI 1068 HBA to hang
> > *
> > * usage: sg_bomb <device>
> > * e.g.: sg_bomb /dev/sdb
> > * e.g.: sg_bomb /dev/sg1
> > *
> > * Modify offset_into_page to adjust the degree of buffer
> misalignment.
> > */
> >
> > #include <unistd.h>
> > #include <scsi/sg.h>
> > #include <sys/ioctl.h>
> > #include <fcntl.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char* argv[])
> > {
> > char* filename = argv[1];
> > unsigned int offset_into_page = 0xe40;
> > // works: unsigned int offset_into_page = 0x0;
> > // hangs: unsigned int offset_into_page = 0xf00;
> > // works: unsigned int offset_into_page = 0xf04;
> >
> > unsigned char ata_identify_cmd[] = {0x85, 0x08, 0x0e, 0, 0, 0,
> 0x01,
> > 0, 0, 0, 0, 0, 0, 0, 0xec, 0};
> > unsigned char sense[32];
> > unsigned char* data = valloc(0x2000) + offset_into_page;
> > struct sg_io_hdr hdr = {
> > .interface_id = 'S',
> > .dxfer_direction = SG_DXFER_FROM_DEV,
> > .cmdp = ata_identify_cmd,
> > .cmd_len = 16,
> > .dxferp = data,
> > .dxfer_len = 512,
> > .sbp = sense,
> > .mx_sb_len = sizeof(sense),
> > .timeout = 5000,
> > };
> >
> > int fd;
> > if ((fd = open(filename, O_RDWR|O_NONBLOCK)) < 0)
> > perror();
> >
> > return ioctl(fd, SG_IO, &hdr);
> > }
¢éì®&Þ~º&¶¬–+-±éÝ¥Šw®žË±Êâmébžìdz¹Þ)í…æèw*jg¬±¨¶‰šŽŠÝj/êäz¹ÞŠà2ŠÞ¨è­Ú&¢)ß«a¶Úþø®G«éh®æj:+v‰¨Šwè†Ù>Wš±êÞiÛaxPjØm¶Ÿÿà -»+ƒùdš_