Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

From: Quinn Tran
Date: Wed Jun 22 2016 - 11:57:42 EST


Johannes, Martin,

Based on the screen shot/call trace, it looks like this adapter is not using MSIX. It defaulted back to MSI or INTx interrupt. The code made an assumption of MSIX is available. There is no point in go through that code segment.

Can you try this work around? Itâs untested. Thanks.


diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 5649c20..e033ecb 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
if (!vha->flags.online)
return;

- if (rsp->msix->cpuid != smp_processor_id()) {
+ if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
/* if kernel does not notify qla of IRQ's CPU change,
* then set it here.
*/




Regards,
Quinn Tran







-----Original Message-----
From: Johannes Thumshirn <jthumshirn@xxxxxxx>
Date: Wednesday, June 22, 2016 at 4:51 AM
To: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, Josh Boyer <jwboyer@xxxxxxxxxxxxxxxxx>, Thorsten Leemhuis <regressions@xxxxxxxxxxxxx>, linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>, Quinn Tran <quinn.tran@xxxxxxxxxx>
Subject: Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

>On Tue, Jun 21, 2016 at 09:25:18PM -0400, Martin K. Petersen wrote:
>> >>>>> "Linus" == Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>>
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=1348342
>>
>> This first one appears to be a crash in a USB sound doodad and not
>> qla2xxx. Also, this appears to be where the 4.5.5 -> 4.5.6 notion comes
>> from. So we can probably ignore 4.5.5 as the last good revision.
>>
>> Linus> as far as I can tell. And neither of them looks very likely, but
>> Linus> what do I know. Adding Martin Petersen and Johannes Thumshirn to
>> Linus> the participants just in case they go "Ahh.."
>>
>> Doubt it's Johannes' tweak. The qla2xxx crash from the two other
>> bugzilla entries is in:
>>
>> (gdb) list *qla24xx_process_response_queue+0x49
>> 0x27e09 is in qla24xx_process_response_queue (drivers/scsi/qla2xxx/qla_isr.c:2560).
>> 2555 if (rsp->msix->cpuid != smp_processor_id()) {
>> 2556 /* if kernel does not notify qla of IRQ's CPU change,
>> 2557 * then set it here.
>> 2558 */
>> 2559 rsp->msix->cpuid = smp_processor_id();
>> 2560 ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;
>> 2561 }
>> 2562
>> 2563 while (rsp->ring_ptr->signature != RESPONSE_PROCESSED) {
>> 2564 pkt = (struct sts_entry_24xx *)rsp->ring_ptr;
>>
>> That particular code went into 4.5 and comes from:
>>
>> commit cdb898c52d1dfad4b4800b83a58b3fe5d352edde
>> Author: Quinn Tran <quinn.tran@xxxxxxxxxx>
>> Date: Thu Dec 17 14:57:05 2015 -0500
>>
>> qla2xxx: Add irq affinity notification
>>
>> Register to receive notification of when irq setting change
>> occured.
>>
>> Signed-off-by: Quinn Tran <quinn.tran@xxxxxxxxxx>
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
>> Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
>>
>> Quinn?
>>
>> --
>> Martin K. Petersen Oracle Linux Engineering
>
>Having a quick look at it I _think_ this could be the problem.
>We request the IRQ _before_ actually assigning the rsp->msix entry. Now If an
>IRQ triggers, before the assignment we touch rsp->msix->cpuid, which is
>probably the case. At least from what I conduct from Martin's mail.
>
>diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
>index 5649c20..20743a3 100644
>--- a/drivers/scsi/qla2xxx/qla_isr.c
>+++ b/drivers/scsi/qla2xxx/qla_isr.c
>@@ -3086,6 +3086,8 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
> /* Enable MSI-X vectors for the base queue */
> for (i = 0; i < 2; i++) {
> qentry = &ha->msix_entries[i];
>+ qentry->rsp = rsp;
>+ rsp->msix = qentry;
> if (IS_P3P_TYPE(ha))
> ret = request_irq(qentry->vector,
> qla82xx_msix_entries[i].handler,
>@@ -3097,8 +3099,6 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
> if (ret)
> goto msix_register_fail;
> qentry->have_irq = 1;
>- qentry->rsp = rsp;
>- rsp->msix = qentry;
>
> /* Register for CPU affinity notification. */
> irq_set_affinity_notifier(qentry->vector, &qentry->irq_notify);
>@@ -3119,12 +3119,12 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
> */
> if (QLA_TGT_MODE_ENABLED() && IS_ATIO_MSIX_CAPABLE(ha)) {
> qentry = &ha->msix_entries[ATIO_VECTOR];
>+ qentry->rsp = rsp;
>+ rsp->msix = qentry;
> ret = request_irq(qentry->vector,
> qla83xx_msix_entries[ATIO_VECTOR].handler,
> 0, qla83xx_msix_entries[ATIO_VECTOR].name, rsp);
> qentry->have_irq = 1;
>- qentry->rsp = rsp;
>- rsp->msix = qentry;
> }
>
> msix_register_fail:
>
>
>I'm not sure if we need the qentry->have_irq assingment as well, I'm not
>deep enough into the qla2xx driver yet, maybe Quinn can clarify.
>Beware of the above change being untested.
>
>Byte,
> Johannes
>--
>Johannes Thumshirn Storage
>jthumshirn@xxxxxxx +49 911 74053 689
>SUSE LINUX GmbH, Maxfeldstr. 5, 90409 NÃrnberg
>GF: Felix ImendÃrffer, Jane Smithard, Graham Norton
>HRB 21284 (AG NÃrnberg)
>Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850