Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

From: Quinn Tran
Date: Thu Jun 23 2016 - 12:15:25 EST



-----Original Message-----
From: Johannes Thumshirn <jthumshirn@xxxxxxx>
Date: Thursday, June 23, 2016 at 12:22 AM
To: Quinn Tran <quinn.tran@xxxxxxxxxx>
Cc: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, Josh Boyer <jwboyer@xxxxxxxxxxxxxxxxx>, Thorsten Leemhuis <regressions@xxxxxxxxxxxxx>, linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>, linux-scsi <linux-scsi@xxxxxxxxxxxxxxx>
Subject: Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

>[+ Cc linux-scsi@xxxxxxxxxxxxxxx ]
>
>On Wed, Jun 22, 2016 at 03:57:35PM +0000, Quinn Tran wrote:
>> Johannes, Martin,
>>
>> Based on the screen shot/call trace, it looks like this adapter is not using MSIX. It defaulted back to MSI or INTx interrupt. The code made an assumption of MSIX is available. There is no point in go through that code segment.
>>
>> Can you try this work around? Itâs untested. Thanks.
>>
>>
>> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
>> index 5649c20..e033ecb 100644
>> --- a/drivers/scsi/qla2xxx/qla_isr.c
>> +++ b/drivers/scsi/qla2xxx/qla_isr.c
>> @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>> if (!vha->flags.online)
>> return;
>>
>> - if (rsp->msix->cpuid != smp_processor_id()) {
>> + if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>> /* if kernel does not notify qla of IRQ's CPU change,
>> * then set it here.
>> */
>>
>
>But this still does not fix the race which would be possible if the HBA is
>using MSI-X but triggering IRQs early enough.
>
>Have a look at this (I admit theoretical) path:
>qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
>{
> [...]
> /* Enable MSI-X vectors for the base queue */
> for (i = 0; i < 2; i++) {
> qentry = &ha->msix_entries[i];
> if (IS_P3P_TYPE(ha))
> ret = request_irq(qentry->vector,
> qla82xx_msix_entries[i].handler,
> 0, qla82xx_msix_entries[i].name, rsp);
> else
> ret = request_irq(qentry->vector,
> msix_entries[i].handler,
> 0, msix_entries[i].name, rsp);
> if (ret)
> goto msix_register_fail;
> <--- IRQ arrives here

QT: setting up the interrupt vector does not mean the interrupt starts firing immediately. Interrupt starting firing when the driver is ready to accept the interrupt by enabling the interrupt (ha->isp_ops->enable_intrs(ha)) later on in time. In addition, that particular code path/qla24xx_process_response_queue is not executed until driver feeds commands to the hardware work queue.

IF there is a left over interrupt that happens to trigger the call immediately, there is another check that prevent the code from getting to the point of the âtheoretical" race.


> qentry->have_irq = 1;
> qentry->rsp = rsp;
> rsp->msix = qentry;
>
> [...]
>
>
>void qla24xx_process_response_queue(struct scsi_qla_host *vha,
> struct rsp_que *rsp)
>{
--->8------
if (!vha->flags.online)
return;

---8<------
> if (rsp->msix->cpuid != smp_processor_id()) {
> ^
> \--- rsp->msix == NULL
>
> /* if kernel does not notify qla of IRQ's CPU change,
> * then set it here.
> */
> rsp->msix->cpuid = smp_processor_id();
> ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;
>
>--
>Johannes Thumshirn Storage
>jthumshirn@xxxxxxx +49 911 74053 689
>SUSE LINUX GmbH, Maxfeldstr. 5, 90409 NÃrnberg
>GF: Felix ImendÃrffer, Jane Smithard, Graham Norton
>HRB 21284 (AG NÃrnberg)
>Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850