Hi Can,
On Sat, 2020-08-01 at 07:17 +0800, Can Guo wrote:
Hi Bart,
On 2020-08-01 00:51, Bart Van Assche wrote:
> On 2020-07-31 01:00, Can Guo wrote:
>> AFAIK, sychronization of scsi_done is not a problem here, because scsi
>> layer
>> use the atomic state, namely SCMD_STATE_COMPLETE, of a scsi cmd to
>> prevent
>> the concurrency of abort and real completion of it.
>>
>> Check func scsi_times_out(), hope it helps.
>>
>> enum blk_eh_timer_return scsi_times_out(struct request *req)
>> {
>> ...
>> if (rtn == BLK_EH_DONE) {
>> /*
>> * Set the command to complete first in order to
>> prevent
>> a real
>> * completion from releasing the command while error
>> handling
>> * is using it. If the command was already completed,
>> then the
>> * lower level driver beat the timeout handler, and it
>> is safe
>> * to return without escalating error recovery.
>> *
>> * If timeout handling lost the race to a real
>> completion, the
>> * block layer may ignore that due to a fake timeout
>> injection,
>> * so return RESET_TIMER to allow error handling
>> another
>> shot
>> * at this command.
>> */
>> if (test_and_set_bit(SCMD_STATE_COMPLETE,
>> &scmd->state))
>> return BLK_EH_RESET_TIMER;
>> if (scsi_abort_command(scmd) != SUCCESS) {
>> set_host_byte(scmd, DID_TIME_OUT);
>> scsi_eh_scmd_add(scmd);
>> }
>> }
>> }
>
> I am familiar with this mechanism. My concern is that both the regular
> completion path and the abort handler must call scsi_dma_unmap() before
> calling cmd->scsi_done(cmd). I don't see how
> test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state) could prevent that
> the regular completion path and the abort handler call scsi_dma_unmap()
> concurrently since both calls happen before the SCMD_STATE_COMPLETE bit
> is set?
>
> Thanks,
>
> Bart.
For scsi_dma_unmap() part, that is true - we should make it serialized
with
any other completion paths. I've found it during my fault injection
test, so
I've made a patch to fix it, but it only comes in my next error recovery
enhancement patch series. Please check the attachment.
Your patch looks good to me.
I have the same idea before but I found that calling scsi_done() (by
__ufshcd_transfer_req_compl()) in ufshcd_abort() in old kernel (e.g.,
4.14) will cause issues but it has been resolved by introduced
SCMD_STATE_COMPLETE flag in newer kernel. So your patch makes sense.
Would you mind sending out this draft patch as a formal patch together
with my patch to fix issues in ufshcd_abort()? Our patches are aimed to
fix cases that host/device reset eventually not being triggered by the
result of ufshcd_abort(), for example, command is aborted successfully
or command is not pending in device with its doorbell also cleared.
Thanks,
Stanley Chu
Thanks,
Can Guo.