Re: [PATCH v7 03/17] scsi: ufs: implement scsi host timeout handler

From: Hannes Reinecke
Date: Wed Mar 09 2016 - 01:45:05 EST


On 03/08/2016 08:58 PM, ygardi@xxxxxxxxxxxxxx wrote:
>> On 03/08/2016 02:01 PM, Hannes Reinecke wrote:
>>> On 03/08/2016 01:35 PM, Yaniv Gardi wrote:
>>>> A race condition exists between request requeueing and scsi layer
>>>> error handling:
>>>> When UFS driver queuecommand returns a busy status for a request,
>>>> it will be requeued and its tag will be freed and set to -1.
>>>> At the same time it is possible that the request will timeout and
>>>> scsi layer will start error handling for it. The scsi layer reuses
>>>> the request and its tag to send error related commands to the device,
>>>> however its tag is no longer valid.
>>>> As this request was never really sent to the device, there is no
>>>> point to start error handling with the device.
>>>> Implement the scsi error handling timeout callback and bypass SCSI
>>>> error handling for request that were not actually sent to the device.
>>>> For such requests simply reset the block layer timer. Otherwise, let
>>>> SCSI layer perform the usual error handling.
>>>>
>>>> Reviewed-by: Dolev Raviv <draviv@xxxxxxxxxxxxxx>
>>>> Signed-off-by: Gilad Broner <gbroner@xxxxxxxxxxxxxx>
>>>> Signed-off-by: Yaniv Gardi <ygardi@xxxxxxxxxxxxxx>
>>>>
>>>> ---
>>>> drivers/scsi/ufs/ufshcd.c | 36 ++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 36 insertions(+)
>>>>
>>> Having a timeout handler is always a good idea, even though this
>>> doesn't do anything here.
>>> Are we sure that the requests will return eventually?
>>> Does the UFS spec provide for a command abort?
>>>
>> In fact, looking at the UFS spec there _is_ a command abort.
>> I would recommend implementing a task management request UPIO with
>> type 'ABORT TASK' here for any task found to be pending.
>> In the end, you might run into a _valid_ timeout, at which point you
>> really want to abort the command...
>>
>
> but this is not what we'd like to achieve.
> we don't want to abort a task that was not even dispatched to the UFS driver.
> in those cases we need to re-queue the request and reset the timer.
>
Fully understood.

> Hannes, i appreciate your time, but I really don't understand why you
> insist on coming up with suggestions, when we already implemented one that
> is working. more over, your solution doesn't fix the race condition which is the
> reason for this patch.
> as i don't have HW to test anything at the moment, I think it's better to
> stick with this solution that also fix the BUG and also was verified and
> tested.
>
Ah. Didn't know that. I was under the impression that you _had_ the
hardware available. If not then of course it's not easy to verify
anything.

So, all things considered:

Reviewed-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@xxxxxxx +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)