Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

From: jianchao.wang
Date: Thu Feb 08 2018 - 20:43:21 EST


Hi Keith

Thanks for your precious time and kind response.

On 02/08/2018 11:15 PM, Keith Busch wrote:
> On Thu, Feb 08, 2018 at 10:17:00PM +0800, jianchao.wang wrote:
>> There is a dangerous scenario caused by nvme_wait_freeze in nvme_reset_work.
>> Please consider it.
>>
>> nvme_reset_work
>> -> nvme_start_queues
>> -> nvme_wait_freeze
>>
>> If the controller does not respond, we have to rely on the timeout path.
>> The issues are as follows:
>> nvme_dev_disable needs to be invoked.
>> nvme_dev_disable will quiesce the queues, then cancel and requeue the outstanding requests.
>> nvme_reset_work will hang at nvme_wait_freeze.
>
> We used to not requeue timed out commands, so that wasn't a problem
> before. Oh well, I'll take a look.
>
Yes, we indeed don't requeue the timed-out commands, but nvme_dev_disable will requeue the other
outstanding requests and quiesce the request queues, which prevents nvme_reset_work->nvme_wait_freeze
from making progress.
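
To make the hang concrete, here is a toy userspace model in C. It is not kernel code, only a
sketch under my own assumptions: struct toy_queue and all toy_* functions are hypothetical names,
"usage" stands in for the queue usage counter that a freeze waits on, and the wait is bounded
so the model returns instead of blocking forever.

```c
#include <stdbool.h>

/* Toy model only. Every name here is hypothetical, not a kernel API. */
struct toy_queue {
	int usage;      /* references held by requests; a freeze waits for 0 */
	int requeued;   /* requests parked on the requeue list */
	bool quiesced;  /* while true, nothing is dispatched */
};

/* Models the timeout path: quiesce, then requeue outstanding requests. */
static void toy_dev_disable(struct toy_queue *q)
{
	q->quiesced = true;
	q->requeued = q->usage;  /* requeued requests keep their references */
}

/* One dispatch pass: requeued requests complete only if not quiesced. */
static void toy_run_queue(struct toy_queue *q)
{
	if (q->quiesced)
		return;
	q->usage -= q->requeued;
	q->requeued = 0;
}

/* Bounded stand-in for the freeze wait: true once usage drops to 0. */
static bool toy_wait_freeze(struct toy_queue *q, int max_passes)
{
	for (int i = 0; i < max_passes; i++) {
		if (q->usage == 0)
			return true;
		toy_run_queue(q);
	}
	return q->usage == 0;
}

/* Reset after a disable: reports whether the freeze wait gets stuck. */
static bool toy_reset_hangs(int outstanding)
{
	struct toy_queue q = { .usage = outstanding };

	toy_dev_disable(&q);
	return !toy_wait_freeze(&q, 1000);
}
```

In this model toy_reset_hangs() reports a hang whenever at least one request was outstanding,
because the requeued requests keep their references while the quiesced queue never dispatches
them. Clearing "quiesced" before the wait lets the counter drain.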

As I suggested in my last email, can we use (or abuse?) blk_set_preempt_only to gate new bios in generic_make_request?
Freezing queues is good, but wait_freeze in reset_work is a devil.

Many thanks
Jianchao