Re: [PATCH 0/2] Handle update hardware queues and queue freeze more carefully

From: Daniel Wagner
Date: Fri Jun 25 2021 - 08:22:02 EST

Next message: Frank Wunderlich: "Aw: Re: Re: Re: [PATCH] Fix mt7622.dtsi thermal cpu"
Previous message: Richard Weinberger: "Re: [PATCH 1/3] crypto: mxs-dcp: Add support for hardware provided keys"
In reply to: Hannes Reinecke: "Re: [PATCH 1/2] nvme-fc: Update hardware queues before using them"
Next in thread: Ming Lei: "Re: [PATCH 0/2] Handle update hardware queues and queue freeze more carefully"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Jun 25, 2021 at 12:16:47PM +0200, Daniel Wagner wrote:
> this is a followup on the crash I reported in
>
> https://lore.kernel.org/linux-block/20210608183339.70609-1-dwagner@xxxxxxx/
>
> By moving the hardware check up the crash was gone. Unfortunately, I
> don't understand why this fixes the crash. The per-cpu access is
> crashing but I can't see why the blk_mq_update_nr_hw_queues() is
> fixing this problem.
>
> Even though I can't explain why it fixes it, I think it makes sense to
> update the hardware queue mapping before we recreate the IO
> queues. Thus I avoided saying in the commit message that it fixes
> something.

I just discussed this with Hannes and we figured out how the crash is
fixed by moving the blk_mq_update_nr_hw_queues() before the
nvme_fc_create_hw_io_queues()/nvme_fc_connect_io_queues().

First of all, blk_mq_update_nr_hw_queues() operates on the normal
tag_set and not the admin_tag_set. That means when we move the
blk_mq_update_nr_hw_queues() before the nvme_fc_connect_io_queues(), we
update the mapping to only CPUs and hwctx which are available. When we
then do the connect call nvmf_connect_io_queue() we will only allocate
tags from queues which are not in the BLK_MQ_S_INACTIVE state anymore. Hence
we skip the blk_mq_put_tag() call.
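
To make the ordering argument concrete, here is a small user-space toy
model. It is only a sketch of the reasoning above and not kernel code:
all toy_* names are made up for illustration, the round-robin remap is
an assumption, and the "inactive" flag merely stands in for
BLK_MQ_S_INACTIVE. It shows that with a stale map some CPUs would still
allocate from an inactive hwctx, while after the map update none do.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4
#define TOY_NR_HCTX 4

/* Hypothetical stand-in for a hardware context, not struct blk_mq_hw_ctx. */
struct toy_hctx {
	bool inactive;	/* models the BLK_MQ_S_INACTIVE state */
};

/* Hypothetical stand-in for a tag set with its CPU -> hctx map. */
struct toy_tag_set {
	struct toy_hctx hctx[TOY_NR_HCTX];
	int cpu_to_hctx[TOY_NR_CPUS];
};

/* Build a stale 1:1 map that ignores whether a hctx is inactive. */
void toy_init_stale_map(struct toy_tag_set *set)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		set->cpu_to_hctx[cpu] = cpu % TOY_NR_HCTX;
}

/*
 * Remap every CPU onto the active hctxs only (round robin).
 * Returns the number of active hctxs; the map is left untouched
 * when there are none.
 */
int toy_update_map(struct toy_tag_set *set)
{
	int active[TOY_NR_HCTX];
	int nr_active = 0;

	for (int i = 0; i < TOY_NR_HCTX; i++)
		if (!set->hctx[i].inactive)
			active[nr_active++] = i;

	if (nr_active == 0)
		return 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		set->cpu_to_hctx[cpu] = active[cpu % nr_active];

	return nr_active;
}

/* Count the CPUs whose connect would allocate from an inactive hctx. */
int toy_count_inactive_allocs(const struct toy_tag_set *set)
{
	int hits = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (set->hctx[set->cpu_to_hctx[cpu]].inactive)
			hits++;

	return hits;
}
```

Calling toy_update_map() before the (modelled) connect step corresponds
to moving blk_mq_update_nr_hw_queues() in front of
nvme_fc_connect_io_queues().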

> Also during testing I observed that we hang indefinitely in
> blk_mq_freeze_queue_wait(). Again I can't explain why we get stuck
> there but given a common pattern for the nvme_wait_freeze() is to use
> it with a timeout I think the timeout should be used too :)

The nvme_wait_freeze() is probably not needed at all, because
__blk_mq_update_nr_hw_queues() already calls blk_mq_freeze_queue().
Furthermore, if we move blk_mq_update_nr_hw_queues() in front of
nvme_fc_create_hw_io_queues() there can't be any pending I/Os because
there are no queues yet.
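
The same point as a user-space toy model (again a sketch with made-up
toy_* names, not the blk-mq implementation): an update that freezes
internally only has to wait when some queue still has requests in
flight, and with no queues created yet there is nothing to wait for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue. */
struct toy_queue {
	int in_flight;	/* requests not yet completed */
	bool frozen;
};

/* A freeze has to wait only if some queue still has requests in flight. */
bool toy_freeze_would_block(const struct toy_queue *q, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (q[i].in_flight > 0)
			return true;
	return false;
}

/*
 * Models an update that freezes and unfreezes the queues itself.
 * Returns false where the real code would sit waiting for the queues
 * to drain, true when it can proceed immediately.
 */
bool toy_update_nr_hw_queues(struct toy_queue *q, size_t nr)
{
	if (toy_freeze_would_block(q, nr))
		return false;

	for (size_t i = 0; i < nr; i++)
		q[i].frozen = true;

	/* the queue remapping would happen here */

	for (size_t i = 0; i < nr; i++)
		q[i].frozen = false;

	return true;
}
```

With nr == 0, which models the state before the IO queues are
recreated, toy_update_nr_hw_queues() never has to wait.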
