Re: [PATCH] nvme: fix doulbe irq free
From: Tong Zhang
Date: Thu Sep 17 2020 - 11:41:31 EST
On Thu, Sep 17, 2020 at 4:30 AM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> > the irq might already been released before reset work can run
>
> If it is we have a problem with the state machine, as we shouldn't
> have two resets running at the same time. Can you share your reproducer?
>
Hi Christoph,
Thanks for replying.
It is possible that pci_free_irq() can be called twice for the same IRQ,
the first call can be
[ 124.413113] nvme nvme0: I/O 6 QID 0 timeout, disable controller
[ 124.414317] Workqueue: kblockd blk_mq_timeout_work
[ 124.414317] Call Trace:
[ 124.414317] dump_stack+0x7d/0xb0
[ 124.414317] nvme_suspend_queue.cold+0x11/0x58
[ 124.414317] nvme_dev_disable+0x116/0x7b0
[ 124.414317] nvme_timeout+0x309/0x320
[ 124.414317] ? nvme_shutdown+0x40/0x40
[ 124.414317] ? load_balance+0xe76/0x1450
[ 124.414317] ? deref_stack_reg+0x40/0x40
[ 124.414317] ? sched_clock_local+0x99/0xc0
[ 124.414317] blk_mq_check_expired+0x313/0x340
[ 124.414317] ? blk_mq_freeze_queue_wait+0x180/0x180
[ 124.414317] ? _find_next_bit.constprop.0+0x3e/0xf0
[ 124.414317] blk_mq_queue_tag_busy_iter+0x2e9/0x4a0
[ 124.414317] ? blk_mq_freeze_queue_wait+0x180/0x180
[ 124.414317] ? blk_mq_all_tag_iter+0x10/0x10
[ 124.414317] ? _raw_write_lock_irqsave+0xd0/0xd0
[ 124.414317] ? run_rebalance_domains+0x80/0x80
[ 124.414317] ? blk_mq_freeze_queue_wait+0x180/0x180
[ 124.414317] ? debug_object_deactivate.part.0+0x1c4/0x210
[ 124.414317] blk_mq_timeout_work+0xaa/0x1f0
[ 124.414317] ? __blk_mq_end_request+0x1f0/0x1f0
[ 124.414317] ? __schedule+0x581/0xc40
[ 124.414317] ? read_word_at_a_time+0xe/0x20
[ 124.414317] ? strscpy+0xbf/0x190
[ 124.414317] process_one_work+0x4ad/0x7e0
[ 124.414317] worker_thread+0x73/0x690
[ 124.414317] ? process_one_work+0x7e0/0x7e0
[ 124.414317] kthread+0x199/0x1f0
[ 124.414317] ? kthread_create_on_node+0xd0/0xd0
[ 124.414317] ret_from_fork+0x22/0x30
And the second call can be
[ 149.763974] nvme nvme0: Failed to read smart log (error -4)
[ 149.774307] Trying to free already-free IRQ 10
[ 149.774385] Call Trace:
[ 149.774385] pci_free_irq+0x13/0x20
[ 149.774385] nvme_reset_work.cold+0x1be/0xa78
[ 149.774385] ? nvme_probe+0x8c0/0x8c0
[ 149.774385] ? check_preempt_curr+0x9d/0xf0
[ 149.774385] ? ttwu_do_wakeup.isra.0+0x176/0x190
[ 149.774385] ? try_to_wake_up+0x37c/0x900
[ 149.774385] ? migration_cpu_stop+0x1e0/0x1e0
[ 149.774385] ? __schedule+0x581/0xc40
[ 149.774385] ? read_word_at_a_time+0xe/0x20
[ 149.774385] ? strscpy+0xbf/0x190
[ 149.774385] process_one_work+0x4ad/0x7e0
[ 149.774385] worker_thread+0x73/0x690
[ 149.774385] ? process_one_work+0x7e0/0x7e0
[ 149.774385] kthread+0x199/0x1f0
[ 149.774385] ? kthread_create_on_node+0xd0/0xd0
[ 149.774385] ret_from_fork+0x22/0x30
and clearly something is not working as expected.
- Tong