Re: [PATCH] nvme-rdma: complete requests from ->timeout

From: Jaesoo Lee
Date: Sat Dec 08 2018 - 01:29:00 EST


I now see that my patch is not safe and can cause double completions.
However, I am having a hard time finding a good way to barrier the
racing completions.

Could you suggest where the fix should go and what it should look
like? We can provide more details on reproducing this issue if that
helps.

On Fri, Dec 7, 2018 at 6:04 PM Keith Busch <keith.busch@xxxxxxxxx> wrote:
>
> On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote:
> >
> > > Could you please take a look at this bug and code review?
> > >
> > > We are seeing more instances of this bug and found that reconnect_work
> > > could hang as well, as the stack trace below shows.
> > >
> > > Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> > > Call Trace:
> > > __schedule+0x2ab/0x880
> > > schedule+0x36/0x80
> > > schedule_timeout+0x161/0x300
> > > ? __next_timer_interrupt+0xe0/0xe0
> > > io_schedule_timeout+0x1e/0x50
> > > wait_for_completion_io_timeout+0x130/0x1a0
> > > ? wake_up_q+0x80/0x80
> > > blk_execute_rq+0x6e/0xa0
> > > __nvme_submit_sync_cmd+0x6e/0xe0
> > > nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics]
> > > ? wait_for_completion_interruptible_timeout+0x157/0x1b0
> > > nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma]
> > > nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma]
> > > nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma]
> > > process_one_work+0x179/0x390
> > > worker_thread+0x4f/0x3e0
> > > kthread+0x105/0x140
> > > ? max_active_store+0x80/0x80
> > > ? kthread_bind+0x20/0x20
> > >
> > > This bug is reproduced by setting the MTU of the RoCE interface to
> > > '568' for testing while running I/O traffic.
> >
> > I think that with the latest changes from Keith we can no longer rely
> > on blk-mq to barrier racing completions. We will probably need
> > to barrier ourselves in nvme-rdma...
>
> You really need to do that anyway. If you were relying on blk-mq to save
> you from double completions by ending a request in the nvme driver while
> the lower half can still complete the same one, the only thing preventing
> data corruption is the probability the request wasn't reallocated for a
> new command.
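
To make the barrier idea above concrete, here is a minimal userspace
sketch, not nvme-rdma code. It assumes a per-request atomic state that
both the timeout path and the normal completion path must claim with a
compare-and-exchange before ending the request, so only one of them
wins. All names (demo_req, demo_req_try_complete, demo_complete,
demo_race_once) are hypothetical and exist only for illustration; the
real driver would need its own state tracking and locking rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states, for illustration only. */
enum demo_req_state { DEMO_REQ_IDLE, DEMO_REQ_INFLIGHT, DEMO_REQ_COMPLETE };

/* Hypothetical request; "completions" counts how often it was ended. */
struct demo_req {
	_Atomic int state;
	int completions;
};

/*
 * Try to claim the right to complete the request. Exactly one caller
 * can move the state from INFLIGHT to COMPLETE; every later caller
 * sees a different state and gets false.
 */
static bool demo_req_try_complete(struct demo_req *req)
{
	int expected = DEMO_REQ_INFLIGHT;

	return atomic_compare_exchange_strong(&req->state, &expected,
					      DEMO_REQ_COMPLETE);
}

/* Called from both the (simulated) timeout and completion paths. */
static bool demo_complete(struct demo_req *req)
{
	if (!demo_req_try_complete(req))
		return false;	/* lost the race, must not touch the request */

	req->completions++;	/* stands in for actually ending the request */
	return true;
}

/* Simulate the two paths hitting the same request back to back. */
static int demo_race_once(void)
{
	struct demo_req req = { .state = DEMO_REQ_INFLIGHT, .completions = 0 };
	bool timeout_won = demo_complete(&req);		/* ->timeout path */
	bool irq_won = demo_complete(&req);		/* completion path */

	assert(timeout_won != irq_won);	/* exactly one path may win */
	return req.completions;
}
```

Calling demo_race_once() from a small test harness shows that the
request is ended once no matter which path arrives first. The sketch
says nothing about where such a claim would belong in the driver or how
it interacts with request reallocation; that is the open question in
this thread.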