Re: [RFC PATCH] scsi: fix oops in scsi_uninit_cmd()

From: Jason Yan
Date: Thu Feb 21 2019 - 03:54:10 EST


Hi, Christoph

On 2019/2/20 23:18, Christoph Hellwig wrote:
[fullquote removed, please follow proper mail etiquette]

On Tue, Feb 19, 2019 at 08:56:28AM -0800, Bart Van Assche wrote:
regression in the SCSI sd driver due to the switch from the legacy block
layer to scsi-mq. The above patch introduces two atomic operations in the
hot path and hence would introduce a performance regression. I think this
can be avoided by making sure that sd_uninit_command() gets called before
the request tag is freed. What changes would be required to make the block
layer core call sd_uninit_command() before the request tag is freed? Would
introducing prep_rq_fn and unprep_rq_fn callbacks in struct blk_mq_ops and
making sure that the SCSI core sets these callback function pointers
appropriately be sufficient? Would such a change allow simplifying the NVMe
initiator driver? Are there any alternatives to this approach that are more
elegant?

Additional indirect calls in the I/O fast path are something I'd rather
avoid. But I don't fully understand the problem yet - where do
we release a disk reference from blk_update_request?

When userspace closes the fd after blk_update_request() and before
scsi_mq_uninit_cmd(), a disk reference is released. blk_update_request()
does not release it directly.

close
->sd_release
->scsi_disk_put
->scsi_disk_release
->disk->private_data = NULL;

Userspace can close the fd because blk_update_request() has completed the
last IO, so the application is no longer blocked in read() or write().
The window is very small, but it can be reproduced every day in our test
cases, so I'm very curious why. One possible explanation is that we
enabled kernel preemption (CONFIG_PREEMPT).
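
To make the ordering concrete, here is a minimal userspace sketch of the
window described above. It is only a model under my own assumptions: the
model_* names and structures are hypothetical stand-ins, not the kernel's
real types, and the "close in the window" case is forced deterministically
instead of depending on preemption. It assumes the uninit step needs
disk->private_data to still be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct model_disk {
	void *private_data;	/* plays the role of disk->private_data */
};

struct model_sdkp {
	int refcount;		/* reference held by the open fd */
	struct model_disk *disk;
};

/* Models close -> sd_release -> scsi_disk_put -> scsi_disk_release. */
static void model_close(struct model_sdkp *sdkp)
{
	assert(sdkp->refcount > 0);
	if (--sdkp->refcount == 0)
		sdkp->disk->private_data = NULL;
}

/*
 * Models the uninit step, assumed to look up the driver through
 * private_data. Returns false where a real kernel would dereference NULL.
 */
static bool model_uninit_cmd(const struct model_disk *disk)
{
	return disk->private_data != NULL;
}

/*
 * Runs one completion. If close_in_window is true, userspace closes the
 * fd between the two steps; otherwise it closes after uninit.
 */
static bool model_complete(bool close_in_window)
{
	struct model_disk disk;
	struct model_sdkp sdkp = { .refcount = 1, .disk = &disk };
	bool ok;

	disk.private_data = &sdkp;

	/* Step 1: blk_update_request() finishes the last IO; read() returns. */
	if (close_in_window)
		model_close(&sdkp);

	/* Step 2: scsi_mq_uninit_cmd() runs after the window. */
	ok = model_uninit_cmd(&disk);

	if (!close_in_window)
		model_close(&sdkp);

	return ok;
}
```

Calling model_complete(false) models the usual ordering, where the uninit
step still sees private_data. Calling model_complete(true) models the close
landing inside the window, where private_data is already NULL by the time
the uninit step runs.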

And why can't
we move that release to __blk_mq_end_request?


Thanks,

Bart.
---end quoted text---
