On Wed, 28 May 2014, Matias Bjørling wrote:
This converts the current NVMe driver to utilize the blk-mq layer.
I am concerned about device hot removal since the h/w queues can be
freed at any time. I *think* blk-mq helps with this in that the driver
will not see a new request after calling blk_cleanup_queue. If you can
confirm that's true and that blk-mq waits for all requests active in the
driver to return to the block layer, then we're probably okay in this
path. That wasn't true when this was a bio-based driver, which is why we are
cautious with all the locking and barriers. But what about the IOCTL paths?
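To be concrete about the removal ordering I have in mind, something like the
sketch below is what I'd want to be able to rely on (untested;
nvme_free_queues() here just stands in for whatever the real queue-teardown
helper ends up being):

#include <linux/blkdev.h>
#include <linux/nvme.h>

static void nvme_ns_remove_sketch(struct nvme_ns *ns)
{
        /*
         * For this to be safe, blk_cleanup_queue() must guarantee that
         * ->queue_rq() is never called again for this queue and that
         * every request already handed to the driver has been returned
         * to blk-mq.
         */
        blk_cleanup_queue(ns->queue);

        /* Only then is it safe to free the hardware queues. */
        nvme_free_queues(ns->dev, 0);
}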
It also doesn't look like we're handling the case where the SGL can't
map to a PRP.
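For reference, the constraint I mean is that a scatterlist can only be
expressed as a PRP list if every segment after the first starts on a page
boundary and every segment before the last ends on one. A check along these
lines (helper name made up) is roughly what I'd expect before building the
PRPs, with the request split or bounced when it fails:

#include <linux/types.h>
#include <linux/scatterlist.h>
#include <linux/mm.h>

static bool nvme_sg_fits_prps(struct scatterlist *sgl, int nents)
{
        struct scatterlist *sg;
        int i;

        for_each_sg(sgl, sg, nents, i) {
                /* Segments after the first must start page aligned. */
                if (i && sg->offset)
                        return false;
                /* Segments before the last must end page aligned. */
                if (i < nents - 1 &&
                    ((sg->offset + sg->length) & (PAGE_SIZE - 1)))
                        return false;
        }
        return true;
}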
+static void req_completion(struct nvme_queue *nvmeq, void *ctx,
                                                 struct nvme_completion *cqe)
 {
         struct nvme_iod *iod = ctx;
-        struct bio *bio = iod->private;
+        struct request *req = iod->private;
+
         u16 status = le16_to_cpup(&cqe->status) >> 1;
-        if (unlikely(status)) {
-                if (!(status & NVME_SC_DNR ||
-                                bio->bi_rw & REQ_FAILFAST_MASK) &&
-                                (jiffies - iod->start_time) < IOD_TIMEOUT) {
-                        if (!waitqueue_active(&nvmeq->sq_full))
-                                add_wait_queue(&nvmeq->sq_full,
-                                                        &nvmeq->sq_cong_wait);
-                        list_add_tail(&iod->node, &nvmeq->iod_bio);
-                        wake_up(&nvmeq->sq_full);
-                        return;
-                }
-        }
         if (iod->nents) {
-                dma_unmap_sg(nvmeq->q_dmadev, iod->sg, iod->nents,
-                        bio_data_dir(bio) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
-                nvme_end_io_acct(bio, iod->start_time);
+                dma_unmap_sg(&nvmeq->dev->pci_dev->dev, iod->sg, iod->nents,
+                        rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
         }
         nvme_free_iod(nvmeq->dev, iod);
-        if (status)
-                bio_endio(bio, -EIO);
+
+        if (unlikely(status))
+                req->errors = -EIO;
         else
-                bio_endio(bio, 0);
+                req->errors = 0;
+
+        blk_mq_complete_request(req);
 }
Is blk-mq going to retry intermittently failed commands for me? It
doesn't look like it will.
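What I had in mind for the completion path is roughly the following (untested,
and assuming blk_mq_requeue_request()/blk_mq_kick_requeue_list() are available
in the tree this is based on):

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/nvme.h>

static void nvme_complete_or_retry(struct request *req, u16 status)
{
        /*
         * Transient failures (no DNR bit, not a failfast request) go
         * back on the requeue list instead of completing with an error.
         */
        if (status && !(status & NVME_SC_DNR) &&
            !(req->cmd_flags & REQ_FAILFAST_MASK)) {
                blk_mq_requeue_request(req);
                blk_mq_kick_requeue_list(req->q);
                return;
        }

        req->errors = status ? -EIO : 0;
        blk_mq_complete_request(req);
}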
+static int nvme_submit_flush_sync(struct nvme_queue *nvmeq, struct nvme_ns *ns)
+{
+        struct request *req;
+        struct nvme_command cmnd;
+
+        req = blk_mq_alloc_request(ns->queue, WRITE, GFP_KERNEL, false);
+        if (!req)
+                return -ENOMEM;
+
+        nvme_setup_flush(&cmnd, ns, req->tag);
+        nvme_submit_sync_cmd(req, &cmnd, NULL, NVME_IO_TIMEOUT);
         return 0;
 }
It looks like the function above is being called from an interrupt
context while we are already holding a spinlock. The sync command will
try to take that same lock.
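If the flush really has to be kicked off from that context, one way around it
is to push the synchronous part to process context, e.g. something like this
(completely untested; the struct and helper names are made up):

#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

struct nvme_flush_work {
        struct work_struct work;
        struct nvme_queue *nvmeq;
        struct nvme_ns *ns;
};

static void nvme_flush_workfn(struct work_struct *work)
{
        struct nvme_flush_work *fw =
                container_of(work, struct nvme_flush_work, work);

        /* Process context: safe to allocate a request and sleep on it. */
        nvme_submit_flush_sync(fw->nvmeq, fw->ns);
        kfree(fw);
}

static int nvme_defer_flush(struct nvme_queue *nvmeq, struct nvme_ns *ns)
{
        struct nvme_flush_work *fw = kmalloc(sizeof(*fw), GFP_ATOMIC);

        if (!fw)
                return -ENOMEM;

        fw->nvmeq = nvmeq;
        fw->ns = ns;
        INIT_WORK(&fw->work, nvme_flush_workfn);
        schedule_work(&fw->work);
        return 0;
}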
+        if ((req->cmd_flags & REQ_FLUSH) && psegs) {
+                struct flush_cmd_info *flush_cmd = kmalloc(
+                                sizeof(struct flush_cmd_info), GFP_KERNEL);
The comment above this says "may not sleep", yet this is using GFP_KERNEL.
I actually think it is safe to sleep here, though, since you're not taking a
lock until later, so maybe you can change all the allocs to GFP_KERNEL?
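Just to spell out the rule I'm applying here (toy example, nothing below is
from the patch):

#include <linux/slab.h>
#include <linux/spinlock.h>

struct demo_ctx {
        int dummy;
};

static struct demo_ctx *demo_alloc(spinlock_t *lock)
{
        struct demo_ctx *c;

        /* No lock held yet and not in interrupt context: may sleep. */
        c = kmalloc(sizeof(*c), GFP_KERNEL);
        if (!c)
                return NULL;

        spin_lock(lock);
        /*
         * From here until spin_unlock(), a GFP_KERNEL allocation would
         * be a bug; only GFP_ATOMIC is allowed while the lock is held.
         */
        spin_unlock(lock);

        return c;
}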