Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices

From: Jens Axboe
Date: Mon Sep 19 2022 - 10:10:42 EST


On 9/18/22 10:10 AM, Liu Song wrote:
>
> On 2022/9/18 00:50, Jens Axboe wrote:
>> On 9/17/22 10:40 AM, Liu Song wrote:
>>> From: Liu Song <liusong@xxxxxxxxxxxxxxxxx>
>>>
>>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>>> request, so the corresponding process can be simplified.
>> If the worry is the call overhead of blk_mq_complete_request_remote(),
>> why don't we just make that available as an inline instead? That seems
>> vastly superior to providing a random shortcut in a driver to avoid
>> calling it.
>
> Hi
>
> This is what I think about it. If it is an SSD with only one hw queue,
> remote requests will appear occasionally. As a real multi-queue
> device, nvme usually does not have remote requests, so we don't need
> to care about it. So even if "blk_mq_complete_request_remote" is
> called, there is a high probability that it will return false, and the
> cost of changing the call to "blk_mq_complete_request_remote" to
> determine whether "req->mq_hctx->nr_ctx" is 1 is not only a function
> call, but more judgments in "blk_mq_complete_request_remote". If
> "blk_mq_complete_request_remote" is decorated as inline, it also
> depends on the compiler, there is uncertainty.

I'm not disagreeing with any of that, my point is just that you're
hacking around this in the nvme driver. This is problematic whenever
core changes happen, because now we have to touch individual drivers.
While the expectation is that there are no remote IPI completions for
NVMe, queue starved devices do exist and those do see remote
completions.

This optimization belongs in the blk-mq core, not in nvme. I do think it
makes sense, you just need to solve it in blk-mq rather than in the nvme
driver.

--
Jens Axboe