Re: [PATCH 0/3] blk-mq & nvme: introduce .map_changed

From: Ming Lei
Date: Tue Sep 29 2015 - 18:16:32 EST

In reply to: Jens Axboe: "Re: [PATCH 0/3] blk-mq & nvme: introduce .map_changed"
Next in thread: Keith Busch: "Re: [PATCH 0/3] blk-mq & nvme: introduce .map_changed"

On Tue, Sep 29, 2015 at 10:47 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 09/29/2015 08:26 AM, Keith Busch wrote:
>>
>> On Mon, 28 Sep 2015, Ming Lei wrote:
>>>
>>> This patchset introduces .map_changed callback into 'struct blk_mq_ops',
>>> and use this callback to get NVMe notified about the mapping changed event,
>>> then NVMe can update the irq affinity hint for its queues.
>>
>>
>> I think this is going the wrong direction. Shouldn't we provide blk-mq
>> the vectors in the tag set so that layer can manage the irq hints?
>>
>> This could lead to more cpu-queue assignment optimizations from using
>> that information. For example, two h/w contexts sharing the same vector
>> shouldn't be assigned to cpus on different NUMA nodes.
>
>
> I agree, this is moving in the wrong direction. Currently the sw <->hw queue
> mappings are in blk-mq, and this is the exact same information base we need
> for IRQ affinity handling. We need to move in the direction of having blk-mq
> helpers handle that part too, not pass notifications to the lower level
> driver to update its IRQ mappings.

Yes, I thought of that before, but it has the following cons:

- some drivers/devices may need a different IRQ affinity policy, such as
virtio devices, which have their own set-affinity handler (see
virtqueue_set_affinity()), and it is often not efficient to handle a
virt queue's irq on more than one CPU.

- the block core has to get the irq vector information, which has to be
set up/finalized before blk-mq uses it for setting irq affinity. For
example, in the case of NVMe's admin queue, the vector can be changed
after the admin queue's initialization.

That is why I said this approach is more flexible.
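
To make the point concrete, here is a rough userspace mock of the idea, for illustration only. It is not the patch and not kernel code: every name in it (demo_mq_ops, demo_hw_queue, demo_core_remap() and the two policy functions) is a hypothetical stand-in, and the callback signature is an assumption. It only shows that with a notification callback the core does not need to know the irq vector, and each driver can apply its own affinity policy.

```c
/* Userspace mock of a ".map_changed"-style notification; NOT kernel code.
 * All names below are hypothetical and exist only for this sketch. */
#include <assert.h>

struct demo_hw_queue {
	int irq_vector;              /* owned by the driver, may be assigned late */
	unsigned long cpumask;       /* bit n set => CPU n is mapped to this queue */
	unsigned long affinity_hint; /* last hint the driver chose for its irq */
};

struct demo_mq_ops {
	/* called by the core after the sw <-> hw queue mapping was rebuilt */
	void (*map_changed)(struct demo_hw_queue *hq);
};

/* NVMe-like policy: hint the irq to every CPU mapped to the queue. */
static void demo_nvme_map_changed(struct demo_hw_queue *hq)
{
	hq->affinity_hint = hq->cpumask;
}

/* virtio-like policy: keep the irq on one CPU, the lowest mapped one. */
static void demo_virtio_map_changed(struct demo_hw_queue *hq)
{
	/* x & (~x + 1) isolates the lowest set bit (0 if no bit is set) */
	hq->affinity_hint = hq->cpumask & (~hq->cpumask + 1UL);
}

/* Core side: update the mapping, then notify the driver if it asked for it.
 * Note the core never looks at hq->irq_vector. */
static void demo_core_remap(const struct demo_mq_ops *ops,
			    struct demo_hw_queue *hq, unsigned long new_mask)
{
	hq->cpumask = new_mask;
	if (ops->map_changed)
		ops->map_changed(hq);
}

/* Small walk-through: the same remap event, two different driver policies. */
static int demo_run(void)
{
	struct demo_mq_ops nvme_ops = { .map_changed = demo_nvme_map_changed };
	struct demo_mq_ops virtio_ops = { .map_changed = demo_virtio_map_changed };
	struct demo_hw_queue a = { .irq_vector = 1 };
	struct demo_hw_queue b = { .irq_vector = 2 };

	demo_core_remap(&nvme_ops, &a, 0x0cUL);   /* CPUs 2 and 3 */
	demo_core_remap(&virtio_ops, &b, 0x0cUL); /* CPUs 2 and 3 */

	assert(a.affinity_hint == 0x0cUL); /* both CPUs hinted */
	assert(b.affinity_hint == 0x04UL); /* only CPU 2 hinted */
	return 0;
}
```

In this mock a driver that leaves .map_changed unset is simply not notified, and the core still updates the mapping.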

>
>>> Also the 'cpumask' in 'struct blk_mq_tags' isn't needed any more, so
>>> remove that and related kernel interface.
>>
>>
>> It was added to the tags because the cpu mask is an artifact of the
>> tags rather than duplicating it across all the h/w contexts sharing the
>> same set. It also doesn't let a h/w context from one namespace overwrite
>> another's cpu affinity mask when they share the same vector.
>
>
> So having the mask in the tags is really odd, it should be in some
> per-device type data instead.

Agreed, removing the mask from the tags is one of this patchset's motivations.

--
Ming Lei
