Re: [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity()

From: Thomas Gleixner
Date: Tue Mar 15 2022 - 10:25:58 EST


On Fri, Feb 18 2022 at 08:41, John Garry wrote:
> On 17/02/2022 17:17, Marc Zyngier wrote:
>>> I know you mentioned it in 2/2, but it would be interesting to see how
>>> network controller drivers can handle the problem of missing in-flight
>>> IO completions for managed irq shutdown. For storage controllers this
>>> is all now safely handled in the block layer.
>>
>> Do you have a pointer to this? It'd be interesting to see if there is
>> a common pattern.
>
> Check blk_mq_hctx_notify_offline() and its hotplug handler friends in
> block/blk-mq.c, and also blk_mq_get_ctx()/blk_mq_map_queue().
>
> So the key steps in CPU offlining are:
> - when the last CPU in a HW queue's context cpumask is going offline, we
> mark the HW queue as inactive and no longer queue requests to it
> - drain all in-flight requests before we allow that last CPU to go
> offline, so that there is always a CPU online to service any
> completion interrupts
>
> This scheme relies on symmetrical HW submission and completion queues
> and also on the blk-mq HW queue context cpumask being the same as the HW
> queue's IRQ affinity mask (see blk_mq_pci_map_queues()).
>
> I am not sure how much this would fit with the networking stack or that
> marvell driver.

The problem with networking is RX flow steering.

The driver in question initializes the RX flows in
mvpp22_port_rss_init() by default so that packets are evenly distributed
across the RX queues.

So without actually steering the RX flows away from the RX queue which is
associated with the CPU about to be unplugged, this does not really work
well.
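
To illustrate what such steering would have to do, here is a purely
illustrative sketch (the helper name, table size and queue numbering are made
up, not the mvpp2 API): the RSS indirection table is rebuilt so that no entry
points at the RX queue bound to the outgoing CPU.

#include <linux/types.h>

#define RSS_TABLE_SIZE	256

/*
 * Rebuild an RSS indirection table so that none of its entries point at
 * the RX queue whose CPU is about to go offline. Round-robins the flows
 * over the remaining queues. Assumes nr_rxqs > 1.
 */
static void rss_steer_away(u32 *indir, unsigned int nr_rxqs,
			   unsigned int dying_rxq)
{
	unsigned int i, q = 0;

	for (i = 0; i < RSS_TABLE_SIZE; i++) {
		if (q == dying_rxq)
			q = (q + 1) % nr_rxqs;
		indir[i] = q;
		q = (q + 1) % nr_rxqs;
	}
}

The hard part is not rewriting the table, but doing it at the right point in
the CPU hotplug sequence, before the managed vector is shut down.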

Thanks,

tglx