Re: Question on handling managed IRQs when hotplugging CPUs

From: Keith Busch
Date: Tue Jan 29 2019 - 10:45:25 EST


On Tue, Jan 29, 2019 at 03:25:48AM -0800, John Garry wrote:
> Hi,
>
> I have a question on $subject which I hope you can shed some light on.
>
> According to commit c5cb83bb337c25 ("genirq/cpuhotplug: Handle managed
> IRQs on CPU hotplug"), if we offline the last CPU in a managed IRQ
> affinity mask, the IRQ is shut down.
>
> The reasoning is that this IRQ is thought to be associated with a
> specific queue on a MQ device, and the CPUs in the IRQ affinity mask are
> the same CPUs associated with the queue. So, if no CPU is using the
> queue, then no need for the IRQ.
>
> However, how does this handle the scenario of the last CPU in the IRQ
> affinity mask being offlined while IO associated with the queue is still
> in flight?
>
> Or if we make the decision to use the queue associated with the current
> CPU, and then that CPU (being the last CPU online in the queue's IRQ
> affinity mask) goes offline and we finish the delivery with another CPU?
>
> In these cases, when the IO completes, it would not be serviced and
> would time out.
>
> I have actually tried this on my arm64 system and I see IO timeouts.

Hm, we used to freeze the queues with the CPUHP_BLK_MQ_PREPARE callback,
which would reap all outstanding commands before the CPU and IRQ were
taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
Create hctx for each present CPU"). It sounds like we should bring
something like that back, but make it finer grained, down to the
per-CPU context.
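
A rough sketch of the ordering this implies, written as a small userspace
model rather than kernel code. Everything in it is hypothetical: the
struct, the function names, and the 64-bit mask are made up for
illustration and are not blk-mq or genirq APIs. The only point is that
outstanding requests on a queue get drained while its IRQ can still fire,
and only when the departing CPU is the last online one in that queue's
mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one hardware queue and its managed IRQ. */
struct hctx_model {
	uint64_t cpumask;	/* CPUs mapped to this queue (cpu < 64) */
	unsigned int inflight;	/* requests issued but not yet completed */
	bool irq_active;	/* false once the managed IRQ is shut down */
};

/* True if 'cpu' is in the queue's mask and no other online CPU is. */
static bool last_online_cpu(const struct hctx_model *h, uint64_t online,
			    unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	return (h->cpumask & bit) && (h->cpumask & online & ~bit) == 0;
}

/* Stand-in for one completion arriving through the queue's IRQ. */
static bool complete_one(struct hctx_model *h)
{
	if (!h->irq_active || h->inflight == 0)
		return false;
	h->inflight--;
	return true;
}

/*
 * Model of taking 'cpu' offline. If it is the last online CPU for this
 * queue, reap everything in flight first, then shut the IRQ down.
 * Returns the updated online mask.
 */
static uint64_t cpu_offline(struct hctx_model *h, uint64_t online,
			    unsigned int cpu)
{
	if (h->irq_active && last_online_cpu(h, online, cpu)) {
		/* Drain while the IRQ can still be delivered. */
		while (h->inflight > 0)
			complete_one(h);
		h->irq_active = false;
	}
	return online & ~(UINT64_C(1) << cpu);
}
```

With a queue mapped to CPUs 0 and 1, offlining CPU 0 leaves the queue
untouched because CPU 1 can still service it. Offlining CPU 1 afterwards
drains the in-flight requests before the IRQ goes away. A real
implementation would have to wait for the device to complete the commands
instead of decrementing a counter, and would also have to stop new
submissions to that queue first.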