Re: [PATCH v9 09/13] isolation: Introduce io_queue isolcpus type

From: Sebastian Andrzej Siewior

Date: Thu Apr 02 2026 - 05:21:12 EST


On 2026-04-01 16:58:22 [-0400], Aaron Tomlin wrote:
> Hi Sebastian,
Hi,

> Thank you for taking the time to document the "managed_irq" behaviour; it
> is immensely helpful. You raise a highly pertinent point regarding the
> potential proliferation of "isolcpus=" flags. It is certainly a situation
> that must be managed carefully to prevent every subsystem from demanding
> its own bit.
>
> To clarify the reasoning behind introducing "io_queue" rather than strictly
> relying on managed_irq:
>
> The managed_irq flag belongs firmly to the interrupt subsystem. It dictates
> whether a CPU is eligible to receive hardware interrupts whose affinity is
> managed by the kernel. Whilst many modern block drivers use managed IRQs,
> the block layer multi-queue mapping encompasses far more than just
> interrupt routing. It maps logical queues to CPUs to handle I/O submission,
> software queues, and crucially, poll queues, which do not utilise
> interrupts at all. Furthermore, there are specific drivers that do not use
> the managed IRQ infrastructure but still rely on the block layer for queue
> distribution.

Could you tell the block layer which queue maps to which CPU at the
/sys/block/$$/mq/ level? Then you have one queue going to one CPU.
The driver could then request one or more interrupts, managed or not. For
managed interrupts you could specify a CPU mask you desire to occupy.
You have the cases where
- there are more queues than CPUs
  - use all of them
  - use fewer
- there are fewer queues than CPUs
  - a queue is mapped to more than one CPU, in case one goes down or
    becomes unavailable
  - a queue is mapped to one CPU

Ideally you solve this at one level so that the device(s) can request
fewer queues than CPUs if told to, without patching each and every driver.
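A rough sketch of what this interface could look like. Note the device
name is just an example, and the writable cpu_list is hypothetical:
today blk-mq only exposes a read-only per-hctx cpu_list; a writable
variant is precisely what is being proposed here.

```shell
# Inspect the current (read-only) queue-to-CPU mapping exposed by blk-mq.
for q in /sys/block/nvme0n1/mq/*; do
    echo "$q -> $(cat "$q/cpu_list")"
done

# Hypothetical writable variant as proposed above: pin queue 0 to the
# housekeeping CPUs 0-3, leaving the isolated CPUs without a queue.
echo 0-3 > /sys/block/nvme0n1/mq/0/cpu_list
```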

This should give you the freedom to isolate CPUs and decide at boot time
which CPUs get I/O queues assigned. At run time you can tell which
queues go to which CPUs. If you shut down a queue, the interrupt remains
but does not get any I/O requests assigned, so no problem. If the CPU
goes down, same thing.
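For reference, the boot-time side being discussed follows the existing
isolcpus flag syntax; io_queue is the new flag introduced by this
series, and the CPU range is an example:

```shell
# Kernel command line: isolate CPUs 4-7, keep kernel-managed IRQs off
# them, and (per this series) keep block I/O queues off them as well.
isolcpus=io_queue,managed_irq,4-7
```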

I am trying to come up with a design here which I haven't seen so far.
But I might be late to the party and everyone else is fully aware of one.

> If managed_irq were solely relied upon, the IRQ subsystem would
> successfully keep hardware interrupts off the isolated CPUs, but the block

The managed_irqs can't be influenced by userland. The CPUs are
distributed automatically.

> layer would still blindly map polling queues or non-managed queues to those
> same isolated CPUs. This would force isolated CPUs to process I/O
> submissions or handle polling tasks, thereby breaking the strict isolation.
>
> Regarding the point about the networking subsystem, it is a very valid
> comparison. If the networking layer wishes to respect isolcpus in the
> future, adding a net flag would indeed exacerbate the bit proliferation.

Networking could also have different cases, like adding an RX filter and
having the HW put packets matching it into a dedicated queue. But in
this case too I would like to have the freedom to decide which isolated
CPUs should receive interrupts/traffic and which don't.

> For the present time, retaining io_queue seems the most prudent approach to
> ensure that block queue mapping remains semantically distinct from
> interrupt delivery. This provides an immediate and clean architectural
> boundary. However, if the consensus amongst the maintainers suggests that
> this is too granular, alternative approaches could certainly be considered
> for the future. For instance, a broader, more generic flag could be
> introduced to encompass both block and future networking queue mappings.
> Alternatively, if semantic conflation is deemed acceptable, the existing
> managed_irq housekeeping mask could simply be overloaded within the block
> layer to restrict all queue mappings.
>
> Keeping the current separation appears to be the cleanest solution for this
> series, but your thoughts, and those of the wider community, on potentially
> migrating to a consolidated generic flag in the future would be very much
> welcomed.

I just don't like introducing yet another boot argument, making it a
boot-time constraint, while in my naive view this could be managed to
some degree via sysfs as suggested above.

>
> Kind regards,

Sebastian