Re: [PATCH v10 13/13] docs: add io_queue flag to isolcpus
From: Ming Lei
Date: Thu Apr 09 2026 - 11:03:03 EST
On Wed, Apr 8, 2026 at 11:58 PM Aaron Tomlin <atomlin@xxxxxxxxxxx> wrote:
>
> On Mon, Apr 06, 2026 at 11:29:38AM +0800, Ming Lei wrote:
> > I don't think there is any such isolation-breaking issue. For iopoll, if
> > applications don't submit polled IO on isolated CPUs, everything is just
> > fine. If they do, IO may be reaped from isolated CPUs, but that is their
> > own choice; is anything actually wrong with that?
>
> Hi Ming,
>
> Thank you for your follow-up. You make a fair point regarding polling
> queues and application choice; if an application explicitly binds to an
> isolated CPU and submits polled operations, it is indeed actively electing
> to utilise that core and accept the resulting behaviour.
>
> However, the architectural challenge arises from how the kernel handles
> these queues structurally when the application does not explicitly make
> that choice. Because poll queues never utilise interrupts, they are
> completely invisible to the managed interrupt subsystem.
>
> If we were to rely exclusively on the managed_irq flag, the block layer
> would blindly map these non-interrupt-driven polling queues to isolated
> CPUs. If a general background storage operation were then routed to
> that queue, the isolated core would be forced to spin actively in a tight
How can the isolated core be scheduled to run a polling task?
Who would trigger it?
> loop waiting for the hardware completion. This would completely monopolise
> the core and destroy any real time isolation guarantees without the user
> space application ever having requested it.
No.
An IOPOLL queue doesn't have an interrupt, and ->poll() is only run from
the submission context. So if you don't submit polled IO on isolated
CPU cores, everything is just fine. This is actually simpler than irq IO.
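A minimal userspace sketch of that rule (the helper name and CPU number are illustrative, not from the series): the application that opts in to polled IO simply keeps its submission thread on a housekeeping CPU, and since ->poll() runs only from the submission context, no isolated core ever spins.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Sketch only: a polled-IO application keeps its submission thread on
 * a housekeeping CPU.  With IOPOLL there is no interrupt and ->poll()
 * runs only from the submission context, so completions are reaped on
 * this same CPU and isolated cores are never touched.  CPU 0 is
 * assumed to be a housekeeping CPU here; a real application would
 * consult the system's isolcpus= configuration. */
static int pin_to_housekeeping(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 = the calling thread; returns 0 on success */
	return sched_setaffinity(0, sizeof(set), &set);
}
```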
>
> This illustrates precisely why the io_queue flag is a mechanical necessity.
> Its primary objective is to act as a comprehensive block layer isolation
> boundary. It structurally restricts both hardware queue placement and
> managed interrupt affinity strictly to housekeeping CPUs, ensuring that no
> storage queue operations of any kind are mapped to an isolated CPU.
>
> To achieve this reliably, this series expands struct irq_affinity to
> incorporate a new CPU mask [1]. This mask is explicitly set to
> the blk-mq online-queue affinity. By passing this housekeeping
> mask directly through the interrupt affinity parameters, we ensure that the
> native affinity calculation is strictly bounded to non isolated CPUs from
> the moment the device probes.
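If it helps to see the shape being described, here is a standalone sketch in plain C; the type and field names are illustrative stand-ins (the real series [1] extends the kernel's struct irq_affinity, whose existing pre_vectors/post_vectors members are shown for orientation), and only the bounding step itself is meant literally:

```c
#include <stddef.h>

/* Illustrative stand-in types; the actual field names in the series
 * [1] may differ. */
struct cpumask_sketch { unsigned long bits[4]; };

struct irq_affinity_sketch {
	unsigned int pre_vectors;   /* vectors excluded from spreading (head) */
	unsigned int post_vectors;  /* vectors excluded from spreading (tail) */
	/* new in the series: bounds affinity spreading to housekeeping CPUs */
	const struct cpumask_sketch *housekeeping_mask;
};

/* The bounding step: restrict a computed affinity to the housekeeping
 * mask with a bitwise AND, so no vector lands on an isolated CPU. */
static void bound_to_housekeeping(struct cpumask_sketch *aff,
				  const struct cpumask_sketch *hk)
{
	for (size_t i = 0; i < 4; i++)
		aff->bits[i] &= hk->bits[i];
}
```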
>
> This structural enhancement allows device drivers to seamlessly inherit the
> isolation constraints without requiring bespoke, driver-specific logic. A
> clear example of this application can be seen in the modifications to the
> Broadcom MPI3 Storage Controller [2]. By leveraging the expanded
> struct irq_affinity, the driver guarantees that its queues and
> corresponding managed interrupts are perfectly aligned with the system
> housekeeping configuration, completely avoiding the isolated CPUs during
> allocation.
>
> [1]: https://lore.kernel.org/lkml/20260401222312.772334-5-atomlin@xxxxxxxxxxx/
> [2]: https://lore.kernel.org/lkml/20260401222312.772334-8-atomlin@xxxxxxxxxxx/
>
> I hope this better illustrates the mechanical necessity of the io_queue
> flag and the corresponding changes to the interrupt affinity structures.
Can you share one example which managed_irq can't address?
>
> > > Every logical CPU, including the isolated ones, must logically map to a
> > > hardware context in order to submit input and output requests; saying they
> > > are completely restricted is indeed stale and technically inaccurate. The
> > > isolation mechanism actually ensures that the hardware contexts themselves
> > > are serviced by the housekeeping CPUs, while the isolated CPUs are simply
> > > mapped onto these housekeeping queues for submission purposes. I will
> > > rewrite this paragraph to accurately reflect this topology, ensuring it
> > > aligns perfectly with the behaviour introduced in patch 10.
> >
> > I am not sure if the above words are helpful from an administrator's viewpoint about
> > the two kernel parameters.
> >
> > IMO, only two differences from this viewpoint:
> >
> > 1) `io_queue` may reduce nr_hw_queues
> >
> > 2) when application submits IO from isolated CPUs, `io_queue` can complete
> > IO from housekeeping CPUs.
>
> Acknowledged.
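For that administrator viewpoint, the two settings would appear on the kernel command line roughly as follows (CPU ranges illustrative; the io_queue syntax is assumed here to mirror the existing managed_irq flag):

```shell
# Existing flag: managed interrupts are steered away from CPUs 2-7
isolcpus=managed_irq,2-7

# Proposed flag: additionally keeps blk-mq hardware queues (and their
# managed interrupts) on housekeeping CPUs; may reduce nr_hw_queues,
# and IO submitted from isolated CPUs completes on housekeeping CPUs
isolcpus=io_queue,2-7
```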
Are there other major differences besides the two mentioned above?
Thanks,
Ming