Re: [PATCH v10 13/13] docs: add io_queue flag to isolcpus

From: Aaron Tomlin

Date: Wed Apr 15 2026 - 10:47:38 EST

On Wed, Apr 15, 2026 at 10:34:58AM +0200, Sebastian Andrzej Siewior wrote:
> On 2026-04-13 23:11:15 [+0800], Ming Lei wrote:
> > > > What matters is that IO won't interrupt isolated CPU.
> > >
> > > The isolcpus=managed_irq acts as a "best effort" avoidance algorithm rather
> > > than a strict, unbreakable constraint. This is indicated in the proposed
> > > changes to Documentation/core-api/irq/managed_irq.rst [1].
> >
> > Yes, it is "best effort", but an isolated CPU is only taken as the effective
> > CPU for the hw queue's irq iff all others are offline, which is just fine for
> > typical use cases, in which IO isn't submitted from isolated CPUs.
>
> Couldn't we tackle this by limiting the number of managed interrupts the
> device asks for and then limiting the CPUs it could be bound to?
>
> So if we have housekeeping CPUs 0/1 and isolated CPUs 2-63 then managed_irq=
> is futile since it uses 64 interrupts and maps each to one CPU. Even if the
> device supports fewer it would map them evenly across available CPUs.
>
> If the user wishes to initiate I/O from all CPUs but not be bothered by
> interrupts we could limit the device to ask for 2 interrupts instead of
> 64 (with the consequence of more queue sharing) and then limit those two
> interrupts to CPUs 0 and 1 instead of to CPUs 0-31 and 32-63 as would be
> the case now.
>
> Wouldn't that be what the io_queue flag tries to do?
>
Hi Sebastian,

Indeed, you are spot on.

What you have described is precisely the architectural mechanism that this
patchset implements to resolve the issue:

1. Rather than permitting the device driver to blindly allocate 64
   queues (and 64 MSI-X vectors) for a 64-core system, the
   "isolcpus=io_queue" option intercepts this at the block layer. It
   limits the hardware queue allocation to the number of online
   housekeeping CPUs (2 queues in your example). As you rightly
   noted, this results in the isolated CPUs sharing those submission
   queues.

2. Once those two queues have been allocated, the new
   irq_spread_hk_filter() strictly confines their hardware completion
   interrupts to CPUs 0 and 1.
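
To make the two steps above concrete, here is a rough userspace sketch in
Python. It is only a toy model of the behaviour described, not the kernel
code: plan_io_queues() and every name inside it are hypothetical, and the
round-robin sharing of queues among isolated CPUs is an assumption made
purely for illustration.

```python
# Toy model of the two steps above. All names are hypothetical;
# this is not the kernel implementation.

def plan_io_queues(online_cpus, isolated_cpus, max_hw_queues):
    """Return (nr_queues, irq_cpus, submit_queue) for a given topology."""
    online = sorted(set(online_cpus))
    isolated = set(isolated_cpus)
    housekeeping = [cpu for cpu in online if cpu not in isolated]
    if not housekeeping:
        raise ValueError("no online housekeeping CPU")
    if max_hw_queues < 1:
        raise ValueError("device must support at least one queue")

    # Step 1: cap the queue count at the number of online housekeeping CPUs.
    nr_queues = min(max_hw_queues, len(housekeeping))

    # Step 2: spread completion interrupts over housekeeping CPUs only.
    irq_cpus = {q: [] for q in range(nr_queues)}
    for i, cpu in enumerate(housekeeping):
        irq_cpus[i % nr_queues].append(cpu)

    # Every online CPU may still submit I/O. Isolated CPUs share the
    # queues (round-robin here, which is an assumption of this sketch).
    submit_queue = {}
    for i, cpu in enumerate(housekeeping):
        submit_queue[cpu] = i % nr_queues
    for i, cpu in enumerate(c for c in online if c in isolated):
        submit_queue[cpu] = i % nr_queues

    return nr_queues, irq_cpus, submit_queue


# Topology from the example: housekeeping CPUs 0-1, isolated CPUs 2-63.
nr_queues, irq_cpus, submit_queue = plan_io_queues(range(64), range(2, 64), 64)
print(nr_queues, irq_cpus)
```

For the topology in your example, this model yields two queues whose
completion interrupts target CPU 0 and CPU 1 respectively, while all 64
CPUs remain able to submit I/O through one of the two queues.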

By structurally enforcing both of these constraints at initialisation,
"isolcpus=io_queue" entirely prevents the vector exhaustion observed on
large topologies. Furthermore, it provides an absolute guarantee that
hardware completion interrupts will never be routed to the isolated CPUs,
even when an application submits I/O from them.

Kind regards,
--
Aaron Tomlin
