Re: [PATCH v10 13/13] docs: add io_queue flag to isolcpus
From: Ming Lei
Date: Mon Apr 13 2026 - 11:18:41 EST
On Sun, Apr 12, 2026 at 06:50:33PM -0400, Aaron Tomlin wrote:
> On Sat, Apr 11, 2026 at 08:52:00PM +0800, Ming Lei wrote:
> > > The critical issue lies in the invocation of group_cpus_evenly(). Without
> > > this patchset, the core logic lacks the necessary constraints to respect
> > > CPU isolation. It is entirely possible, and indeed happens in practice, for
> > > an isolated CPU to be assigned to a CPU mask group.
> >
> > Is it a bug report? No, because it doesn't show any trouble from the
> > user's viewpoint.
>
> Hi Ming,
>
> The lack of a formal bug report does not negate the fact that the current
> behaviour silently breaks the fundamental contract of CPU isolation from
> the administrator's perspective.
>
> To illustrate the user-visible impact, the following demonstrates the
> difference between relying on isolcpus=managed_irq and isolcpus=io_queue
> under 7.0.0-rc3-00065-gd80965e205a5, which includes this series.
>
> The Broadcom MPI3 Storage Controller driver allocates a full complement of
> 48 operational queue pairs. Consequently, 49 MSI-X vectors are requested
> and the resulting managed interrupts are mapped directly onto the isolated
> cores, thereby breaching isolation.
>
> # uname -r
> 7.0.0-rc3-00065-gd80965e205a5
>
> # tr ' ' '\n' < /proc/cmdline | grep isolcpus=
> isolcpus=managed_irq,domain,2-47
>
> # cat /sys/devices/system/cpu/isolated
> 2-47
>
> # dmesg | grep -A 6 'MSI-X vectors supported:'
> [ 2.981705] mpi3mr0: MSI-X vectors supported: 128, no of cores: 48,
> [ 2.981705] mpi3mr0: MSI-X vectors requested: 49 poll_queues 0
> [ 3.001915] mpi3mr0: trying to create 48 operational queue pairs
> [ 3.011214] mpi3mr0: allocating operational queues through segmented queues
> [ 3.101903] mpi3mr0: successfully created 48 operational queue pairs(default/polled) queue = (2/0)
> [ 3.111468] mpi3mr0: controller initialization completed successfully
>
> # awk '/mpi3mr0/ { print $1" "$NF }' /proc/interrupts
> 78: mpi3mr0-msix0
> 79: mpi3mr0-msix1
> 80: mpi3mr0-msix2
> 81: mpi3mr0-msix3
> 82: mpi3mr0-msix4
> 83: mpi3mr0-msix5
> 84: mpi3mr0-msix6
> 85: mpi3mr0-msix7
> 86: mpi3mr0-msix8
> 87: mpi3mr0-msix9
> 88: mpi3mr0-msix10
> 89: mpi3mr0-msix11
> 90: mpi3mr0-msix12
> ...
> 122: mpi3mr0-msix44
> 123: mpi3mr0-msix45
> 124: mpi3mr0-msix46
> 125: mpi3mr0-msix47
> 126: mpi3mr0-msix48
>
> # grep -H '' /proc/irq/{119,120,121,122}/{effective,smp}_affinity_list
> /proc/irq/119/effective_affinity_list:42
> /proc/irq/119/smp_affinity_list:42
> /proc/irq/120/effective_affinity_list:43
> /proc/irq/120/smp_affinity_list:43
> /proc/irq/121/effective_affinity_list:44
> /proc/irq/121/smp_affinity_list:44
> /proc/irq/122/effective_affinity_list:45
> /proc/irq/122/smp_affinity_list:45
But typical applications aren't supposed to submit IOs from these isolated CPUs, so
in reality, it isn't a big deal.
>
>
> Now with isolcpus=io_queue,2-47 the allocation is structurally restricted
> at the source. The driver creates only two operational queues, confining
> all resulting interrupts exclusively to housekeeping CPUs (0 and 1):
>
> # uname -r
> 7.0.0-rc3-00065-gd80965e205a5
>
> # tr ' ' '\n' < /proc/cmdline | grep isolcpus=
> isolcpus=io_queue,domain,2-47
>
> # cat /sys/devices/system/cpu/isolated
> 2-47
>
> # dmesg | grep -A 6 'MSI-X vectors supported:'
> [ 3.284850] mpi3mr0: MSI-X vectors supported: 128, no of cores: 48,
> [ 3.284851] mpi3mr0: MSI-X vectors requested: 49 poll_queues 0
> [ 3.305492] mpi3mr0: allocated vectors (3) are less than configured (49)
> [ 3.316528] mpi3mr0: trying to create 2 operational queue pairs
> [ 3.328013] mpi3mr0: allocating operational queues through segmented queues
> [ 3.340697] mpi3mr0: successfully created 2 operational queue pairs(default/polled) queue = (2/0)
> [ 3.350664] mpi3mr0: controller initialization completed successfully
>
> # awk '/mpi3mr0/ { print $1" "$NF }' /proc/interrupts
> 79: mpi3mr0-msix0
> 80: mpi3mr0-msix1
> 81: mpi3mr0-msix2
>
> # grep -H '' /proc/irq/{79,80,81}/{effective,smp}_affinity_list
> /proc/irq/79/effective_affinity_list:1
> /proc/irq/79/smp_affinity_list:1
> /proc/irq/80/effective_affinity_list:1
> /proc/irq/80/smp_affinity_list:1
> /proc/irq/81/effective_affinity_list:0
> /proc/irq/81/smp_affinity_list:0
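FWIW, the per-irq comparison above can be scripted. A rough sketch follows;
cpulist_expand is a helper name I made up, and the two cpulists are
hard-coded stand-ins for what would normally be read from
/sys/devices/system/cpu/isolated and /proc/irq/<N>/effective_affinity_list:

```shell
#!/bin/bash
# Expand a kernel cpulist string like "0,2-5" into one CPU id per line.
cpulist_expand() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done
}

isolated="2-47"             # stand-in for /sys/devices/system/cpu/isolated
irq_affinity="42,43,44,45"  # stand-in for the sampled vectors' affinities

# Any CPU appearing in both sets is a managed-irq effective CPU inside the
# isolated range, i.e. a breach of the expected isolation.
breach=$(comm -12 <(cpulist_expand "$isolated" | sort) \
                  <(cpulist_expand "$irq_affinity" | sort) | sort -n)
echo "$breach"
```

With the managed_irq boot above this prints 42 through 45; with the
io_queue boot it prints nothing, since all effective CPUs are housekeeping.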
>
> > Sebastian explains/shows how "isolcpus=managed_irq" works perfectly in the
> > following link:
> >
> > https://lore.kernel.org/all/20260401110232.ET5RxZfl@xxxxxxxxxxxxx/
> >
> > You have reviewed it...
> >
> > What matters is that IO won't interrupt isolated CPU.
>
> The isolcpus=managed_irq flag acts as a "best effort" avoidance algorithm
> rather than a strict, unbreakable constraint. This is indicated in the
> proposed changes to Documentation/core-api/irq/managed_irq.rst [1].
Yes, it is "best effort", but an isolated CPU is only taken as the effective
CPU for a hw queue's irq iff all other CPUs in the queue's mask are offline.
That is just fine for typical use cases, in which IO isn't submitted from
isolated CPUs.
Thanks,
Ming