[PATCH v11 00/13] blk: honor isolcpus configuration

From: Aaron Tomlin

Date: Thu Apr 16 2026 - 15:30:57 EST


Hi,

I have decided to drive this series forward on behalf of Daniel Wagner, the
original author. This iteration addresses the outstanding architectural and
concurrency concerns raised during the previous review cycle, and the series
has been rebased on v7.0-rc5-509-g545475aebc2a.

This iteration reworks the mapping and affinity-spreading algorithms so that
they are safe against concurrent CPU-hotplug operations. Previously, the
block layer relied on a shared global static mask (i.e., blk_hk_online_mask),
which was vulnerable to races during rapid hotplug events. The kernel test
robot recently demonstrated this by hitting a NULL pointer dereference
during rcutorture (cpuhotplug) stress testing, caused by concurrent
modification of the mask.

To resolve this, the global static state has been removed entirely. The IRQ
affinity core now employs a newly introduced helper, irq_spread_hk_filter(),
which intersects the natively calculated affinity mask with the
HK_TYPE_IO_QUEUE housekeeping mask. Crucially, the intersection is taken
against a local, hotplug-safe snapshot obtained via
data_race(cpu_online_mask). This avoids the hotplug-lock deadlocks
previously identified by Thomas Gleixner, while also avoiding
CONFIG_CPUMASK_OFFSTACK stack-usage hazards on high-core-count systems.
Should an interrupt vector end up assigned exclusively to isolated cores, a
fallback mechanism safely re-routes it to the system's online housekeeping
CPUs.

Testing multiple queue maps (such as NVMe poll queues) in combination with
isolated CPUs exposed a page-fault regression, which the tenth iteration
resolved: the multi-queue mapping logic now strictly maintains absolute
hardware queue indices, ensuring correct queue initialisation and preventing
out-of-bounds memory access.

Furthermore, following feedback from Ming Lei, the administrative
documentation for isolcpus=io_queue has been substantially expanded to match
the implementation; previous iterations did not describe the subsystem
impact with sufficient precision. kernel-parameters.txt now explicitly
states that this parameter applies strictly to managed IRQs. It documents
how the block layer constrains multiqueue allocation to the housekeeping
mask, preventing MSI-X vector exhaustion on large topologies and forcing
queue sharing. Most importantly, it spells out the structural guarantee: an
application running on an isolated CPU may freely submit I/O, but the
hardware completion interrupt is always handled on a housekeeping core.
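For illustration, and assuming the new type follows the existing
isolcpus=<flags>,<cpulist> syntax used by flags such as managed_irq, booting
with io_queue isolation on CPUs 2-7 would look like:

```
isolcpus=io_queue,2-7
```

With such a configuration, an application pinned to CPUs 2-7 could still
submit I/O, while queue interrupts would be serviced on CPUs 0-1.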

Please let me know your thoughts.


Aaron Tomlin (1):
genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs

Daniel Wagner (12):
scsi: aacraid: use block layer helpers to calculate num of queues
lib/group_cpus: remove dead !SMP code
lib/group_cpus: Add group_mask_cpus_evenly()
genirq/affinity: Add cpumask to struct irq_affinity
blk-mq: add blk_mq_{online|possible}_queue_affinity
nvme-pci: use block layer helpers to constrain queue affinity
scsi: Use block layer helpers to constrain queue affinity
virtio: blk/scsi: use block layer helpers to constrain queue affinity
isolation: Introduce io_queue isolcpus type
blk-mq: use hk cpus only when isolcpus=io_queue is enabled
blk-mq: prevent offlining hk CPUs with associated online isolated CPUs
docs: add io_queue flag to isolcpus

.../admin-guide/kernel-parameters.txt | 30 ++-
block/blk-mq-cpumap.c | 192 ++++++++++++++++--
block/blk-mq.c | 42 ++++
drivers/block/virtio_blk.c | 4 +-
drivers/nvme/host/pci.c | 1 +
drivers/scsi/aacraid/comminit.c | 3 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 +
drivers/scsi/megaraid/megaraid_sas_base.c | 5 +-
drivers/scsi/mpi3mr/mpi3mr_fw.c | 6 +-
drivers/scsi/mpt3sas/mpt3sas_base.c | 5 +-
drivers/scsi/pm8001/pm8001_init.c | 1 +
drivers/scsi/virtio_scsi.c | 5 +-
include/linux/blk-mq.h | 2 +
include/linux/group_cpus.h | 3 +
include/linux/interrupt.h | 16 +-
include/linux/sched/isolation.h | 1 +
kernel/irq/affinity.c | 38 +++-
kernel/sched/isolation.c | 7 +
lib/group_cpus.c | 65 ++++--
19 files changed, 379 insertions(+), 48 deletions(-)


base-commit: 3cd8b194bf3428dfa53120fee47e827a7c495815
--
2.51.0