[PATCH v14 7/8] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs
From: Aaron Tomlin
Date: Wed May 20 2026 - 17:54:41 EST
At present, the managed interrupt spreading algorithm distributes vectors
across all available CPUs within a given node or system. On systems
employing CPU isolation (e.g., "isolcpus=io_queue"), this behaviour
defeats the primary purpose of isolation by routing hardware interrupts
(such as NVMe completion queues) directly to isolated cores.

Update irq_create_affinity_masks() to respect the housekeeping CPU mask.
By passing the HK_TYPE_IO_QUEUE mask directly to the topological
distribution function (group_mask_cpus_evenly()), we ensure that managed
interrupts are kept strictly off isolated CPUs.

This patch additionally addresses the constraints that restricted vector
distribution introduces:

  1. Vector Limits and Overrides: Update irq_calc_affinity_vectors()
     to bound the number of vectors beyond the reserved ones (resv) by
     the weight of the housekeeping mask. This bound takes precedence
     over drivers providing a calc_sets() callback, preventing them
     from wasting memory on dead hardware queues whose interrupts
     cannot be routed to isolated CPUs.

  2. Multi-set Alignment and Leak Prevention: When isolation
     constraints result in fewer available masks than requested
     vectors for a given set, pad the remaining vector slots with the
     housekeeping mask. This replaces the historical
     irq_default_affinity padding, ensuring excess managed queues do
     not leak interrupts onto isolated CPUs.

  3. Minimum Vector Safety Net: Clamp the final vector count so that
     it never drops below minvec. This prevents fatal -ENOSPC device
     probe aborts on heavily isolated systems, where the housekeeping
     CPU count might be lower than a device's structural minimum.
     Queues then share the available housekeeping CPUs instead of
     failing the probe.

  4. Zero Overhead: Select the housekeeping mask via a direct pointer,
     avoiding temporary mask allocations (e.g., alloc_cpumask_var())
     and bitwise mask operations when CPU isolation is disabled.
     Standard configurations therefore see no added allocation or
     mask-computation cost.

Signed-off-by: Aaron Tomlin <atomlin@xxxxxxxxxxx>
---
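For reviewers, the vector count arithmetic from items 1 and 3 can be
sketched in user space roughly as follows. This is only an illustration:
calc_affinity_vectors_sketch() and its flag parameters (hk_enabled,
hk_weight, has_calc_sets, nr_possible) are hypothetical stand-ins for
housekeeping_enabled(), cpumask_weight() and affd->calc_sets, not kernel
interfaces.

```c
#include <assert.h>

/* Hypothetical user-space stand-ins for the kernel's min()/max() macros. */
static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }
static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }

/*
 * Sketch of the patched irq_calc_affinity_vectors() arithmetic.
 * All parameters after 'resv' are assumptions made for illustration:
 *   hk_enabled    - isolation of I/O queues is active
 *   hk_weight     - number of housekeeping CPUs
 *   has_calc_sets - the driver supplied a calc_sets() callback
 *   nr_possible   - number of possible CPUs
 */
static unsigned int calc_affinity_vectors_sketch(unsigned int minvec,
						 unsigned int maxvec,
						 unsigned int resv,
						 int hk_enabled,
						 unsigned int hk_weight,
						 int has_calc_sets,
						 unsigned int nr_possible)
{
	unsigned int set_vecs;

	assert(minvec <= maxvec);

	/* Reserved vectors alone already exceed the device minimum. */
	if (resv > minvec)
		return 0;

	if (hk_enabled)
		set_vecs = hk_weight;		/* bound by housekeeping CPUs */
	else if (has_calc_sets)
		set_vecs = maxvec - resv;	/* driver sizes its own sets */
	else
		set_vecs = nr_possible;

	/* Never return fewer than minvec, so the probe does not fail. */
	return max_u(minvec, resv + min_u(set_vecs, maxvec - resv));
}
```

With minvec = 8, maxvec = 32, resv = 0 and two housekeeping CPUs, the
sketch returns 8 rather than 2, which is the safety net described in
item 3.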
kernel/irq/affinity.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
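The padding behaviour from item 2 can be modelled in user space along
these lines. This is a rough sketch under stated assumptions: CPU masks
are plain 64-bit words, and group_round_robin() is a hypothetical,
topology-unaware stand-in for group_mask_cpus_evenly(); only the two
copy loops in fill_set_sketch() mirror the patched code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VECS 64

/*
 * Hypothetical stand-in for group_mask_cpus_evenly(): deal the CPUs set
 * in 'mask' round-robin into at most 'numgrps' groups and return how
 * many groups received at least one CPU. The real function is topology
 * aware; this one is not.
 */
static unsigned int group_round_robin(unsigned int numgrps, uint64_t mask,
				      uint64_t *out)
{
	unsigned int cpu, n = 0, i;

	if (!numgrps)
		return 0;

	for (i = 0; i < numgrps; i++)
		out[i] = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			out[n++ % numgrps] |= UINT64_C(1) << cpu;
	}

	return n < numgrps ? n : numgrps;
}

/*
 * Fill one interrupt set of 'this_vecs' vectors starting at 'curvec'.
 * Slots the grouping could not populate are padded with the whole
 * restricting mask, so no slot ends up pointing outside of it.
 * Returns the index of the first vector after this set.
 */
static unsigned int fill_set_sketch(uint64_t *vec_masks, unsigned int curvec,
				    unsigned int this_vecs, uint64_t mask)
{
	uint64_t result[MAX_VECS];
	unsigned int nr_masks, j;

	assert(this_vecs <= MAX_VECS);

	nr_masks = group_round_robin(this_vecs, mask, result);

	for (j = 0; j < nr_masks; j++)
		vec_masks[curvec + j] = result[j];

	for (j = nr_masks; j < this_vecs; j++)
		vec_masks[curvec + j] = mask;

	return curvec + this_vecs;
}
```

Advancing by this_vecs rather than nr_masks keeps the start of each
subsequent set aligned with the set sizes the driver requested, which
is the multi-set alignment point made in item 2.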
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 78f2418a8925..dade92f8b4b3 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -8,6 +8,7 @@
#include <linux/slab.h>
#include <linux/cpu.h>
#include <linux/group_cpus.h>
+#include <linux/sched/isolation.h>
static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
{
@@ -25,8 +26,10 @@ static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
struct irq_affinity_desc *
irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
{
- unsigned int affvecs, curvec, usedvecs, i;
+ unsigned int affvecs, curvec, usedvecs, i, j;
struct irq_affinity_desc *masks = NULL;
+ const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+ bool hk_enabled = housekeeping_enabled(HK_TYPE_IO_QUEUE);
/*
* Determine the number of vectors which need interrupt affinities
@@ -70,19 +73,29 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
*/
for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
unsigned int nr_masks, this_vecs = affd->set_size[i];
- struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
+ struct cpumask *result;
+ const struct cpumask *mask;
+ if (hk_enabled)
+ mask = hk_mask;
+ else
+ mask = cpu_possible_mask;
+
+ result = group_mask_cpus_evenly(this_vecs, mask,
+ &nr_masks);
if (!result) {
kfree(masks);
return NULL;
}
-
- for (int j = 0; j < nr_masks; j++)
+ for (j = 0; j < nr_masks; j++)
cpumask_copy(&masks[curvec + j].mask, &result[j]);
+ for (j = nr_masks; j < this_vecs; j++)
+ cpumask_copy(&masks[curvec + j].mask, mask);
+
kfree(result);
- curvec += nr_masks;
- usedvecs += nr_masks;
+ curvec += this_vecs;
+ usedvecs += this_vecs;
}
/* Fill out vectors at the end that don't need affinity */
@@ -115,10 +128,12 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
if (resv > minvec)
return 0;
- if (affd->calc_sets)
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ set_vecs = cpumask_weight(housekeeping_cpumask(HK_TYPE_IO_QUEUE));
+ else if (affd->calc_sets)
set_vecs = maxvec - resv;
else
set_vecs = cpumask_weight(cpu_possible_mask);
- return resv + min(set_vecs, maxvec - resv);
+ return max(minvec, resv + min(set_vecs, maxvec - resv));
}
--
2.51.0