[PATCH][RFC] x86/irq: Spread vectors on different CPUs

From: Chen Yu
Date: Sat May 13 2017 - 08:42:15 EST


We encountered a CPU offline failure during hibernation on a
16-core server (32 logical CPUs):

CPU 31 disable failed: CPU has 62 vectors assigned and there are only 0 available.

This is because:
1. One of the drivers declares many vector resources via
pci_enable_msix_range(); say, the driver wants to reserve
6 vectors per logical CPU, which makes 192 in total.
2. Besides, most of the vectors for this driver are allocated
on CPU0 due to the current allocation strategy, so there are
not enough slots left on CPU0 to receive any IRQs migrated
from the other CPUs during CPU offline.
3. Furthermore, many of the vectors this driver reserved do not
have any IRQ handler attached.

As a result, all vectors on CPU0 were used up and the last alive
CPU (31) failed to migrate its IRQs to CPU0.

Since it might be difficult to reduce the number of vectors reserved
by that driver, a compromise solution would be to spread the vector
allocation across different CPUs rather than always choosing the
*first* CPU in the cpumask. This yields a balanced vector
distribution. Vectors that are reserved but never used (point 3
above) are not counted during CPU offline, so once they reside on
nonboot CPUs the problem is solved.

Here is a trial version of this proposal, and it works in my case:
it simply tries to find the target CPU with the fewest vectors
allocated (i.e., the 'idlest' CPU). The algorithm can certainly be
optimized, but first I'd like suggestions from you experts on whether
this is the right direction, or what the proper solution for this
kind of problem would be. Any comments/suggestions are appreciated.

Reported-by: Xiang Li <xiang.z.li@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "Anvin, H Peter" <h.peter.anvin@xxxxxxxxx>
Cc: "Van De Ven, Arjan" <arjan.van.de.ven@xxxxxxxxx>
Cc: "Brown, Len" <len.brown@xxxxxxxxx>
Cc: "Wysocki, Rafael J" <rafael.j.wysocki@xxxxxxxxx>
Cc: x86@xxxxxxxxxx
Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
---
arch/x86/kernel/apic/vector.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f3557a1..d220365 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -102,6 +102,27 @@ static void free_apic_chip_data(struct apic_chip_data *data)
}
}

+static int pick_leisure_cpu(const struct cpumask *mask)
+{
+ int cpu, vector;
+ int min_nr_vector = NR_VECTORS;
+ int target_cpu = cpumask_first_and(mask, cpu_online_mask);
+
+ for_each_cpu_and(cpu, mask, cpu_online_mask) {
+ int nr_vectors = 0;
+
+ for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
+ if (!IS_ERR_OR_NULL(per_cpu(vector_irq, cpu)[vector]))
+ nr_vectors++;
+ }
+ if (nr_vectors < min_nr_vector) {
+ min_nr_vector = nr_vectors;
+ target_cpu = cpu;
+ }
+ }
+ return target_cpu;
+}
+
static int __assign_irq_vector(int irq, struct apic_chip_data *d,
const struct cpumask *mask)
{
@@ -131,7 +152,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
/* Only try and allocate irqs on cpus that are present */
cpumask_clear(d->old_domain);
cpumask_clear(searched_cpumask);
- cpu = cpumask_first_and(mask, cpu_online_mask);
+ cpu = pick_leisure_cpu(mask);
while (cpu < nr_cpu_ids) {
int new_cpu, offset;

--
2.7.4