[PATCH] x86/irq: Correct counting of irqs to be moved before CPU down

From: Gregory Herrero
Date: Mon May 02 2016 - 12:36:53 EST


Before a cpu is downed, the irqs that must be moved to another cpu are
counted.

The number of irqs to be moved is correct when the first cpu is downed.
But on the next cpu down, the count is wrong.

This wrong irq count can prevent a cpu from being downed if it is higher
than the number of vectors available on the other cpus.

Checking if affinity_new is a subset of online_new is wrong because
the irq affinity mask is not updated when a cpu is downed.
So, once one cpu has been disabled, affinity_new can never be a subset
of online_new for any irq whose affinity mask still contains that cpu.

The example below demonstrates the issue by disabling 2 cores out of 8.
It assumes irq affinity is set to all cores and all cpus are up.

affinity = irq affinity
affinity_new = irq affinity - downed cpu
online_new = online cpus - downed cpu

echo 0 > cpu1/online
affinity = 0-7
affinity_new = 0,2-7
online_new = 0,2-7
affinity_new > 0 and cpumask_subset(&affinity_new, &online_new)
is true.
There is no irq to be moved.

echo 0 > cpu2/online
affinity = 0-7
affinity_new = 0-1,3-7
online_new = 0,3-7
affinity_new > 0 and cpumask_subset(&affinity_new, &online_new)
is false.
affinity_new mask still contains cpu 1.
All irqs are counted although none of them needs to be moved.
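
The two steps above can be reproduced in userspace with a minimal
sketch. It is not kernel code: mask8_t, mask_subset() and
old_check_counts_irq() are hypothetical stand-ins that model an 8-cpu
cpumask as one byte and mirror the condition removed by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per cpu, 8 cpus. */
typedef uint8_t mask8_t;

/* Models cpumask_subset(): true if every bit set in a is also set in b. */
static bool mask_subset(mask8_t a, mask8_t b)
{
	return (a & ~b) == 0;
}

/*
 * Models the old condition: the irq is counted if affinity_new is
 * empty or is not a subset of online_new.
 */
static bool old_check_counts_irq(mask8_t affinity, mask8_t online,
				 unsigned int down_cpu)
{
	mask8_t bit = (mask8_t)(1u << down_cpu);
	mask8_t affinity_new = affinity & (mask8_t)~bit;
	mask8_t online_new = online & (mask8_t)~bit;

	return affinity_new == 0 || !mask_subset(affinity_new, online_new);
}
```

With affinity = 0xff, old_check_counts_irq(0xff, 0xff, 1) is false for
the first cpu down, but old_check_counts_irq(0xff, 0xfd, 2) is true for
the second one because bit 1 is still set in affinity_new.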

Checking if there is at least one common cpu between online_new and
affinity_new masks corrects the counting.

In fact, if at least one cpu from the affinity mask is still up, then
this cpu will be able to handle the irq. There is no need to modify the
irq affinity.
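
As a rough userspace illustration of the new condition, assuming an
8-cpu mask stored in one byte: mask8_t and new_check_counts_irq() are
hypothetical names, and the empty-intersection test stands in for
cpumask_any_and(&affinity_new, &online_new) >= nr_cpu_ids.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per cpu, 8 cpus. */
typedef uint8_t mask8_t;

/*
 * Models the new condition: the irq is counted only if no cpu is set
 * in both affinity_new and online_new.
 */
static bool new_check_counts_irq(mask8_t affinity, mask8_t online,
				 unsigned int down_cpu)
{
	mask8_t bit = (mask8_t)(1u << down_cpu);
	mask8_t affinity_new = affinity & (mask8_t)~bit;
	mask8_t online_new = online & (mask8_t)~bit;

	return (affinity_new & online_new) == 0;
}
```

For an irq with affinity 0xff, this returns false both when cpu 1 is
downed with all cpus online and when cpu 2 is downed afterwards. It
returns true when the downed cpu is the last online cpu in the mask,
for example affinity = 0x06 (cpus 1-2), online = 0xfd, down_cpu = 2.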

Suggested-by: Vincent Stehlé <vincent.stehle@xxxxxxxxx>
Signed-off-by: Gregory Herrero <gregory.herrero@xxxxxxxxx>
---
arch/x86/kernel/irq.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 61521dc..65ee7ac 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -387,13 +387,12 @@ int check_irq_vectors_for_cpu_disable(void)
* this the down'd cpu is the last cpu in the irq's
* affinity mask, or
*
- * 2) the resulting affinity mask is no longer a
- * subset of the online cpus but the affinity mask is
+ * 2) the resulting affinity mask no longer has
+ * common bits with online cpus mask but the affinity mask is
* not zero; that is the down'd cpu is the last online
* cpu in a user set affinity mask.
*/
- if (cpumask_empty(&affinity_new) ||
- !cpumask_subset(&affinity_new, &online_new))
+ if (cpumask_any_and(&affinity_new, &online_new) >= nr_cpu_ids)
this_count++;
}

--
2.7.0