Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console

From: Thomas Gleixner
Date: Wed Dec 09 2020 - 13:29:38 EST


Hi Shung-Hsi!

On Wed, Dec 09 2020 at 14:33, Shung-Hsi Yu wrote:
> On Tue, Nov 10, 2020 at 09:56:27PM +0100, Thomas Gleixner wrote:
>> The real problem is irqbalanced aggressively exhausting the vector space
>> of a _whole_ socket to the point that there is not a single vector left
>> for serial. That's the problem you want to fix.
>
> I believe this warning also gets triggered even when there's _no_ vector
> exhaustion.
>
> This seems to happen when the IRQ's affinity mask is set (wrongly) to CPUs on
> a different NUMA node (e.g. cpumask_of_node(1) when the irqd->irq == 0).
>
> $ lscpu
> ...
> NUMA node0 CPU(s): 0-25,52-77
> NUMA node1 CPU(s): 26-51,78-103
>
> $ cat /sys/kernel/debug/tracing/trace
> ...
> (agetty)-3004 [047] d... 81.777152: vector_activate: irq=4 is_managed=0 can_reserve=1 reserve=0
> (agetty)-3004 [047] d... 81.777157: vector_alloc: irq=4 vector=0 reserved=1 ret=-22
> ----------------------------------------> irq_matrix_alloc() failed with
>                                           EINVAL because the cpumask
>                                           passed in is empty, which is a
>                                           result of affmask being
>                                           (ff,ffffc000,000fffff,fc000000)
>                                           and cpumask_of_node(node)
>                                           being
>                                           (00,00003fff,fff00000,03ffffff).
>
> (agetty)-3004 [047] d... 81.789349: irq_matrix_alloc: bit=33 cpu=1 online=1 avl=199 alloc=2 managed=1 online_maps=104 global_avl=20688, global_rsvd=341, total_alloc=216
> (agetty)-3004 [047] d... 81.789351: vector_alloc: irq=4 vector=33 reserved=1 ret=0
> (agetty)-3004 [047] d... 81.789353: vector_update: irq=4 vector=33 cpu=1 prev_vector=0 prev_cpu=26
> (agetty)-3004 [047] d... 81.789355: vector_config: irq=4 vector=33 cpu=1 apicdest=0x00000002
> ----------------------------------------> "irq 4: Affinity broken due to
>                                           vector space exhaustion."
>                                           warning shows up
>

Ok. That's a different story. Nice explanation!
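
The empty intersection described in the trace annotation can be checked in
userspace. The following is a minimal sketch, not kernel code: the demo_*
names and the fixed 104-CPU layout are assumptions taken from the lscpu
output quoted above, and demo_and() only imitates the return convention of
cpumask_and() (false when the result is empty).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NR_CPUS	104	/* online_maps=104 in the trace */
#define DEMO_WORDS	((DEMO_NR_CPUS + 31) / 32)

/* Hypothetical stand-in for struct cpumask; bits[0] holds CPUs 0-31. */
struct demo_cpumask {
	uint32_t bits[DEMO_WORDS];
};

/* Set CPUs first..last (inclusive) in the mask. */
static void demo_set_range(struct demo_cpumask *m, int first, int last)
{
	assert(first >= 0 && first <= last && last < DEMO_NR_CPUS);
	for (int cpu = first; cpu <= last; cpu++)
		m->bits[cpu / 32] |= UINT32_C(1) << (cpu % 32);
}

/* dst = a & b; returns false if the result is empty. */
static bool demo_and(struct demo_cpumask *dst,
		     const struct demo_cpumask *a,
		     const struct demo_cpumask *b)
{
	uint32_t any = 0;

	for (int i = 0; i < DEMO_WORDS; i++) {
		dst->bits[i] = a->bits[i] & b->bits[i];
		any |= dst->bits[i];
	}
	return any != 0;
}

/* Node layout from the lscpu output: node0 0-25,52-77; node1 26-51,78-103. */
static struct demo_cpumask demo_node_mask(int node)
{
	struct demo_cpumask m;

	memset(&m, 0, sizeof(m));
	if (node == 0) {
		demo_set_range(&m, 0, 25);
		demo_set_range(&m, 52, 77);
	} else {
		demo_set_range(&m, 26, 51);
		demo_set_range(&m, 78, 103);
	}
	return m;
}
```

With the affinity mask set to the node 1 CPUs and the node mask being that
of node 0, every word of the AND is zero, so the first search step has
nothing to search.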

But the fix is not to tone down the warning. The proper fix is to do the
search in the correct order.

Thanks,

tglx
---
arch/x86/kernel/apic/vector.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -273,20 +273,24 @@ static int assign_irq_vector_any_locked(
 	const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd);
 	int node = irq_data_get_node(irqd);
 
-	if (node == NUMA_NO_NODE)
-		goto all;
-	/* Try the intersection of @affmsk and node mask */
-	cpumask_and(vector_searchmask, cpumask_of_node(node), affmsk);
-	if (!assign_vector_locked(irqd, vector_searchmask))
-		return 0;
-	/* Try the node mask */
-	if (!assign_vector_locked(irqd, cpumask_of_node(node)))
-		return 0;
-all:
+	if (node != NUMA_NO_NODE) {
+		/* Try the intersection of @affmsk and node mask */
+		cpumask_and(vector_searchmask, cpumask_of_node(node), affmsk);
+		if (!assign_vector_locked(irqd, vector_searchmask))
+			return 0;
+	}
+
 	/* Try the full affinity mask */
 	cpumask_and(vector_searchmask, affmsk, cpu_online_mask);
 	if (!assign_vector_locked(irqd, vector_searchmask))
 		return 0;
+
+	if (node != NUMA_NO_NODE) {
+		/* Try the node mask */
+		if (!assign_vector_locked(irqd, cpumask_of_node(node)))
+			return 0;
+	}
+
 	/* Try the full online mask */
 	return assign_vector_locked(irqd, cpu_online_mask);
 }
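
The effect of the reordering can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct demo_sys,
demo_try() and the demo_search_*() helpers are hypothetical, masks are
plain 64-bit words, and demo_try() returns the chosen CPU (or -1) where
assign_vector_locked() returns 0 or an error code. It also assumes an
allocation succeeds whenever an online CPU in the mask has a free vector.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NO_NODE	(-1)	/* stands in for NUMA_NO_NODE */

struct demo_sys {
	uint64_t online;	/* online CPUs, one bit per CPU */
	uint64_t node_cpus[2];	/* CPUs of NUMA node 0 and node 1 */
	uint64_t has_free;	/* CPUs that still have a free vector */
};

/* Return the first usable CPU in @mask, or -1 if there is none. */
static int demo_try(const struct demo_sys *s, uint64_t mask)
{
	uint64_t ok = mask & s->online & s->has_free;

	for (int cpu = 0; cpu < 64; cpu++)
		if (ok & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Order before the patch: node & aff, node, aff, online. */
static int demo_search_old(const struct demo_sys *s, int node, uint64_t aff)
{
	int cpu;

	assert(node == DEMO_NO_NODE || node == 0 || node == 1);
	if (node != DEMO_NO_NODE) {
		cpu = demo_try(s, s->node_cpus[node] & aff);
		if (cpu >= 0)
			return cpu;
		cpu = demo_try(s, s->node_cpus[node]);
		if (cpu >= 0)
			return cpu;
	}
	cpu = demo_try(s, aff);
	if (cpu >= 0)
		return cpu;
	return demo_try(s, s->online);
}

/* Order after the patch: node & aff, aff, node, online. */
static int demo_search_new(const struct demo_sys *s, int node, uint64_t aff)
{
	int cpu;

	assert(node == DEMO_NO_NODE || node == 0 || node == 1);
	if (node != DEMO_NO_NODE) {
		cpu = demo_try(s, s->node_cpus[node] & aff);
		if (cpu >= 0)
			return cpu;
	}
	cpu = demo_try(s, aff);
	if (cpu >= 0)
		return cpu;
	if (node != DEMO_NO_NODE) {
		cpu = demo_try(s, s->node_cpus[node]);
		if (cpu >= 0)
			return cpu;
	}
	return demo_try(s, s->online);
}
```

In a model with CPUs 0-3 on node 0 and CPUs 4-7 on node 1, an interrupt
whose node is 0 but whose affinity mask covers only node 1 ends up on CPU 0
with the old order, outside its affinity mask, and on CPU 4 with the new
order. The node mask is still used as a fallback once the affinity mask has
no free vector left.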