RE: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts based on allocated IRQs
From: Long Li
Date: Thu Nov 01 2018 - 12:39:25 EST
> Subject: Re: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts
> based on allocated IRQs
>
> Long,
>
> On Thu, 1 Nov 2018, Long Li wrote:
> > On a large system with multiple devices of the same class (e.g. NVMe
> > disks, using managed IRQs), the kernel tends to concentrate their IRQs
> > on several CPUs.
> >
> > The issue is that when NVMe calls irq_matrix_alloc_managed(), the
> > assigned CPU tends to be one of the first several CPUs in the cpumask,
> > because the selection checks cpumap->available, which does not change
> > after managed IRQs are reserved.
> >
> > In irq_matrix->cpumap, "available" is set when IRQs are allocated
> > earlier in the IRQ allocation process. This value is caculated based
> > on
>
> calculated
>
> > 1. how many unmanaged IRQs are allocated on this CPU
> > 2. how many managed IRQs are reserved on this CPU
> >
> > But "available" does not accurately account for the real IRQ load on a
> > given CPU.
> >
> > A managed IRQ tends to reserve a vector on more than one CPU, based on
> > the cpumask in irq_matrix_reserve_managed(). But later, when a CPU is
> > actually allocated for this IRQ, only one CPU is used. Because
> > "available" is calculated at the time the managed IRQ is reserved, it
> > tends to indicate that a CPU has more IRQs than are actually assigned
> > to it.
> >
> > When a managed IRQ is assigned to a CPU in irq_matrix_alloc_managed(),
> > it decreases "allocated" based on the actual assignment of this IRQ to
> > this CPU.
>
> decreases?
>
> > Unmanaged IRQ also decreases "allocated" after allocating an IRQ on
> > this CPU.
>
> ditto
>
> > For this reason, checking "allocated" is more accurate than checking
> > "available" for a given CPU, and results in IRQs being distributed
> > more evenly across all CPUs.
>
> Again, this approach is only correct for managed interrupts. Why?
>
> Assume that total vector space size = 10
>
> CPU 0:
> allocated = 8
> available = 1
>
> i.e. there are 2 managed reserved, but not assigned interrupts
>
> CPU 1:
> allocated = 7
> available = 0
>
> i.e. there are 3 managed reserved, but not assigned interrupts
>
> Now allocate a non managed interrupt:
>
> irq_matrix_alloc()
>
> cpu = find_best_cpu() <-- returns CPU1
>
> ---> FAIL
>
> The allocation fails because it cannot allocate from the managed reserved
> space. The managed reserved space is guaranteed even if the vectors are not
> assigned. This is required to make hotplug work and to allow late activation
> without breaking the guarantees.
>
> Non managed has no guarantees, it's a best effort approach, so it can fail.
> But the fail above is just wrong.
>
> You really need to treat managed and unmanaged CPU selection differently.
Thank you for the explanation. I will send another patch to do it properly.
Long
>
> Thanks,
>
> tglx
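
For readers following the thread, the failure case in the quoted example can
be reproduced with a small user-space sketch. This is not the kernel code:
struct cpu_counters and the three functions below are hypothetical names
invented for illustration, and the model assumes only what the example
states, namely that an unmanaged allocation cannot take vectors from the
managed reserved space and therefore needs "available" > 0 on the chosen CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-CPU counters in the thread. */
struct cpu_counters {
	unsigned int allocated;	/* vectors actually assigned on this CPU */
	unsigned int available;	/* vectors usable for unmanaged allocation */
};

/* Pick the CPU with the fewest allocated vectors (the v2 patch approach). */
int pick_by_allocated(const struct cpu_counters *c, size_t n)
{
	size_t i, best = 0;

	assert(c != NULL && n > 0);
	for (i = 1; i < n; i++)
		if (c[i].allocated < c[best].allocated)
			best = i;
	return (int)best;
}

/* Pick the CPU with the most available vectors. */
int pick_by_available(const struct cpu_counters *c, size_t n)
{
	size_t i, best = 0;

	assert(c != NULL && n > 0);
	for (i = 1; i < n; i++)
		if (c[i].available > c[best].available)
			best = i;
	return (int)best;
}

/*
 * Model of an unmanaged allocation on a given CPU. Returns the CPU on
 * success, or -1 when only managed reserved space is left on that CPU.
 */
int alloc_unmanaged(struct cpu_counters *c, int cpu)
{
	if (c[cpu].available == 0)
		return -1;
	c[cpu].available--;
	c[cpu].allocated++;
	return cpu;
}
```

With the numbers from the example (CPU 0: allocated = 8, available = 1;
CPU 1: allocated = 7, available = 0), pick_by_allocated() selects CPU 1 and
alloc_unmanaged() then fails, while pick_by_available() selects CPU 0 and the
allocation succeeds. This is only meant to illustrate why the two selection
criteria need to be kept separate for managed and unmanaged interrupts.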