Re: [patch 00/52] x86: Rework the vector management

From: Yu Chen
Date: Tue Sep 19 2017 - 05:10:33 EST


On Wed, Sep 13, 2017 at 11:29:02PM +0200, Thomas Gleixner wrote:
> Sorry for the large CC list, but this is major surgery.
>
> The vector management in x86 including the surrounding code is a
> conglomerate of ancient bits and pieces which have been subject to
> 'modernization' and featuritis over the years. The most obscure parts are
> the vector allocation mechanics, the cleanup vector handling and the cpu
> hotplug machinery. Replacing these pieces of art had been on my todo list
> for a long time.
>
> Recent attempts to 'solve' CPU offline / hibernation issues which are
> partially caused by the current vector management implementation made me
> take a real look at it. Further information in this thread:
>
> http://lkml.kernel.org/r/cover.1504235838.git.yu.c.chen@xxxxxxxxx
>
> Aside from drivers allocating gazillions of interrupts, there are quite a
> few things which can be addressed in the x86 vector management and in the
> core code.
>
> - Multi CPU affinities:
>
> A dubious property which is not available on all machines and causes
> major complexity in both the allocator and the cleanup/hotplug
> management. See:
>
> http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
>
> - Priority level spreading:
>
> An obscure and undocumented property which I think is sufficiently
> argued to be unnecessary in:
>
> http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
>
> - Allocation of vectors when interrupt descriptors are allocated.
>
> This is a historical implementation detail which is not really
> required when the vector allocation is delayed up to the point where
> request_irq() is invoked. This can make request_irq() fail when the
> vector space is exhausted, but drivers should handle request_irq()
> failures anyway.
>
> The upside of changing this is that the active vector space becomes
> smaller, especially on hibernation/CPU offline, when drivers shut down
> the queue interrupts of outgoing CPUs.
>
> Some of this is already addressed with the managed interrupt facility,
> but that was bolted on top of the existing vector management because
> proper integration was not possible at that point. I take the blame
> for this, but the tradeoff of not doing it would have been more
> broken driver boilerplate code all over the place. So I went for the
> lesser of two evils.
>
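
As a side note for driver writers: request_irq() failures have to be
handled anyway, and with delayed vector allocation they just become more
likely under vector pressure (e.g. -ENOSPC when the vector space is
exhausted). A minimal sketch of the error path, with a hypothetical
device structure and handler:

    #include <linux/device.h>
    #include <linux/interrupt.h>

    struct mydev {
            struct device *dev;
            int irq;
    };

    static irqreturn_t mydev_irq_handler(int irq, void *dev_id)
    {
            /* acknowledge and handle the device interrupt here */
            return IRQ_HANDLED;
    }

    static int mydev_setup_irq(struct mydev *md)
    {
            int ret;

            /*
             * With vectors allocated at request_irq() time this call
             * can fail when the vector space is exhausted, so the
             * error path is not optional.
             */
            ret = request_irq(md->irq, mydev_irq_handler, 0, "mydev", md);
            if (ret) {
                    dev_err(md->dev, "request_irq(%d) failed: %d\n",
                            md->irq, ret);
                    return ret;
            }
            return 0;
    }
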
> - Allocation of vectors in the wrong place
>
> Even for managed interrupts the vector allocation at descriptor
> allocation time happens in the wrong place and gets fixed after the
> fact with a call to set_affinity(). In the case of non-remapped
> interrupts this results in at least one interrupt on the wrong CPU
> before it is migrated to the desired target.
>
> - Lack of instrumentation
>
> All of this is a black box which allows no insight into the actual
> vector usage.
>
> The series addresses these points and converts the x86 vector management
> to a bitmap based allocator which provides proper reservation management
> for 'managed interrupts' and best effort reservation for regular
> interrupts. The latter allows overcommitment, which 'fixes' some of the
> hotplug/hibernation problems in a clean way. It can't fix all of them;
> that depends on the driver involved.
>
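
To get a feel for the bitmap approach, here is a toy userspace model of a
per-CPU vector bitmap. This is only an illustration of the idea, not the
allocator from the series (which also has to handle reservations, managed
interrupts and multiple CPUs); all names and the vector range are made up:

    #include <stdio.h>

    #define NR_VECTORS        256
    #define FIRST_DYN_VECTOR   33  /* toy value; below live system vectors */

    struct cpu_vecmap {
            unsigned char bitmap[NR_VECTORS];  /* 1 = vector in use */
            unsigned int allocated;
    };

    /* Hand out the first free vector, or -1 when the space is exhausted. */
    static int vec_alloc(struct cpu_vecmap *map)
    {
            for (int v = FIRST_DYN_VECTOR; v < NR_VECTORS; v++) {
                    if (!map->bitmap[v]) {
                            map->bitmap[v] = 1;
                            map->allocated++;
                            return v;
                    }
            }
            return -1;
    }

    static void vec_free(struct cpu_vecmap *map, int v)
    {
            map->bitmap[v] = 0;
            map->allocated--;
    }

    int main(void)
    {
            static struct cpu_vecmap cpu0;
            int v = vec_alloc(&cpu0);

            printf("allocated vector %d, total %u\n", v, cpu0.allocated);
            vec_free(&cpu0, v);
            return 0;
    }
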
> This rework is no excuse for driver writers to do exhaustive vector
> allocations instead of utilizing the managed interrupt infrastructure,
> but it addresses long standing issues in this code with the side effect
> of mitigating some of the driver oddities. The proper solution for
> multi-queue management is 'managed interrupts', which have been proven
> in the block-mq work, as they solve issues which are worked around in
> other drivers in creative ways, with lots of copied code and often
> enough broken attempts to handle interrupt affinity and CPU hotplug
> problems.
>
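
For reference, the way a multi-queue driver opts into managed interrupts
is the PCI_IRQ_AFFINITY flag. A rough sketch with a hypothetical driver
name (pci_alloc_irq_vectors() and the flags are the real API):

    #include <linux/pci.h>

    /*
     * Let the core spread the queue vectors over the CPUs and manage
     * them across CPU hotplug, instead of open-coding affinity and
     * hotplug handling in the driver.
     */
    static int mydrv_setup_queue_irqs(struct pci_dev *pdev,
                                      unsigned int nr_queues)
    {
            int nvec;

            nvec = pci_alloc_irq_vectors(pdev, 1, nr_queues,
                                         PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
            if (nvec < 0)
                    return nvec;

            /*
             * Drive one queue per vector; pci_irq_vector(pdev, i)
             * returns the Linux irq number for queue i.
             */
            return nvec;
    }
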
> The new bitmap allocator and the x86 vector management code are
> instrumented with tracepoints and the irq domain debugfs files allow deep
> insight into the vector allocation and reservations.
>
> The patches work on machines with and without interrupt remapping and
> inside KVM guests of various flavours, though I have no idea what I
> broke along the way with other hypervisors, posted interrupts, etc. So
> I kindly ask for your support in testing and review.
>
> The series applies on top of Linus' tree and is available as a git branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/apic
>
> Note that this branch is Linus' tree plus scheduler and x86 fixes which
> I needed for proper testing. They have outstanding pull requests and
> might already be merged by the time you read this.
>
> Thanks,
>
> tglx
> ---
Tested on top of:
commit e1b476ae32fcfa59fc6752b4b01988e759269dc3
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Thu Sep 14 09:53:10 2017 +0200

x86/vector: Exclude IRQ0 from reservation mode

from branch WIP.x86/apic, on a platform with 16 cores (32 logical CPUs):
bootup okay, cpu[1-31] offline/online okay.
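
The dumps below are from the irq domain debugfs interface added by the
series (with CONFIG_GENERIC_IRQ_DEBUGFS enabled; presumably read from
/sys/kernel/debug/irq/domains/VECTOR).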
Before offline:

name: VECTOR
size: 0
mapped: 484
flags: 0x00000041
Online bitmaps: 32
Global available: 6419
Global reserved: 407
Total allocated: 77
System: 41: 0-19,32,50,128,238-255
| CPU | avl | man | act | vectors
    0   126     0    77  33-49,51-110
    1   203     0     0
    2   203     0     0
    3   203     0     0
    4   203     0     0
    5   203     0     0
    6   203     0     0
    7   203     0     0
    8   203     0     0
    9   203     0     0
   10   203     0     0
   11   203     0     0
   12   203     0     0
   13   203     0     0
   14   203     0     0
   15   203     0     0
   16   203     0     0
   17   203     0     0
   18   203     0     0
   19   203     0     0
   20   203     0     0
   21   203     0     0
   22   203     0     0
   23   203     0     0
   24   203     0     0
   25   203     0     0
   26   203     0     0
   27   203     0     0
   28   203     0     0
   29   203     0     0
   30   203     0     0
   31   203     0     0

After offline:

name: VECTOR
size: 0
mapped: 484
flags: 0x00000041
Online bitmaps: 1
Global available: 126
Global reserved: 407
Total allocated: 77
System: 41: 0-19,32,50,128,238-255
| CPU | avl | man | act | vectors
    0   126     0    77  33-49,51-110

Thanks,
Yu