[patch V2 00/13] x86/irq/64: Inline irq stack switching

From: Thomas Gleixner
Date: Tue Feb 09 2021 - 21:09:29 EST


This is the second version of this series. V1 is available here:

https://lore.kernel.org/r/20210204204903.350275743@xxxxxxxxxxxxx

The recent effort to make the ASM entry code slim and unified moved
the irq stack switching out of the low level ASM code so that the
whole return from interrupt work and state handling can be done in C
and the ASM code just handles the true low level details of entry and
exit (which is horrible enough already due to the well thought out
architecture).

The main goal at that point was to get instrumentation and RCU state
under control in a validated way. Inlining the switch mechanism was
attempted back then, but that caused more objtool and unwinder trouble
than we had already on our plate, so we ended up with a simple,
functional but suboptimal implementation. The main issues are:

- The unnecessary indirect call which is expensive thanks to
retpoline

- The inability to stay on the irq stack for softirq processing on return
from interrupt, which requires another stack switch operation.

- The fact that the stack switching code ended up being an easy to find
exploit gadget.

This series revisits the problem and reimplements the stack switch
mechanics via evil inline assembly. Peter Zijlstra provided the required
objtool and unwinder changes already. These are available here:

https://lore.kernel.org/r/20210203120222.451068583@xxxxxxxxxxxxx

and the latest iteration of them is available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core

The full series based on Peter's git branch is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/entry

All function calls are now direct and fully inlined including the single
instance in the softirq code which is invoked from local_bh_enable() in
task context.
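
To illustrate the difference in call shape, here is a minimal userspace
sketch. It is a toy model, not code from the series: no real stack is
switched, and all names (on_irq_stack, stack_switches,
run_on_irqstack_indirect(), run_on_irqstack_direct(), old_path(),
new_path()) are hypothetical. It only shows why a helper that takes a
function pointer ends up with an indirect call, while a macro that names
the callee lets the compiler emit a direct call and keep the softirq work
inside the same switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: no real stack is switched anywhere in this file. */
static bool on_irq_stack;	/* hypothetical "irq stack in use" flag */
static int  stack_switches;	/* number of simulated stack switches   */
static int  work_done;		/* side effect so the calls are visible */

static void handle_irq(int vector)
{
	assert(on_irq_stack);	/* the handler expects the irq stack */
	work_done += vector;
}

static void do_softirq_work(int unused)
{
	(void)unused;
	assert(on_irq_stack);
	work_done += 1;
}

/* Old shape: a shared helper; the callee is only known via a pointer. */
static void run_on_irqstack_indirect(void (*func)(int), int arg)
{
	on_irq_stack = true;
	stack_switches++;
	func(arg);		/* indirect call (retpoline when mitigated) */
	on_irq_stack = false;
}

/* New shape: the callee is named at the call site, so the call is direct. */
#define run_on_irqstack_direct(func, arg, softirq_func)		\
do {								\
	on_irq_stack = true;					\
	stack_switches++;					\
	func(arg);		/* direct call */		\
	softirq_func(0);	/* still on the irq stack */	\
	on_irq_stack = false;					\
} while (0)

/* One interrupt plus softirq work, old shape: two simulated switches. */
static int old_path(void)
{
	stack_switches = 0;
	run_on_irqstack_indirect(handle_irq, 32);
	run_on_irqstack_indirect(do_softirq_work, 0);
	return stack_switches;
}

/* Same work, new shape: a single simulated switch covers both calls. */
static int new_path(void)
{
	stack_switches = 0;
	run_on_irqstack_direct(handle_irq, 32, do_softirq_work);
	return stack_switches;
}
```

In this model old_path() counts two switches and new_path() counts one.
The real implementation does the switch in inline assembly and depends on
the objtool and unwinder changes mentioned above, none of which is
modelled here.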

The extra 100 lines in the diffstat are pretty much the extensive commentary
for the whole magic, to spare everyone, including myself, from scratching
their heads two weeks down the road.

The text size impact is in the noise, and looking at the actual entry
functions there is, depending on the compiler variant, even a small size
decrease.

The patches have been tested with gcc8, gcc10 and clang-13 (fresh from
git). The difference between the output of these compilers is minimal,
with gcc8 being slightly worse due to stupid register selection and
randomly injected NOPs.

Changes vs. V1:

- Use ASM_CALL_CONSTRAINT unconditionally (Josh)
- New approach to handle the inlining without the extra #ifdeffery (Lai)
- Added stable/fixes tag to patch 1 (Boris)
- Style and comment updates (Boris)
- Clarified the cacheline effect in the changelog (Peter)
- Picked up Reviewed-by from Kees where appropriate


Thanks,

tglx
---
arch/Kconfig | 6
arch/parisc/Kconfig | 1
arch/parisc/include/asm/hardirq.h | 4
arch/parisc/kernel/irq.c | 1
arch/powerpc/Kconfig | 1
arch/powerpc/include/asm/irq.h | 2
arch/powerpc/kernel/irq.c | 1
arch/s390/Kconfig | 1
arch/s390/include/asm/hardirq.h | 1
arch/s390/kernel/irq.c | 1
arch/sh/Kconfig | 1
arch/sh/include/asm/irq.h | 1
arch/sh/kernel/irq.c | 1
arch/sparc/Kconfig | 1
arch/sparc/include/asm/irq_64.h | 1
arch/sparc/kernel/irq_64.c | 1
arch/x86/Kconfig | 2
arch/x86/entry/common.c | 19 --
arch/x86/entry/entry_64.S | 41 -----
arch/x86/include/asm/idtentry.h | 11 -
arch/x86/include/asm/irq.h | 2
arch/x86/include/asm/irq_stack.h | 279 ++++++++++++++++++++++++-----------
arch/x86/include/asm/processor.h | 9 -
arch/x86/include/asm/softirq_stack.h | 11 +
arch/x86/kernel/apic/apic.c | 31 ++-
arch/x86/kernel/cpu/common.c | 4
arch/x86/kernel/dumpstack_64.c | 22 ++
arch/x86/kernel/irq.c | 2
arch/x86/kernel/irq_32.c | 1
arch/x86/kernel/irq_64.c | 12 -
arch/x86/kernel/process_64.c | 2
include/asm-generic/Kbuild | 1
include/asm-generic/softirq_stack.h | 14 +
include/linux/interrupt.h | 9 -
kernel/softirq.c | 2
35 files changed, 303 insertions(+), 196 deletions(-)