[RFC 0/7] x86/percpu: Use segment qualifiers

From: Nadav Amit
Date: Thu Jul 18 2019 - 21:03:52 EST


GCC 6+ supports segment qualifiers. Using them enables several
optimizations:

1. Avoid unnecessary instructions when an operation is performed on a
per-cpu value that is read or written, and instead let the compiler emit
instructions that access the per-cpu value directly.

2. Make this_cpu_ptr() more efficient and allow its value to be cached,
since preemption must be disabled when this_cpu_ptr() is used.

3. Provide a better alternative to this_cpu_read_stable() that caches
values more efficiently by using the alias attribute to a const variable.

4. Allow the compiler to perform other optimizations (e.g. CSE).

5. Use rip-relative addressing in per_cpu_read_stable(), which makes it
PIE-ready.

"size" and Peter's compare do not seem to reflect the code size reduction
correctly. Summing the code sizes reported by nm on defconfig shows a
minor reduction from 11349763 to 11339840 bytes (0.09%).

Nadav Amit (7):
  compiler: Report x86 segment support
  x86/percpu: Use compiler segment prefix qualifier
  x86/percpu: Use C for percpu accesses when possible
  x86: Fix possible caching of current_task
  percpu: Assume preemption is disabled on per_cpu_ptr()
  x86/percpu: Optimized arch_raw_cpu_ptr()
  x86/current: Aggressive caching of current

 arch/x86/include/asm/current.h         |  30 +++
 arch/x86/include/asm/fpu/internal.h    |   7 +-
 arch/x86/include/asm/percpu.h          | 293 +++++++++++++++++++------
 arch/x86/include/asm/preempt.h         |   3 +-
 arch/x86/include/asm/resctrl_sched.h   |  14 +-
 arch/x86/kernel/cpu/Makefile           |   1 +
 arch/x86/kernel/cpu/common.c           |   7 +-
 arch/x86/kernel/cpu/current.c          |  16 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   4 +-
 arch/x86/kernel/process_32.c           |   4 +-
 arch/x86/kernel/process_64.c           |   4 +-
 include/asm-generic/percpu.h           |  12 +
 include/linux/compiler-gcc.h           |   4 +
 include/linux/compiler.h               |   2 +-
 include/linux/percpu-defs.h            |  33 ++-
 15 files changed, 346 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/current.c

--
2.17.1