[PATCH 000/208] big x86 FPU code rewrite

From: Ingo Molnar
Date: Tue May 05 2015 - 13:51:59 EST


[Second part of the series - Gmail didn't like me sending so many mails.]

Over the past 10 years the x86 FPU code has organically grown into
somewhat of a spaghetti monster that few (if any) kernel
developers understand and that few people enjoy hacking on.

Many people suggested over the years that it needs a major cleanup,
and some time ago I went "what the heck" and started doing it step
by step to see where it leads - it cannot be that hard!

Three weeks and 200+ patches later I think I have to admit that I
seriously underestimated the magnitude of the project! ;-)

This work-in-progress series is large, but I think it makes the
code maintainable and hackable again. It's pretty complete, as
per the 9 high level goals laid out further below. Individual
patches are all fine-grained, so they should be easy to review - Boris
Petkov already reviewed most of the patches, so they are not
entirely raw.

Individual patches have been tested heavily for bisectability: they
were both build and boot tested on a relatively wide range of x86
hardware that I have access to. Nevertheless the changes are pretty
invasive, so I'd expect there to be test failures.

This is the only time I intend to post them to lkml in their entirety,
to not spam lkml too much. (Future additions will be posted as delta
series.)

I'd like to ask interested people to test this tree, and to comment
on the patches. The changes can be found in the following Git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tmp.fpu

(The tree might be rebased, depending on feedback.)

Here are the main themes that motivated most of the changes:

1)

I collected all FPU code into arch/x86/kernel/fpu/*.c and split it
all up into the following, topically organized source code files:

-rw-rw-r-- 1 mingo mingo 1423 May 5 16:36 arch/x86/kernel/fpu/bugs.c
-rw-rw-r-- 1 mingo mingo 12206 May 5 16:36 arch/x86/kernel/fpu/core.c
-rw-rw-r-- 1 mingo mingo 7342 May 5 16:36 arch/x86/kernel/fpu/init.c
-rw-rw-r-- 1 mingo mingo 10909 May 5 16:36 arch/x86/kernel/fpu/measure.c
-rw-rw-r-- 1 mingo mingo 9012 May 5 16:36 arch/x86/kernel/fpu/regset.c
-rw-rw-r-- 1 mingo mingo 11188 May 5 16:36 arch/x86/kernel/fpu/signal.c
-rw-rw-r-- 1 mingo mingo 10140 May 5 16:36 arch/x86/kernel/fpu/xstate.c

Similarly I've collected and split up all FPU related header files, and
organized them topically:

-rw-rw-r-- 1 mingo mingo 1690 May 5 16:35 arch/x86/include/asm/fpu/api.h
-rw-rw-r-- 1 mingo mingo 12937 May 5 16:36 arch/x86/include/asm/fpu/internal.h
-rw-rw-r-- 1 mingo mingo 278 May 5 16:36 arch/x86/include/asm/fpu/measure.h
-rw-rw-r-- 1 mingo mingo 596 May 5 16:35 arch/x86/include/asm/fpu/regset.h
-rw-rw-r-- 1 mingo mingo 1013 May 5 16:35 arch/x86/include/asm/fpu/signal.h
-rw-rw-r-- 1 mingo mingo 8137 May 5 16:36 arch/x86/include/asm/fpu/types.h
-rw-rw-r-- 1 mingo mingo 5691 May 5 16:36 arch/x86/include/asm/fpu/xstate.h

<fpu/api.h> is the only 'public' API left, used in various drivers.

I decoupled drivers and non-FPU x86 code from various FPU internals.

2)

I renamed various internal data types, APIs and helpers, and organized
their support functions accordingly.

For example, all functions that deal with copying FPU registers in and
out of the FPU are now named consistently:

copy_fxregs_to_kernel() # was: fpu_fxsave()
copy_xregs_to_kernel() # was: xsave_state()

copy_kernel_to_fregs() # was: frstor_checking()
copy_kernel_to_fxregs() # was: fxrstor_checking()
copy_kernel_to_xregs() # was: fpu_xrstor_checking()
copy_kernel_to_xregs_booting() # was: xrstor_state_booting()

copy_fregs_to_user() # was: fsave_user()
copy_fxregs_to_user() # was: fxsave_user()
copy_xregs_to_user() # was: xsave_user()

copy_user_to_fregs() # was: frstor_user()
copy_user_to_fxregs() # was: fxrstor_user()
copy_user_to_xregs() # was: xrestore_user()
copy_user_to_fpregs_zeroing() # was: restore_user_xstate()

'xregs' stands for registers supported by XSAVE
'fxregs' stands for registers supported by FXSAVE
'fregs' stands for registers supported by FSAVE
'fpregs' stands for generic FPU registers.

Similarly, the high level FPU functions got reorganized as well:

extern void fpu__activate_curr(struct fpu *fpu);
extern void fpu__activate_stopped(struct fpu *fpu);
extern void fpu__save(struct fpu *fpu);
extern void fpu__restore(struct fpu *fpu);
extern int fpu__restore_sig(void __user *buf, int ia32_frame);
extern void fpu__drop(struct fpu *fpu);
extern int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
extern void fpu__clear(struct fpu *fpu);
extern int fpu__exception_code(struct fpu *fpu, int trap_nr);

Those functions that used to take a task_struct argument now take
the more limited 'struct fpu' argument, and their naming is consistent
and logical as well.

Likewise, the FP state data types are now consistently named as well:

struct fregs_state;
struct fxregs_state;
struct swregs_state;
struct xregs_state;

union fpregs_state;

3)

Various core data types got streamlined around four byte flags in 'struct fpu':

fpu->fpstate_active # was: tsk->flags & PF_USED_MATH
fpu->fpregs_active # was: fpu->has_fpu
fpu->last_cpu
fpu->counter

which now fit into a single word.

4)

task->thread.fpu->state got embedded again, as task->thread.fpu.state. This
eliminated a lot of awkward late dynamic memory allocation of FPU state
and the problematic handling of failures.

Note that while the allocation is static right now, this is a WIP interim
state: we can still do dynamic allocation of FPU state, by moving the FPU
state last in task_struct and then allocating task_struct accordingly.

5)

The amazingly convoluted init dependencies got sorted out, into two
cleanly separated families of initialization functions: the
fpu__init_system_*() functions, and the fpu__init_cpu_*() functions.

This allowed the removal of various __init annotation hacks and
obscure boot time checks.

6)

Decoupled the FPU core from the save code. xsave.c and xsave.h got
shrunk quite a bit, and they now host only XSAVE/etc. related
functionality, not generic FPU handling functions.

7)

Added a ton of comments explaining how things work and why, hopefully
making this code accessible to everyone interested.

8)

Added FPU debugging code (CONFIG_X86_DEBUG_FPU=y) and added an FPU hw
benchmarking subsystem (CONFIG_X86_DEBUG_FPU_MEASUREMENTS=y), which
performs boot time measurements like:

x86/fpu:##################################################################
x86/fpu: Running FPU performance measurement suite (cache hot):
x86/fpu: Cost of: null : 108 cycles
x86/fpu:######## CPU instructions: ############################
x86/fpu: Cost of: NOP insn : 0 cycles
x86/fpu: Cost of: RDTSC insn : 12 cycles
x86/fpu: Cost of: RDMSR insn : 100 cycles
x86/fpu: Cost of: WRMSR insn : 396 cycles
x86/fpu: Cost of: CLI insn same-IF : 0 cycles
x86/fpu: Cost of: CLI insn flip-IF : 0 cycles
x86/fpu: Cost of: STI insn same-IF : 0 cycles
x86/fpu: Cost of: STI insn flip-IF : 0 cycles
x86/fpu: Cost of: PUSHF insn : 0 cycles
x86/fpu: Cost of: POPF insn same-IF : 20 cycles
x86/fpu: Cost of: POPF insn flip-IF : 28 cycles
x86/fpu:######## IRQ save/restore APIs: ############################
x86/fpu: Cost of: local_irq_save() fn : 20 cycles
x86/fpu: Cost of: local_irq_restore() fn same-IF : 24 cycles
x86/fpu: Cost of: local_irq_restore() fn flip-IF : 28 cycles
x86/fpu: Cost of: irq_save()+restore() fn same-IF : 48 cycles
x86/fpu: Cost of: irq_save()+restore() fn flip-IF : 48 cycles
x86/fpu:######## locking APIs: ############################
x86/fpu: Cost of: smp_mb() fn : 40 cycles
x86/fpu: Cost of: cpu_relax() fn : 8 cycles
x86/fpu: Cost of: spin_lock()+unlock() fn : 64 cycles
x86/fpu: Cost of: read_lock()+unlock() fn : 76 cycles
x86/fpu: Cost of: write_lock()+unlock() fn : 52 cycles
x86/fpu: Cost of: rcu_read_lock()+unlock() fn : 16 cycles
x86/fpu: Cost of: preempt_disable()+enable() fn : 20 cycles
x86/fpu: Cost of: mutex_lock()+unlock() fn : 56 cycles
x86/fpu:######## MM instructions: ############################
x86/fpu: Cost of: __flush_tlb() fn : 132 cycles
x86/fpu: Cost of: __flush_tlb_global() fn : 920 cycles
x86/fpu: Cost of: __flush_tlb_one() fn : 288 cycles
x86/fpu: Cost of: __flush_tlb_range() fn : 412 cycles
x86/fpu:######## FPU instructions: ############################
x86/fpu: Cost of: CR0 read : 4 cycles
x86/fpu: Cost of: CR0 write : 208 cycles
x86/fpu: Cost of: CR0::TS fault : 1156 cycles
x86/fpu: Cost of: FNINIT insn : 76 cycles
x86/fpu: Cost of: FWAIT insn : 0 cycles
x86/fpu: Cost of: FSAVE insn : 168 cycles
x86/fpu: Cost of: FRSTOR insn : 160 cycles
x86/fpu: Cost of: FXSAVE insn : 84 cycles
x86/fpu: Cost of: FXRSTOR insn : 44 cycles
x86/fpu: Cost of: FXRSTOR fault : 688 cycles
x86/fpu: Cost of: XSAVE insn : 104 cycles
x86/fpu: Cost of: XRSTOR insn : 80 cycles
x86/fpu: Cost of: XRSTOR fault : 884 cycles
x86/fpu:##################################################################

Based on such measurements we'll be able to do performance tuning,
set default policies and do optimizations in a more informed fashion,
as the speed of various x86 hardware varies a lot.

9)

Reworked many ancient inlining and uninlining decisions based on
modern principles.


Any feedback is welcome!

Thanks,

Ingo

=====
Ingo Molnar (208):
x86/fpu: Rename unlazy_fpu() to fpu__save()
x86/fpu: Add comments to fpu__save() and restrict its export
x86/fpu: Add debugging check to fpu__save()
x86/fpu: Rename fpu_detect() to fpu__detect()
x86/fpu: Remove stale init_fpu() prototype
x86/fpu: Split an fpstate_alloc_init() function out of init_fpu()
x86/fpu: Make init_fpu() static
x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check
x86/fpu: Optimize fpu__unlazy_stopped()
x86/fpu: Simplify fpu__unlazy_stopped()
x86/fpu: Remove fpu_allocated()
x86/fpu: Move fpu_alloc() out of line
x86/fpu: Rename fpu_alloc() to fpstate_alloc()
x86/fpu: Rename fpu_free() to fpstate_free()
x86/fpu: Rename fpu_finit() to fpstate_init()
x86/fpu: Rename fpu_init() to fpu__cpu_init()
x86/fpu: Rename init_thread_xstate() to fpstate_xstate_init_size()
x86/fpu: Move thread_info::fpu_counter into thread_info::fpu.counter
x86/fpu: Improve the comment for the fpu::counter field
x86/fpu: Move FPU data structures to asm/fpu_types.h
x86/fpu: Clean up asm/fpu/types.h
x86/fpu: Move i387.c and xsave.c to arch/x86/kernel/fpu/
x86/fpu: Fix header file dependencies of fpu-internal.h
x86/fpu: Split out the boot time FPU init code into fpu/init.c
x86/fpu: Remove unnecessary includes from core.c
x86/fpu: Move the no_387 handling and FPU detection code into init.c
x86/fpu: Remove the free_thread_xstate() complication
x86/fpu: Factor out fpu__flush_thread() from flush_thread()
x86/fpu: Move math_state_restore() to fpu/core.c
x86/fpu: Rename math_state_restore() to fpu__restore()
x86/fpu: Factor out the FPU bug detection code into fpu__init_check_bugs()
x86/fpu: Simplify the xsave_state*() methods
x86/fpu: Remove fpu_xsave()
x86/fpu: Move task_xstate_cachep handling to core.c
x86/fpu: Factor out fpu__copy()
x86/fpu: Uninline fpstate_free() and move it next to the allocation function
x86/fpu: Make task_xstate_cachep static
x86/fpu: Make kernel_fpu_disable/enable() static
x86/fpu: Add debug check to kernel_fpu_disable()
x86/fpu: Add kernel_fpu_disabled()
x86/fpu: Remove __save_init_fpu()
x86/fpu: Move fpu_copy() to fpu/core.c
x86/fpu: Add debugging check to fpu_copy()
x86/fpu: Print out whether we are doing lazy/eager FPU context switches
x86/fpu: Eliminate the __thread_has_fpu() wrapper
x86/fpu: Change __thread_clear_has_fpu() to 'struct fpu' parameter
x86/fpu: Move 'PER_CPU(fpu_owner_task)' to fpu/core.c
x86/fpu: Change fpu_owner_task to fpu_fpregs_owner_ctx
x86/fpu: Remove 'struct task_struct' usage from __thread_set_has_fpu()
x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_end()
x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_begin()
x86/fpu: Open code PF_USED_MATH usages
x86/fpu: Document fpu__unlazy_stopped()
x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active
x86/fpu: Remove 'struct task_struct' usage from drop_fpu()
x86/fpu: Remove task_disable_lazy_fpu_restore()
x86/fpu: Use 'struct fpu' in fpu_lazy_restore()
x86/fpu: Use 'struct fpu' in restore_fpu_checking()
x86/fpu: Use 'struct fpu' in fpu_reset_state()
x86/fpu: Use 'struct fpu' in switch_fpu_prepare()
x86/fpu: Use 'struct fpu' in switch_fpu_finish()
x86/fpu: Move __save_fpu() into fpu/core.c
x86/fpu: Use 'struct fpu' in __fpu_save()
x86/fpu: Use 'struct fpu' in fpu__save()
x86/fpu: Use 'struct fpu' in fpu_copy()
x86/fpu: Use 'struct fpu' in fpu__copy()
x86/fpu: Use 'struct fpu' in fpstate_alloc_init()
x86/fpu: Use 'struct fpu' in fpu__unlazy_stopped()
x86/fpu: Rename fpu__flush_thread() to fpu__clear()
x86/fpu: Clean up fpu__clear() a bit
x86/fpu: Rename i387.h to fpu/api.h
x86/fpu: Move xsave.h to fpu/xsave.h
x86/fpu: Rename fpu-internal.h to fpu/internal.h
x86/fpu: Move MXCSR_DEFAULT to fpu/internal.h
x86/fpu: Remove xsave_init() __init obfuscation
x86/fpu: Remove assembly guard from asm/fpu/api.h
x86/fpu: Improve FPU detection kernel messages
x86/fpu: Print supported xstate features in human readable way
x86/fpu: Rename 'pcntxt_mask' to 'xfeatures_mask'
x86/fpu: Rename 'xstate_features' to 'xfeatures_nr'
x86/fpu: Move XCR0 manipulation to the FPU code proper
x86/fpu: Clean up regset functions
x86/fpu: Rename 'xsave_hdr' to 'header'
x86/fpu: Rename xsave.header::xstate_bv to 'xfeatures'
x86/fpu: Clean up and fix MXCSR handling
x86/fpu: Rename regset FPU register accessors
x86/fpu: Explain the AVX register layout in the xsave area
x86/fpu: Improve the __sanitize_i387_state() documentation
x86/fpu: Rename fpu->has_fpu to fpu->fpregs_active
x86/fpu: Rename __thread_set_has_fpu() to __fpregs_activate()
x86/fpu: Rename __thread_clear_has_fpu() to __fpregs_deactivate()
x86/fpu: Rename __thread_fpu_begin() to fpregs_activate()
x86/fpu: Rename __thread_fpu_end() to fpregs_deactivate()
x86/fpu: Remove fpstate_xstate_init_size() boot quirk
x86/fpu: Remove xsave_init() bootmem allocations
x86/fpu: Make setup_init_fpu_buf() run-once explicitly
x86/fpu: Remove 'init_xstate_buf' bootmem allocation
x86/fpu: Split fpu__cpu_init() into early-boot and cpu-boot parts
x86/fpu: Make the system/cpu init distinction clear in the xstate code as well
x86/fpu: Move CPU capability check into fpu__init_cpu_xstate()
x86/fpu: Move legacy check to fpu__init_system_xstate()
x86/fpu: Propagate once per boot quirk into fpu__init_system_xstate()
x86/fpu: Remove xsave_init()
x86/fpu: Do fpu__init_system_xstate only from fpu__init_system()
x86/fpu: Set up the legacy FPU init image from fpu__init_system()
x86/fpu: Remove setup_init_fpu_buf() call from eager_fpu_init()
x86/fpu: Move all eager-fpu setup code to eager_fpu_init()
x86/fpu: Move eager_fpu_init() to fpu/init.c
x86/fpu: Clean up eager_fpu_init() and rename it to fpu__ctx_switch_init()
x86/fpu: Split fpu__ctx_switch_init() into _cpu() and _system() portions
x86/fpu: Do CLTS fpu__init_system()
x86/fpu: Move the fpstate_xstate_init_size() call into fpu__init_system()
x86/fpu: Call fpu__init_cpu_ctx_switch() from fpu__init_cpu()
x86/fpu: Do system-wide setup from fpu__detect()
x86/fpu: Remove fpu__init_cpu_ctx_switch() call from fpu__init_system()
x86/fpu: Simplify fpu__cpu_init()
x86/fpu: Factor out fpu__init_cpu_generic()
x86/fpu: Factor out fpu__init_system_generic()
x86/fpu: Factor out fpu__init_system_early_generic()
x86/fpu: Move !FPU check ingo fpu__init_system_early_generic()
x86/fpu: Factor out FPU bug checks into fpu/bugs.c
x86/fpu: Make check_fpu() init ordering independent
x86/fpu: Move fpu__init_system_early_generic() out of fpu__detect()
x86/fpu: Remove the extra fpu__detect() layer
x86/fpu: Rename fpstate_xstate_init_size() to fpu__init_system_xstate_size_legacy()
x86/fpu: Reorder init methods
x86/fpu: Add more comments to the FPU init code
x86/fpu: Move fpu__save() to fpu/internals.h
x86/fpu: Uninline kernel_fpu_begin()/end()
x86/fpu: Move various internal function prototypes to fpu/internal.h
x86/fpu: Uninline the irq_ts_save()/restore() functions
x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate()
x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions
x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)
x86/fpu: Remove failure paths from fpstate-alloc low level functions
x86/fpu: Remove failure return from fpstate_alloc_init()
x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr()
x86/fpu: Simplify fpu__unlazy_stopped() error handling
x86/fpu, kvm: Simplify fx_init()
x86/fpu: Simplify fpstate_init_curr() usage
x86/fpu: Rename fpu__unlazy_stopped() to fpu__activate_stopped()
x86/fpu: Factor out FPU hw activation/deactivation
x86/fpu: Simplify __save_fpu()
x86/fpu: Eliminate __save_fpu()
x86/fpu: Simplify fpu__save()
x86/fpu: Optimize fpu__save()
x86/fpu: Optimize fpu_copy()
x86/fpu: Optimize fpu_copy() some more on lazy switching systems
x86/fpu: Rename fpu/xsave.h to fpu/xstate.h
x86/fpu: Rename fpu/xsave.c to fpu/xstate.c
x86/fpu: Introduce cpu_has_xfeatures(xfeatures_mask, feature_name)
x86/fpu: Simplify print_xstate_features()
x86/fpu: Enumerate xfeature bits
x86/fpu: Move xfeature type enumeration to fpu/types.h
x86/fpu, crypto x86/camellia_aesni_avx: Simplify the camellia_aesni_init() xfeature checks
x86/fpu, crypto x86/sha256_ssse3: Simplify the sha256_ssse3_mod_init() xfeature checks
x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks
x86/fpu, crypto x86/twofish_avx: Simplify the twofish_init() xfeature checks
x86/fpu, crypto x86/serpent_avx: Simplify the serpent_init() xfeature checks
x86/fpu, crypto x86/cast5_avx: Simplify the cast5_init() xfeature checks
x86/fpu, crypto x86/sha512_ssse3: Simplify the sha512_ssse3_mod_init() xfeature checks
x86/fpu, crypto x86/cast6_avx: Simplify the cast6_init() xfeature checks
x86/fpu, crypto x86/sha1_ssse3: Simplify the sha1_ssse3_mod_init() xfeature checks
x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks
x86/fpu, crypto x86/sha1_mb: Remove FPU internal headers from sha1_mb.c
x86/fpu: Move asm/xcr.h to asm/fpu/internal.h
x86/fpu: Rename sanitize_i387_state() to fpstate_sanitize_xstate()
x86/fpu: Simplify fpstate_sanitize_xstate() calls
x86/fpu: Pass 'struct fpu' to fpstate_sanitize_xstate()
x86/fpu: Rename save_xstate_sig() to copy_fpstate_to_sigframe()
x86/fpu: Rename save_user_xstate() to copy_fpregs_to_sigframe()
x86/fpu: Clarify ancient comments in fpu__restore()
x86/fpu: Rename user_has_fpu() to fpregs_active()
x86/fpu: Initialize fpregs in fpu__init_cpu_generic()
x86/fpu: Clean up fpu__clear() state handling
x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it in xsave_state()
x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs()
x86/fpu: Move all the fpu__*() high level methods closer to each other
x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing
x86/fpu: Rename restore_xstate_sig() to fpu__restore_sig()
x86/fpu: Move the signal frame handling code closer to each other
x86/fpu: Merge fpu__reset() and fpu__clear()
x86/fpu: Move is_ia32*frame() helpers out of fpu/internal.h
x86/fpu: Split out fpu/signal.h from fpu/internal.h for signal frame handling functions
x86/fpu: Factor out fpu/regset.h from fpu/internal.h
x86/fpu: Remove run-once init quirks
x86/fpu: Factor out the exception error code handling code
x86/fpu: Harmonize the names of the fpstate_init() helper functions
x86/fpu: Create 'union thread_xstate' helper for fpstate_init()
x86/fpu: Generalize 'init_xstate_ctx'
x86/fpu: Move restore_init_xstate() out of fpu/internal.h
x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions
x86/fpu: Factor out fpu/signal.c
x86/fpu: Factor out the FPU regset code into fpu/regset.c
x86/fpu: Harmonize FPU register state types
x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments
x86/fpu: Document the various fpregs state formats
x86/fpu: Move debugging check from kernel_fpu_begin() to __kernel_fpu_begin()
x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end
x86/fpu: Clean up xstate feature reservation
x86/fpu/xstate: Clean up setup_xstate_comp() call
x86/fpu/init: Propagate __init annotations
x86/fpu: Pass 'struct fpu' to fpu__restore()
x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code
x86/fpu: Add FPU performance measurement subsystem
x86/fpu: Reorganize fpu/internal.h

Documentation/preempt-locking.txt | 2 +-
arch/x86/Kconfig.debug | 27 ++
arch/x86/crypto/aesni-intel_glue.c | 2 +-
arch/x86/crypto/camellia_aesni_avx2_glue.c | 15 +-
arch/x86/crypto/camellia_aesni_avx_glue.c | 15 +-
arch/x86/crypto/cast5_avx_glue.c | 15 +-
arch/x86/crypto/cast6_avx_glue.c | 15 +-
arch/x86/crypto/crc32-pclmul_glue.c | 2 +-
arch/x86/crypto/crc32c-intel_glue.c | 3 +-
arch/x86/crypto/crct10dif-pclmul_glue.c | 2 +-
arch/x86/crypto/fpu.c | 2 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 2 +-
arch/x86/crypto/serpent_avx2_glue.c | 15 +-
arch/x86/crypto/serpent_avx_glue.c | 15 +-
arch/x86/crypto/sha-mb/sha1_mb.c | 5 +-
arch/x86/crypto/sha1_ssse3_glue.c | 16 +-
arch/x86/crypto/sha256_ssse3_glue.c | 16 +-
arch/x86/crypto/sha512_ssse3_glue.c | 16 +-
arch/x86/crypto/twofish_avx_glue.c | 16 +-
arch/x86/ia32/ia32_signal.c | 13 +-
arch/x86/include/asm/alternative.h | 6 +
arch/x86/include/asm/crypto/glue_helper.h | 2 +-
arch/x86/include/asm/efi.h | 2 +-
arch/x86/include/asm/fpu-internal.h | 626 ---------------------------------------
arch/x86/include/asm/fpu/api.h | 48 +++
arch/x86/include/asm/fpu/internal.h | 488 ++++++++++++++++++++++++++++++
arch/x86/include/asm/fpu/measure.h | 13 +
arch/x86/include/asm/fpu/regset.h | 21 ++
arch/x86/include/asm/fpu/signal.h | 33 +++
arch/x86/include/asm/fpu/types.h | 293 ++++++++++++++++++
arch/x86/include/asm/{xsave.h => fpu/xstate.h} | 60 ++--
arch/x86/include/asm/i387.h | 108 -------
arch/x86/include/asm/kvm_host.h | 2 -
arch/x86/include/asm/mpx.h | 8 +-
arch/x86/include/asm/processor.h | 141 +--------
arch/x86/include/asm/simd.h | 2 +-
arch/x86/include/asm/stackprotector.h | 2 +
arch/x86/include/asm/suspend_32.h | 2 +-
arch/x86/include/asm/suspend_64.h | 2 +-
arch/x86/include/asm/user.h | 12 +-
arch/x86/include/asm/xcr.h | 49 ---
arch/x86/include/asm/xor.h | 2 +-
arch/x86/include/asm/xor_32.h | 2 +-
arch/x86/include/asm/xor_avx.h | 2 +-
arch/x86/include/uapi/asm/sigcontext.h | 8 +-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/alternative.c | 5 +
arch/x86/kernel/cpu/bugs.c | 57 +---
arch/x86/kernel/cpu/bugs_64.c | 2 +
arch/x86/kernel/cpu/common.c | 29 +-
arch/x86/kernel/fpu/Makefile | 11 +
arch/x86/kernel/fpu/bugs.c | 71 +++++
arch/x86/kernel/fpu/core.c | 509 +++++++++++++++++++++++++++++++
arch/x86/kernel/fpu/init.c | 288 ++++++++++++++++++
arch/x86/kernel/fpu/measure.c | 509 +++++++++++++++++++++++++++++++
arch/x86/kernel/fpu/regset.c | 356 ++++++++++++++++++++++
arch/x86/kernel/fpu/signal.c | 404 +++++++++++++++++++++++++
arch/x86/kernel/fpu/xstate.c | 406 +++++++++++++++++++++++++
arch/x86/kernel/i387.c | 656 ----------------------------------------
arch/x86/kernel/process.c | 52 +---
arch/x86/kernel/process_32.c | 15 +-
arch/x86/kernel/process_64.c | 13 +-
arch/x86/kernel/ptrace.c | 12 +-
arch/x86/kernel/signal.c | 38 ++-
arch/x86/kernel/smpboot.c | 3 +-
arch/x86/kernel/traps.c | 120 ++------
arch/x86/kernel/xsave.c | 724 ---------------------------------------------
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx.c | 5 +-
arch/x86/kvm/x86.c | 68 ++---
arch/x86/lguest/boot.c | 2 +-
arch/x86/lib/mmx_32.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 4 +-
arch/x86/math-emu/fpu_entry.c | 20 +-
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/mpx.c | 15 +-
arch/x86/power/cpu.c | 11 +-
arch/x86/xen/enlighten.c | 2 +-
drivers/char/hw_random/via-rng.c | 2 +-
drivers/crypto/padlock-aes.c | 2 +-
drivers/crypto/padlock-sha.c | 2 +-
drivers/lguest/x86/core.c | 12 +-
lib/raid6/x86.h | 2 +-
83 files changed, 3742 insertions(+), 2841 deletions(-)
delete mode 100644 arch/x86/include/asm/fpu-internal.h
create mode 100644 arch/x86/include/asm/fpu/api.h
create mode 100644 arch/x86/include/asm/fpu/internal.h
create mode 100644 arch/x86/include/asm/fpu/measure.h
create mode 100644 arch/x86/include/asm/fpu/regset.h
create mode 100644 arch/x86/include/asm/fpu/signal.h
create mode 100644 arch/x86/include/asm/fpu/types.h
rename arch/x86/include/asm/{xsave.h => fpu/xstate.h} (77%)
delete mode 100644 arch/x86/include/asm/i387.h
delete mode 100644 arch/x86/include/asm/xcr.h
create mode 100644 arch/x86/kernel/fpu/Makefile
create mode 100644 arch/x86/kernel/fpu/bugs.c
create mode 100644 arch/x86/kernel/fpu/core.c
create mode 100644 arch/x86/kernel/fpu/init.c
create mode 100644 arch/x86/kernel/fpu/measure.c
create mode 100644 arch/x86/kernel/fpu/regset.c
create mode 100644 arch/x86/kernel/fpu/signal.c
create mode 100644 arch/x86/kernel/fpu/xstate.c
delete mode 100644 arch/x86/kernel/i387.c
delete mode 100644 arch/x86/kernel/xsave.c

--
2.1.0
