Linux 3.16.56

From: Ben Hutchings
Date: Mon Mar 19 2018 - 17:11:17 EST


I'm announcing the release of the 3.16.56 kernel.

All users of the 3.16 kernel series should upgrade.

The updated 3.16.y git tree can be found at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-3.16.y
and can be browsed through the normal kernel.org git web interface:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git

The diff from 3.16.55 is attached to this message.

Ben.

------------

Documentation/ABI/testing/sysfs-devices-system-cpu | 16 ++
Documentation/kernel-parameters.txt | 51 +++-
Documentation/speculation.txt | 90 +++++++
Documentation/x86/pti.txt | 186 +++++++++++++
Makefile | 2 +-
arch/x86/Kconfig | 14 +
arch/x86/Makefile | 8 +
arch/x86/crypto/aesni-intel_asm.S | 5 +-
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 +-
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 +-
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +-
arch/x86/ia32/ia32entry.S | 54 ++--
arch/x86/include/asm/alternative-asm.h | 14 +-
arch/x86/include/asm/alternative.h | 20 +-
arch/x86/include/asm/asm.h | 11 +
arch/x86/include/asm/barrier.h | 31 ++-
arch/x86/include/asm/cpufeature.h | 8 +
arch/x86/include/asm/intel-family.h | 68 +++++
arch/x86/include/asm/nospec-branch.h | 198 ++++++++++++++
arch/x86/include/asm/processor.h | 6 +-
arch/x86/include/asm/switch_to.h | 38 +++
arch/x86/include/asm/uaccess.h | 64 +++--
arch/x86/include/asm/uaccess_32.h | 24 ++
arch/x86/include/asm/uaccess_64.h | 94 +++++--
arch/x86/include/asm/xen/hypercall.h | 5 +-
arch/x86/include/uapi/asm/msr-index.h | 3 +
arch/x86/kernel/alternative.c | 29 +-
arch/x86/kernel/cpu/Makefile | 4 +-
arch/x86/kernel/cpu/amd.c | 28 +-
arch/x86/kernel/cpu/bugs.c | 299 ++++++++++++++++++++-
arch/x86/kernel/cpu/bugs_64.c | 33 ---
arch/x86/kernel/cpu/common.c | 32 ++-
arch/x86/kernel/cpu/microcode/intel.c | 2 +-
arch/x86/kernel/cpu/proc.c | 4 +-
arch/x86/kernel/entry_32.S | 25 +-
arch/x86/kernel/entry_64.S | 29 +-
arch/x86/kernel/irq_32.c | 16 +-
arch/x86/kernel/kprobes/opt.c | 23 +-
arch/x86/kernel/mcount_64.S | 8 +-
arch/x86/kernel/vmlinux.lds.S | 6 +
arch/x86/kvm/emulate.c | 9 +-
arch/x86/kvm/svm.c | 23 ++
arch/x86/kvm/vmx.c | 46 ++--
arch/x86/lib/Makefile | 2 +
arch/x86/lib/checksum_32.S | 7 +-
arch/x86/lib/getuser.S | 10 +
arch/x86/lib/retpoline-export.c | 24 ++
arch/x86/lib/retpoline.S | 47 ++++
arch/x86/lib/usercopy_32.c | 20 +-
drivers/base/Kconfig | 3 +
drivers/base/cpu.c | 48 ++++
drivers/hv/hv.c | 25 +-
include/linux/cpu.h | 7 +
include/linux/fdtable.h | 5 +-
include/linux/init.h | 9 +-
include/linux/kaiser.h | 2 +-
include/linux/kconfig.h | 9 +-
include/linux/module.h | 9 +
include/linux/nospec.h | 59 ++++
kernel/module.c | 11 +
net/wireless/nl80211.c | 9 +-
scripts/mod/modpost.c | 9 +
62 files changed, 1710 insertions(+), 240 deletions(-)

Andi Kleen (3):
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
module/retpoline: Warn about missing retpoline in module

Andrey Ryabinin (1):
x86/asm: Use register variable to get stack pointer value

Andy Lutomirski (3):
x86/cpu: Factor out application of forced CPU caps
x86/asm: Make asm/alternative.h safe from assembly
x86: Clean up current_stack_pointer

Arnd Bergmann (1):
x86: fix build warnign with 32-bit PAE

Ben Hutchings (2):
x86/syscall: Sanitize syscall table de-references under speculation
Linux 3.16.56

Borislav Petkov (6):
x86/cpu: Merge bugs.c and bugs_64.c
x86/alternatives: Guard NOPs optimization
x86/alternatives: Fix ALTERNATIVE_2 padding generation properly
x86/alternatives: Fix optimize_nops() checking
x86/nospec: Fix header guards names
x86/bugs: Drop one "mitigation" from dmesg

Colin Ian King (1):
x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"

Dan Carpenter (1):
x86/spectre: Fix an error message

Dan Williams (13):
array_index_nospec: Sanitize speculative array de-references
x86: Implement array_index_mask_nospec
x86: Introduce barrier_nospec
x86/get_user: Use pointer masking to limit speculation
vfs, fdtable: Prevent bounds-check bypass via speculative execution
nl80211: Sanitize array index in parse_txq_params
x86/spectre: Report get_user mitigation for spectre_v1
x86/kvm: Update spectre-v1 mitigation
nospec: Kill array_index_nospec_mask_check()
nospec: Include <asm/barrier.h> dependency
x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}
x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec

Darren Kenny (1):
x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL

Dave Hansen (2):
x86/Documentation: Add PTI description
x86/cpu/intel: Introduce macros for Intel family numbers

David Woodhouse (14):
x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
sysfs/cpu: Fix typos in vulnerability documentation
x86/retpoline: Add initial retpoline support
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/retpoline: Avoid retpolines for built-in __init functions
x86/cpufeatures: Clean up Spectre v2 related CPUID flags

Dou Liyang (1):
x86/spectre: Check CONFIG_RETPOLINE in command line parser

Gustavo A. R. Silva (1):
x86/cpu: Change type of x86_cache_size variable to unsigned int

Jim Mattson (1):
kvm: vmx: Scrub hardware GPRs at VM-exit

Josh Poimboeuf (1):
x86/paravirt: Remove 'noreplace-paravirt' cmdline option

KarimAllah Ahmed (1):
x86/spectre: Simplify spectre_v2 command line parsing

Linus Torvalds (2):
x86: reorganize SMAP handling in user space accesses
x86: fix SMAP in 32-bit environments

Mark Rutland (1):
Documentation: Document array_index_nospec

Masahiro Yamada (1):
kconfig.h: use __is_defined() to check if MODULE is defined

Masami Hiramatsu (3):
retpoline: Introduce start/end markers of indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
kprobes/x86: Disable optimizing on the function jumps to indirect thunk

Peter Zijlstra (2):
KVM: x86: Make indirect calls in emulator speculation safe
KVM: VMX: Make indirect call speculation safe

Thomas Gleixner (8):
x86/cpufeatures: Make CPU bugs sticky
x86/cpufeatures: Add X86_BUG_CPU_INSECURE
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
sysfs/cpu: Add vulnerability folder
x86/cpu: Implement CPU vulnerabilites sysfs functions
x86/alternatives: Make optimize_nops() interrupt safe and synced
x86/retpoline: Remove compile time warning
x86/cpu/bugs: Make retpoline module warning conditional

Tom Lendacky (4):
x86/cpu, x86/pti: Do not enable PTI on AMD processors
x86/cpu/AMD: Make LFENCE a serializing instruction
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros

Waiman Long (1):
x86/retpoline: Remove the esp/rsp thunk

Will Deacon (1):
nospec: Move array_index_nospec() parameter checking into separate macro

zhenwei.pi (1):
x86/pti: Document fix wrong index

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index acb9bfc89b48..49216c96236b 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -224,3 +224,19 @@ Description: Parameters for the Intel P-state driver
frequency range.

More details can be found in Documentation/cpu-freq/intel-pstate.txt
+
+What: /sys/devices/system/cpu/vulnerabilities
+ /sys/devices/system/cpu/vulnerabilities/meltdown
+ /sys/devices/system/cpu/vulnerabilities/spectre_v1
+ /sys/devices/system/cpu/vulnerabilities/spectre_v2
+Date: January 2018
+Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description: Information about CPU vulnerabilities
+
+ The files are named after the code names of CPU
+ vulnerabilities. The output of those files reflects the
+ state of the CPUs in the system. Possible output values:
+
+ "Not affected" CPU is not affected by the vulnerability
+ "Vulnerable" CPU is affected and no mitigation in effect
+ "Mitigation: $M" CPU is affected and mitigation $M is in effect
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a0fd7c8052a1..9d10847d1998 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2167,6 +2167,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
register save and restore. The kernel will only save
legacy floating-point registers on task switch.

+ nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
+ (indirect branch prediction) vulnerability. System may
+ allow data leaks with this option, which is equivalent
+ to spectre_v2=off.
+
noxsave [BUGS=X86] Disables x86 extended register state save
and restore using xsave. The kernel will fallback to
enabling legacy floating-point and sse state.
@@ -2229,8 +2234,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.

nojitter [IA-64] Disables jitter checking for ITC timers.

- nopti [X86-64] Disable KAISER isolation of kernel from user.
-
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver

no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
@@ -2268,8 +2271,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
norandmaps Don't use address space randomization. Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space

- noreplace-paravirt [X86,IA-64,PV_OPS] Don't patch paravirt_ops
-
noreplace-smp [X86-32,SMP] Don't replace SMP instructions
with UP alternatives

@@ -2752,11 +2753,20 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
pt. [PARIDE]
See Documentation/blockdev/paride.txt.

- pti= [X86_64]
- Control KAISER user/kernel address space isolation:
- on - enable
- off - disable
- auto - default setting
+ pti= [X86_64] Control Page Table Isolation of user and
+ kernel address spaces. Disabling this feature
+ removes hardening, but improves performance of
+ system calls and interrupts.
+
+ on - unconditionally enable
+ off - unconditionally disable
+ auto - kernel detects whether your CPU model is
+ vulnerable to issues that PTI mitigates
+
+ Not specifying this option is equivalent to pti=auto.
+
+ nopti [X86_64]
+ Equivalent to pti=off

pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
@@ -3161,6 +3171,29 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/laptops/sonypi.txt

+ spectre_v2= [X86] Control mitigation of Spectre variant 2
+ (indirect branch speculation) vulnerability.
+
+ on - unconditionally enable
+ off - unconditionally disable
+ auto - kernel detects whether your CPU model is
+ vulnerable
+
+ Selecting 'on' will, and 'auto' may, choose a
+ mitigation method at run time according to the
+ CPU, the available microcode, the setting of the
+ CONFIG_RETPOLINE configuration option, and the
+ compiler with which the kernel was built.
+
+ Specific mitigations can also be selected manually:
+
+ retpoline - replace indirect branches
+ retpoline,generic - google's original retpoline
+ retpoline,amd - AMD-specific minimal thunk
+
+ Not specifying this option is equivalent to
+ spectre_v2=auto.
+
spia_io_base= [HW,MTD]
spia_fio_base=
spia_pedr=
diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..e9e6cbae2841
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,90 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+ int load_array(int *array, unsigned int index)
+ {
+ if (index >= MAX_ARRAY_ELEMS)
+ return 0;
+ else
+ return array[index];
+ }
+
+Which, on arm64, may be compiled to an assembly sequence such as:
+
+ CMP <index>, #MAX_ARRAY_ELEMS
+ B.LT less
+ MOV <returnval>, #0
+ RET
+ less:
+ LDR <returnval>, [<array>, <index>]
+ RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This
+value will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may
+result in sensitive information being leaked. Consider the following
+code, building on the prior example:
+
+ int load_dependent_arrays(int *arr1, int *arr2, int index)
+ {
+ int val1, val2;
+
+ val1 = load_array(arr1, index);
+ val2 = load_array(arr2, val1);
+
+ return val2;
+ }
+
+Under speculation, the first call to load_array() may return the value
+of an out-of-bounds address, while the second call will influence
+microarchitectural state dependent on this value. This may provide an
+arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are
+respected even under speculation. Architectures which are affected by
+speculation-based side-channels are expected to implement these
+primitives.
+
+The array_index_nospec() helper in <linux/nospec.h> can be used to
+prevent information from being leaked via side-channels.
+
+A call to array_index_nospec(index, size) returns a sanitized index
+value that is bounded to [0, size) even under cpu speculation
+conditions.
+
+This can be used to protect the earlier load_array() example:
+
+ int load_array(int *array, unsigned int index)
+ {
+ if (index >= MAX_ARRAY_ELEMS)
+ return 0;
+ else {
+ index = array_index_nospec(index, MAX_ARRAY_ELEMS);
+ return array[index];
+ }
+ }
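
Under the hood, array_index_nospec() combines the index with a branch-free mask.
Below is a sketch of the portable mask calculation (mirroring the generic fallback
added in <linux/nospec.h>); it assumes index and size are both below LONG_MAX, and
the in-kernel version additionally uses OPTIMIZER_HIDE_VAR() so the mask is always
materialised even when the compiler thinks it is redundant.

#include <limits.h>

/*
 * All ones when index < size, zero otherwise, computed without a
 * conditional branch so that a mispredicted bounds check cannot
 * remove it.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
        /*
         * If index < size, neither index nor (size - 1 - index) has the
         * top bit set, so the OR is non-negative, its complement is
         * negative, and the arithmetic shift produces ~0UL.  If
         * index >= size, the subtraction wraps and sets the top bit,
         * giving a mask of 0.
         */
        return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1);
}

/*
 * array[index & index_mask(index, size)] then reads element 0 instead of
 * an attacker-controlled out-of-bounds location under misspeculation;
 * architecturally the index is unchanged because the caller has already
 * bounds-checked it.
 */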
diff --git a/Documentation/x86/pti.txt b/Documentation/x86/pti.txt
new file mode 100644
index 000000000000..5cd58439ad2d
--- /dev/null
+++ b/Documentation/x86/pti.txt
@@ -0,0 +1,186 @@
+Overview
+========
+
+Page Table Isolation (pti, previously known as KAISER[1]) is a
+countermeasure against attacks on the shared user/kernel address
+space such as the "Meltdown" approach[2].
+
+To mitigate this class of attacks, we create an independent set of
+page tables for use only when running userspace applications. When
+the kernel is entered via syscalls, interrupts or exceptions, the
+page tables are switched to the full "kernel" copy. When the system
+switches back to user mode, the user copy is used again.
+
+The userspace page tables contain only a minimal amount of kernel
+data: only what is needed to enter/exit the kernel such as the
+entry/exit functions themselves and the interrupt descriptor table
+(IDT). There are a few strictly unnecessary things that get mapped
+such as the first C function when entering an interrupt (see
+comments in pti.c).
+
+This approach helps to ensure that side-channel attacks leveraging
+the paging structures do not function when PTI is enabled. It can be
+enabled by setting CONFIG_PAGE_TABLE_ISOLATION=y at compile time.
+Once enabled at compile-time, it can be disabled at boot with the
+'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt).
+
+Page Table Management
+=====================
+
+When PTI is enabled, the kernel manages two sets of page tables.
+The first set is very similar to the single set which is present in
+kernels without PTI. This includes a complete mapping of userspace
+that the kernel can use for things like copy_to_user().
+
+Although _complete_, the user portion of the kernel page tables is
+crippled by setting the NX bit in the top level. This ensures
+that any missed kernel->user CR3 switch will immediately crash
+userspace upon executing its first instruction.
+
+The userspace page tables map only the kernel data needed to enter
+and exit the kernel. This data is entirely contained in the 'struct
+cpu_entry_area' structure which is placed in the fixmap which gives
+each CPU's copy of the area a compile-time-fixed virtual address.
+
+For new userspace mappings, the kernel makes the entries in its
+page tables like normal. The only difference is when the kernel
+makes entries in the top (PGD) level. In addition to setting the
+entry in the main kernel PGD, a copy of the entry is made in the
+userspace page tables' PGD.
+
+This sharing at the PGD level also inherently shares all the lower
+layers of the page tables. This leaves a single, shared set of
+userspace page tables to manage. One PTE to lock, one set of
+accessed bits, dirty bits, etc...
+
+Overhead
+========
+
+Protection against side-channel attacks is important. But,
+this protection comes at a cost:
+
+1. Increased Memory Use
+ a. Each process now needs an order-1 PGD instead of order-0.
+ (Consumes an additional 4k per process).
+ b. The 'cpu_entry_area' structure must be 2MB in size and 2MB
+ aligned so that it can be mapped by setting a single PMD
+ entry. This consumes nearly 2MB of RAM once the kernel
+ is decompressed, but no space in the kernel image itself.
+
+2. Runtime Cost
+ a. CR3 manipulation to switch between the page table copies
+ must be done at interrupt, syscall, and exception entry
+ and exit (it can be skipped when the kernel is interrupted,
+ though.) Moves to CR3 are on the order of a hundred
+ cycles, and are required at every entry and exit.
+ b. A "trampoline" must be used for SYSCALL entry. This
+ trampoline depends on a smaller set of resources than the
+ non-PTI SYSCALL entry code, so requires mapping fewer
+ things into the userspace page tables. The downside is
+ that stacks must be switched at entry time.
+ c. Global pages are disabled for all kernel structures not
+ mapped into both kernel and userspace page tables. This
+ feature of the MMU allows different processes to share TLB
+ entries mapping the kernel. Losing the feature means more
+ TLB misses after a context switch. The actual loss of
+ performance is very small, however, never exceeding 1%.
+ d. Process Context IDentifiers (PCID) is a CPU feature that
+ allows us to skip flushing the entire TLB when switching page
+ tables by setting a special bit in CR3 when the page tables
+ are changed. This makes switching the page tables (at context
+ switch, or kernel entry/exit) cheaper. But, on systems with
+ PCID support, the context switch code must flush both the user
+ and kernel entries out of the TLB. The user PCID TLB flush is
+ deferred until the exit to userspace, minimizing the cost.
+ See intel.com/sdm for the gory PCID/INVPCID details.
+ e. The userspace page tables must be populated for each new
+ process. Even without PTI, the shared kernel mappings
+ are created by copying top-level (PGD) entries into each
+ new process. But, with PTI, there are now *two* kernel
+ mappings: one in the kernel page tables that maps everything
+ and one for the entry/exit structures. At fork(), we need to
+ copy both.
+ f. In addition to the fork()-time copying, there must also
+ be an update to the userspace PGD any time a set_pgd() is done
+ on a PGD used to map userspace. This ensures that the kernel
+ and userspace copies always map the same userspace
+ memory.
+ g. On systems without PCID support, each CR3 write flushes
+ the entire TLB. That means that each syscall, interrupt
+ or exception flushes the TLB.
+ h. INVPCID is a TLB-flushing instruction which allows flushing
+ of TLB entries for non-current PCIDs. Some systems support
+ PCIDs, but do not support INVPCID. On these systems, addresses
+ can only be flushed from the TLB for the current PCID. When
+ flushing a kernel address, we need to flush all PCIDs, so a
+ single kernel address flush will require a TLB-flushing CR3
+ write upon the next use of every PCID.
+
+Possible Future Work
+====================
+1. We can be more careful about not actually writing to CR3
+ unless its value is actually changed.
+2. Allow PTI to be enabled/disabled at runtime in addition to the
+ boot-time switching.
+
+Testing
+========
+
+To test stability of PTI, the following test procedure is recommended,
+ideally doing all of these in parallel:
+
+1. Set CONFIG_DEBUG_ENTRY=y
+2. Run several copies of all of the tools/testing/selftests/x86/ tests
+ (excluding MPX and protection_keys) in a loop on multiple CPUs for
+ several minutes. These tests frequently uncover corner cases in the
+ kernel entry code. In general, old kernels might cause these tests
+ themselves to crash, but they should never crash the kernel.
+3. Run the 'perf' tool in a mode (top or record) that generates many
+ frequent performance monitoring non-maskable interrupts (see "NMI"
+ in /proc/interrupts). This exercises the NMI entry/exit code which
+ is known to trigger bugs in code paths that did not expect to be
+ interrupted, including nested NMIs. Using "-c" boosts the rate of
+ NMIs, and using two -c with separate counters encourages nested NMIs
+ and less deterministic behavior.
+
+ while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done
+
+4. Launch a KVM virtual machine.
+5. Run 32-bit binaries on systems supporting the SYSCALL instruction.
+ This has been a lightly-tested code path and needs extra scrutiny.
+
+Debugging
+=========
+
+Bugs in PTI cause a few different signatures of crashes
+that are worth noting here.
+
+ * Failures of the selftests/x86 code. Usually a bug in one of the
+ more obscure corners of entry_64.S
+ * Crashes in early boot, especially around CPU bringup. Bugs
+ in the trampoline code or mappings cause these.
+ * Crashes at the first interrupt. Caused by bugs in entry_64.S,
+ like screwing up a page table switch. Also caused by
+ incorrectly mapping the IRQ handler entry code.
+ * Crashes at the first NMI. The NMI code is separate from main
+ interrupt handlers and can have bugs that do not affect
+ normal interrupts. Also caused by incorrectly mapping NMI
+ code. NMIs that interrupt the entry code must be very
+ careful and can be the cause of crashes that show up when
+ running perf.
+ * Kernel crashes at the first exit to userspace. entry_64.S
+ bugs, or failing to map some of the exit code.
+ * Crashes at first interrupt that interrupts userspace. The paths
+ in entry_64.S that return to userspace are sometimes separate
+ from the ones that return to the kernel.
+ * Double faults: overflowing the kernel stack because of page
+ faults upon page faults. Caused by touching non-pti-mapped
+ data in the entry code, or forgetting to switch to kernel
+ CR3 before calling into C functions which are not pti-mapped.
+ * Userspace segfaults early in boot, sometimes manifesting
+ as mount(8) failing to mount the rootfs. These have
+ tended to be TLB invalidation issues. Usually invalidating
+ the wrong PCID, or otherwise missing an invalidation.
+
+1. https://gruss.cc/files/kaiser.pdf
+2. https://meltdownattack.com/meltdown.pdf
diff --git a/Makefile b/Makefile
index 008d8a02246b..1c632c11f398 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
VERSION = 3
PATCHLEVEL = 16
-SUBLEVEL = 55
+SUBLEVEL = 56
EXTRAVERSION =
NAME = Museum of Fishiegoodies

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ea6a3fb6a411..22512bbc5a41 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -130,6 +130,7 @@ config X86
select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
select HAVE_CC_STACKPROTECTOR
select GENERIC_CPU_AUTOPROBE
+ select GENERIC_CPU_VULNERABILITIES
select HAVE_ARCH_AUDITSYSCALL
select ARCH_SUPPORTS_ATOMIC_RMW

@@ -338,6 +339,19 @@ config GOLDFISH
def_bool y
depends on X86_GOLDFISH

+config RETPOLINE
+ bool "Avoid speculative indirect branches in kernel"
+ default y
+ help
+ Compile kernel with the retpoline compiler options to guard against
+ kernel-to-user data leaks by avoiding speculative indirect
+ branches. Requires a compiler with -mindirect-branch=thunk-extern
+ support for full protection. The kernel may run slower.
+
+ Without compiler support, at least indirect branches in assembler
+ code are eliminated. Since this includes the syscall entry path,
+ it is not entirely pointless.
+
if X86_32
config X86_EXTENDED_PLATFORM
bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 33f71b01fd22..ed0a0ef12966 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -177,6 +177,14 @@ KBUILD_CFLAGS += $(call cc-option,-mno-avx,)
KBUILD_CFLAGS += $(mflags-y)
KBUILD_AFLAGS += $(mflags-y)

+# Avoid indirect branches in kernel to deal with Spectre
+ifdef CONFIG_RETPOLINE
+ RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
+ ifneq ($(RETPOLINE_CFLAGS),)
+ KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+ endif
+endif
+
archscripts: scripts_basic
$(Q)$(MAKE) $(build)=arch/x86/tools relocs
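
The effect of these flags on ordinary C can be seen on any indirect call. In the
hypothetical fragment below, a kernel built with CONFIG_RETPOLINE and a compiler
supporting -mindirect-branch=thunk-extern emits a call to a register-specific
__x86_indirect_thunk_<reg> symbol instead of a bare "call *%reg"; the thunks are
provided by arch/x86/lib/retpoline.S, which is also why the option is still useful
without compiler support (only the hand-written assembly sites are converted then).

/*
 * Hypothetical example, not from the patch: with the retpoline cflags
 * the indirect call below is compiled roughly as
 *     mov  ...,%rax
 *     call __x86_indirect_thunk_rax
 * (the register varies), with the thunk supplied by retpoline.S.
 */
struct ops {
        int (*probe)(int arg);
};

int run_probe(const struct ops *ops, int arg)
{
        return ops->probe(arg);
}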

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 477e9d75149b..6e05cad3cc43 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -31,6 +31,7 @@

#include <linux/linkage.h>
#include <asm/inst.h>
+#include <asm/nospec-branch.h>

#ifdef __x86_64__
.data
@@ -2703,7 +2704,7 @@ ENTRY(aesni_xts_crypt8)
pxor INC, STATE4
movdqu IV, 0x30(OUTP)

- call *%r11
+ CALL_NOSPEC %r11

movdqu 0x00(OUTP), INC
pxor INC, STATE1
@@ -2748,7 +2749,7 @@ ENTRY(aesni_xts_crypt8)
_aesni_gf128mul_x_ble()
movups IV, (IVP)

- call *%r11
+ CALL_NOSPEC %r11

movdqu 0x40(OUTP), INC
pxor INC, STATE1
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index ce71f9212409..5881756f78a2 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -16,6 +16,7 @@
*/

#include <linux/linkage.h>
+#include <asm/nospec-branch.h>

#define CAMELLIA_TABLE_BYTE_LEN 272

@@ -1210,7 +1211,7 @@ ENDPROC(camellia_ctr_16way)
vpxor 14 * 16(%rax), %xmm15, %xmm14;
vpxor 15 * 16(%rax), %xmm15, %xmm15;

- call *%r9;
+ CALL_NOSPEC %r9;

addq $(16 * 16), %rsp;

diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index 0e0b8863a34b..0d45b04b490a 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -11,6 +11,7 @@
*/

#include <linux/linkage.h>
+#include <asm/nospec-branch.h>

#define CAMELLIA_TABLE_BYTE_LEN 272

@@ -1323,7 +1324,7 @@ ENDPROC(camellia_ctr_32way)
vpxor 14 * 32(%rax), %ymm15, %ymm14;
vpxor 15 * 32(%rax), %ymm15, %ymm15;

- call *%r9;
+ CALL_NOSPEC %r9;

addq $(16 * 32), %rsp;

diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index dbc4339b5417..e78bd1c87037 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -45,6 +45,7 @@

#include <asm/inst.h>
#include <linux/linkage.h>
+#include <asm/nospec-branch.h>

## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction

@@ -171,7 +172,7 @@ ENTRY(crc_pcl)
movzxw (bufp, %rax, 2), len
offset=crc_array-jump_table
lea offset(bufp, len, 1), bufp
- jmp *bufp
+ JMP_NOSPEC bufp

################################################################
## 2a) PROCESS FULL BLOCKS:
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 88ceb07b5a2b..6f98ae2646cf 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -19,6 +19,7 @@
#include <asm/kaiser.h>
#include <linux/linkage.h>
#include <linux/err.h>
+#include <asm/nospec-branch.h>

/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
#include <linux/elf-em.h>
@@ -168,12 +169,19 @@ ENTRY(ia32_sysenter_target)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
CFI_REMEMBER_STATE
jnz sysenter_tracesys
- cmpq $(IA32_NR_syscalls-1),%rax
- ja ia32_badsys
+ cmpq $(IA32_NR_syscalls),%rax
+ jae ia32_badsys
sysenter_do_call:
+ sbb %r8,%r8 /* array_index_mask_nospec() */
+ and %r8,%rax
IA32_ARG_FIXUP
sysenter_dispatch:
+#ifdef CONFIG_RETPOLINE
+ movq ia32_sys_call_table(,%rax,8),%rax
+ call __x86_indirect_thunk_rax
+#else
call *ia32_sys_call_table(,%rax,8)
+#endif
movq %rax,RAX-ARGOFFSET(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
@@ -210,8 +218,10 @@ ENTRY(ia32_sysenter_target)
movl $AUDIT_ARCH_I386,%edi /* 1st arg: audit arch */
call __audit_syscall_entry
movl RAX-ARGOFFSET(%rsp),%eax /* reload syscall number */
- cmpq $(IA32_NR_syscalls-1),%rax
- ja ia32_badsys
+ cmpq $(IA32_NR_syscalls),%rax
+ jae ia32_badsys
+ sbb %r8,%r8 /* array_index_mask_nospec() */
+ and %r8,%rax
movl %ebx,%edi /* reload 1st syscall arg */
movl RCX-ARGOFFSET(%rsp),%esi /* reload 2nd syscall arg */
movl RDX-ARGOFFSET(%rsp),%edx /* reload 3rd syscall arg */
@@ -267,8 +277,8 @@ ENTRY(ia32_sysenter_target)
call syscall_trace_enter
LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
RESTORE_REST
- cmpq $(IA32_NR_syscalls-1),%rax
- ja int_ret_from_sys_call /* sysenter_tracesys has set RAX(%rsp) */
+ cmpq $(IA32_NR_syscalls),%rax
+ jae int_ret_from_sys_call /* sysenter_tracesys has set RAX(%rsp) */
jmp sysenter_do_call
CFI_ENDPROC
ENDPROC(ia32_sysenter_target)
@@ -333,12 +343,19 @@ ENTRY(ia32_cstar_target)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
CFI_REMEMBER_STATE
jnz cstar_tracesys
- cmpq $IA32_NR_syscalls-1,%rax
- ja ia32_badsys
+ cmpq $IA32_NR_syscalls,%rax
+ jae ia32_badsys
cstar_do_call:
+ sbb %r8,%r8 /* array_index_mask_nospec() */
+ and %r8,%rax
IA32_ARG_FIXUP 1
cstar_dispatch:
+#ifdef CONFIG_RETPOLINE
+ movq ia32_sys_call_table(,%rax,8),%rax
+ call __x86_indirect_thunk_rax
+#else
call *ia32_sys_call_table(,%rax,8)
+#endif
movq %rax,RAX-ARGOFFSET(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
@@ -386,8 +403,8 @@ ENTRY(ia32_cstar_target)
LOAD_ARGS32 ARGOFFSET, 1 /* reload args from stack in case ptrace changed it */
RESTORE_REST
xchgl %ebp,%r9d
- cmpq $(IA32_NR_syscalls-1),%rax
- ja int_ret_from_sys_call /* cstar_tracesys has set RAX(%rsp) */
+ cmpq $(IA32_NR_syscalls),%rax
+ jae int_ret_from_sys_call /* cstar_tracesys has set RAX(%rsp) */
jmp cstar_do_call
END(ia32_cstar_target)

@@ -445,11 +462,18 @@ ENTRY(ia32_syscall)
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
jnz ia32_tracesys
- cmpq $(IA32_NR_syscalls-1),%rax
- ja ia32_badsys
+ cmpq $(IA32_NR_syscalls),%rax
+ jae ia32_badsys
ia32_do_call:
+ sbb %r8,%r8 /* array_index_mask_nospec() */
+ and %r8,%rax
IA32_ARG_FIXUP
+#ifdef CONFIG_RETPOLINE
+ movq ia32_sys_call_table(,%rax,8),%rax
+ call __x86_indirect_thunk_rax
+#else
call *ia32_sys_call_table(,%rax,8) # xxx: rip relative
+#endif
ia32_sysret:
movq %rax,RAX-ARGOFFSET(%rsp)
ia32_ret_from_sys_call:
@@ -464,8 +488,8 @@ ia32_tracesys:
call syscall_trace_enter
LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
RESTORE_REST
- cmpq $(IA32_NR_syscalls-1),%rax
- ja int_ret_from_sys_call /* ia32_tracesys has set RAX(%rsp) */
+ cmpq $(IA32_NR_syscalls),%rax
+ jae int_ret_from_sys_call /* ia32_tracesys has set RAX(%rsp) */
jmp ia32_do_call
END(ia32_syscall)

@@ -515,7 +539,7 @@ GLOBAL(stub32_clone)
CFI_REL_OFFSET rsp,RSP-ARGOFFSET
/* CFI_REL_OFFSET ss,SS-ARGOFFSET*/
SAVE_REST
- call *%rax
+ CALL_NOSPEC %rax
RESTORE_REST
jmp ia32_sysret /* misbalances the return cache */
CFI_ENDPROC
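
The cmp/jae/sbb/and sequences added above are the assembly form of
array_index_mask_nospec(): the bounds check leaves the carry flag set exactly when
the syscall number is in range, "sbb %r8,%r8" converts that flag into an all-ones
or all-zero mask, and the "and" clamps a misspeculated out-of-range number to 0, a
valid table slot, before the table load. A C sketch of the data flow (illustrative
only; the real code derives the mask from the carry flag, with no branch the CPU
could mispredict):

/*
 * Illustration of what the sbb/and pair does to the syscall number once
 * the "jae ia32_badsys" check has architecturally passed.  The ternary
 * below only shows the intended mask values; the assembly gets the mask
 * from the carry flag left by the cmp, without any branch.
 */
static unsigned long clamp_syscall_nr(unsigned long nr, unsigned long nr_syscalls)
{
        unsigned long mask = (nr < nr_syscalls) ? ~0UL : 0UL;   /* sbb %r8,%r8 */

        return nr & mask;                                       /* and %r8,%rax */
}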
diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index 524bddce0b76..bdf02eeee765 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -45,12 +45,22 @@
.popsection
.endm

+#define old_len 141b-140b
+#define new_len1 144f-143f
+#define new_len2 145f-144f
+
+/*
+ * max without conditionals. Idea adapted from:
+ * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ */
+#define alt_max_short(a, b) ((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))
+
.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2
140:
\oldinstr
141:
- .skip -(((144f-143f)-(141b-140b)) > 0) * ((144f-143f)-(141b-140b)),0x90
- .skip -(((145f-144f)-(144f-143f)-(141b-140b)) > 0) * ((145f-144f)-(144f-143f)-(141b-140b)),0x90
+ .skip -((alt_max_short(new_len1, new_len2) - (old_len)) > 0) * \
+ (alt_max_short(new_len1, new_len2) - (old_len)),0x90
142:

.pushsection .altinstructions,"a"
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 094ff9b3c80a..a455de2f9aaf 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -1,6 +1,8 @@
#ifndef _ASM_X86_ALTERNATIVE_H
#define _ASM_X86_ALTERNATIVE_H

+#ifndef __ASSEMBLY__
+
#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/stringify.h>
@@ -95,14 +97,22 @@ static inline int alternatives_text_reserved(void *start, void *end)
__OLDINSTR(oldinstr, num) \
alt_end_marker ":\n"

+/*
+ * max without conditionals. Idea adapted from:
+ * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ *
+ * The additional "-" is needed because gas works with s32s.
+ */
+#define alt_max_short(a, b) "((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") - (" b ")))))"
+
/*
* Pad the second replacement alternative with additional NOPs if it is
* additionally longer than the first replacement alternative.
*/
-#define OLDINSTR_2(oldinstr, num1, num2) \
- __OLDINSTR(oldinstr, num1) \
- ".skip -(((" alt_rlen(num2) ")-(" alt_rlen(num1) ")-(662b-661b)) > 0) * " \
- "((" alt_rlen(num2) ")-(" alt_rlen(num1) ")-(662b-661b)),0x90\n" \
+#define OLDINSTR_2(oldinstr, num1, num2) \
+ "661:\n\t" oldinstr "\n662:\n" \
+ ".skip -((" alt_max_short(alt_rlen(num1), alt_rlen(num2)) " - (" alt_slen ")) > 0) * " \
+ "(" alt_max_short(alt_rlen(num1), alt_rlen(num2)) " - (" alt_slen ")), 0x90\n" \
alt_end_marker ":\n"

#define ALTINSTR_ENTRY(feature, num) \
@@ -243,4 +253,6 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);

+#endif /* __ASSEMBLY__ */
+
#endif /* _ASM_X86_ALTERNATIVE_H */
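
Both alternative headers now size the NOP padding with a branch-free maximum of the
two replacement lengths, because the expression has to be evaluated by the
assembler where a conditional is not available. The identity is
max(a, b) = a ^ ((a ^ b) & -(a < b)); the two variants above only differ in how
they coax gas into producing the 0/-1 condition value. A quick stand-alone check of
the identity (illustrative only):

#include <assert.h>

/*
 * Branch-free max, as used by alt_max_short(): when a < b the mask
 * -(a < b) is all ones and the expression reduces to a ^ (a ^ b) == b;
 * otherwise the mask is zero and the result is simply a.
 */
static long alt_max(long a, long b)
{
        return a ^ ((a ^ b) & -(a < b));
}

int main(void)
{
        assert(alt_max(3, 7) == 7);
        assert(alt_max(7, 3) == 7);
        assert(alt_max(5, 5) == 5);
        return 0;
}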
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 7730c1c5c83a..5a5852de05f9 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -80,4 +80,15 @@
/* For C file, we already have NOKPROBE_SYMBOL macro */
#endif

+#ifndef __ASSEMBLY__
+/*
+ * This output constraint should be used for any inline asm which has a "call"
+ * instruction. Otherwise the asm may be inserted before the frame pointer
+ * gets set up by the containing function. If you forget to do this, objtool
+ * may print a "call without frame pointer save/setup" warning.
+ */
+register unsigned long current_stack_pointer asm(_ASM_SP);
+#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
+#endif
+
#endif /* _ASM_X86_ASM_H */
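
ASM_CALL_CONSTRAINT is meant to be listed among the outputs of any inline asm that
contains a "call", tying the asm to the stack pointer so the compiler cannot
schedule it before the frame is set up. A minimal usage sketch (the called helper
is hypothetical and assumed to preserve all registers):

/*
 * Usage sketch.  "my_asm_helper" is a hypothetical assembly routine that
 * preserves all registers; listing ASM_CALL_CONSTRAINT as an output keeps
 * the call from being emitted before the frame pointer is established in
 * the containing function.
 */
static inline void call_asm_helper(void)
{
        asm volatile("call my_asm_helper"
                     : ASM_CALL_CONSTRAINT
                     : : "memory");
}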
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 5c7198cca5ed..beeee88510d4 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,34 @@
#define wmb() asm volatile("sfence" ::: "memory")
#endif

+/**
+ * array_index_mask_nospec() - generate a mask that is ~0UL when the
+ * bounds check succeeds and 0 otherwise
+ * @index: array element index
+ * @size: number of elements in array
+ *
+ * Returns:
+ * 0 - (index < size)
+ */
+static inline unsigned long array_index_mask_nospec(unsigned long index,
+ unsigned long size)
+{
+ unsigned long mask;
+
+ asm ("cmp %1,%2; sbb %0,%0;"
+ :"=r" (mask)
+ :"r"(size),"r" (index)
+ :"cc");
+ return mask;
+}
+
+/* Override the default implementation from linux/nospec.h. */
+#define array_index_mask_nospec array_index_mask_nospec
+
+/* Prevent speculative execution past this barrier. */
+#define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+ "lfence", X86_FEATURE_LFENCE_RDTSC)
+
/**
* read_barrier_depends - Flush all pending reads that subsequents reads
* depend on.
@@ -150,8 +178,7 @@ do { \
*/
static __always_inline void rdtsc_barrier(void)
{
- alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
- alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
+ barrier_nospec();
}

#endif /* _ASM_X86_BARRIER_H */
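
The x86 override skips the generic arithmetic and builds the mask straight from the
carry flag. A stand-alone check of the same two-instruction sequence (x86-64 with
GNU C inline asm; illustrative, not kernel code):

#include <assert.h>

/*
 * Same cmp/sbb pair as the kernel's array_index_mask_nospec() override:
 * "cmp size,index" sets CF when index < size, and "sbb %0,%0" turns CF
 * into an all-ones mask (or zero when the index is out of bounds).
 */
static unsigned long mask_nospec(unsigned long index, unsigned long size)
{
        unsigned long mask;

        asm ("cmp %1,%2; sbb %0,%0;"
             : "=r" (mask)
             : "r" (size), "r" (index)
             : "cc");
        return mask;
}

int main(void)
{
        assert(mask_nospec(3, 16) == ~0UL);
        assert(mask_nospec(16, 16) == 0);
        assert(mask_nospec(255, 16) == 0);
        return 0;
}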
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 6b8fc931a973..8722c8f7405d 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -187,7 +187,10 @@
#define X86_FEATURE_HW_PSTATE (7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK (7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_INVPCID_SINGLE (7*32+10) /* Effectively INVPCID && CR4.PCIDE=1 */
+#define X86_FEATURE_RSB_CTXSW (7*32+11) /* "" Fill RSB on context switches */

+#define X86_FEATURE_RETPOLINE (7*32+29) /* "" Generic Retpoline mitigation for Spectre variant 2 */
+#define X86_FEATURE_RETPOLINE_AMD (7*32+30) /* "" AMD Retpoline mitigation for Spectre variant 2 */
/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
#define X86_FEATURE_KAISER (7*32+31) /* "" CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */

@@ -241,6 +244,9 @@
#define X86_BUG_COMA X86_BUG(2) /* Cyrix 6x86 coma */
#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* AMD Erratum 383 */
#define X86_BUG_AMD_APIC_C1E X86_BUG(4) /* AMD Erratum 400 */
+#define X86_BUG_CPU_MELTDOWN X86_BUG(5) /* CPU is affected by meltdown attack and needs kernel page table isolation */
+#define X86_BUG_SPECTRE_V1 X86_BUG(6) /* CPU is affected by Spectre variant 1 attack with conditional branches */
+#define X86_BUG_SPECTRE_V2 X86_BUG(7) /* CPU is affected by Spectre variant 2 attack with indirect branches */

#if defined(__KERNEL__) && !defined(__ASSEMBLY__)

@@ -286,6 +292,8 @@ extern const char * const x86_power_flags[32];
set_bit(bit, (unsigned long *)cpu_caps_set); \
} while (0)

+#define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit)
+
#define cpu_has_fpu boot_cpu_has(X86_FEATURE_FPU)
#define cpu_has_vme boot_cpu_has(X86_FEATURE_VME)
#define cpu_has_de boot_cpu_has(X86_FEATURE_DE)
diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
new file mode 100644
index 000000000000..6999f7d01a0d
--- /dev/null
+++ b/arch/x86/include/asm/intel-family.h
@@ -0,0 +1,68 @@
+#ifndef _ASM_X86_INTEL_FAMILY_H
+#define _ASM_X86_INTEL_FAMILY_H
+
+/*
+ * "Big Core" Processors (Branded as Core, Xeon, etc...)
+ *
+ * The "_X" parts are generally the EP and EX Xeons, or the
+ * "Extreme" ones, like Broadwell-E.
+ *
+ * Things ending in "2" are usually because we have no better
+ * name for them. There's no processor called "WESTMERE2".
+ */
+
+#define INTEL_FAM6_CORE_YONAH 0x0E
+#define INTEL_FAM6_CORE2_MEROM 0x0F
+#define INTEL_FAM6_CORE2_MEROM_L 0x16
+#define INTEL_FAM6_CORE2_PENRYN 0x17
+#define INTEL_FAM6_CORE2_DUNNINGTON 0x1D
+
+#define INTEL_FAM6_NEHALEM 0x1E
+#define INTEL_FAM6_NEHALEM_EP 0x1A
+#define INTEL_FAM6_NEHALEM_EX 0x2E
+#define INTEL_FAM6_WESTMERE 0x25
+#define INTEL_FAM6_WESTMERE2 0x1F
+#define INTEL_FAM6_WESTMERE_EP 0x2C
+#define INTEL_FAM6_WESTMERE_EX 0x2F
+
+#define INTEL_FAM6_SANDYBRIDGE 0x2A
+#define INTEL_FAM6_SANDYBRIDGE_X 0x2D
+#define INTEL_FAM6_IVYBRIDGE 0x3A
+#define INTEL_FAM6_IVYBRIDGE_X 0x3E
+
+#define INTEL_FAM6_HASWELL_CORE 0x3C
+#define INTEL_FAM6_HASWELL_X 0x3F
+#define INTEL_FAM6_HASWELL_ULT 0x45
+#define INTEL_FAM6_HASWELL_GT3E 0x46
+
+#define INTEL_FAM6_BROADWELL_CORE 0x3D
+#define INTEL_FAM6_BROADWELL_XEON_D 0x56
+#define INTEL_FAM6_BROADWELL_GT3E 0x47
+#define INTEL_FAM6_BROADWELL_X 0x4F
+
+#define INTEL_FAM6_SKYLAKE_MOBILE 0x4E
+#define INTEL_FAM6_SKYLAKE_DESKTOP 0x5E
+#define INTEL_FAM6_SKYLAKE_X 0x55
+#define INTEL_FAM6_KABYLAKE_MOBILE 0x8E
+#define INTEL_FAM6_KABYLAKE_DESKTOP 0x9E
+
+/* "Small Core" Processors (Atom) */
+
+#define INTEL_FAM6_ATOM_PINEVIEW 0x1C
+#define INTEL_FAM6_ATOM_LINCROFT 0x26
+#define INTEL_FAM6_ATOM_PENWELL 0x27
+#define INTEL_FAM6_ATOM_CLOVERVIEW 0x35
+#define INTEL_FAM6_ATOM_CEDARVIEW 0x36
+#define INTEL_FAM6_ATOM_SILVERMONT1 0x37 /* BayTrail/BYT / Valleyview */
+#define INTEL_FAM6_ATOM_SILVERMONT2 0x4D /* Avaton/Rangely */
+#define INTEL_FAM6_ATOM_AIRMONT 0x4C /* CherryTrail / Braswell */
+#define INTEL_FAM6_ATOM_MERRIFIELD1 0x4A /* Tangier */
+#define INTEL_FAM6_ATOM_MERRIFIELD2 0x5A /* Annidale */
+#define INTEL_FAM6_ATOM_GOLDMONT 0x5C
+#define INTEL_FAM6_ATOM_DENVERTON 0x5F /* Goldmont Microserver */
+
+/* Xeon Phi */
+
+#define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
+
+#endif /* _ASM_X86_INTEL_FAMILY_H */
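
These model numbers are only unique within family 6 Intel parts, so callers are
expected to check the vendor and family before comparing x86_model. A sketch of the
usual pattern (hypothetical helper; the RSB-fill-on-context-switch selection added
to bugs.c in this release does the same kind of matching):

#include <linux/types.h>
#include <asm/intel-family.h>
#include <asm/processor.h>

/*
 * Hypothetical helper: the INTEL_FAM6_* values are only meaningful for
 * x86_vendor == X86_VENDOR_INTEL and x86 (family) == 6, so check both
 * before looking at the model number.
 */
static bool cpu_is_skylake(const struct cpuinfo_x86 *c)
{
        if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 != 6)
                return false;

        switch (c->x86_model) {
        case INTEL_FAM6_SKYLAKE_MOBILE:
        case INTEL_FAM6_SKYLAKE_DESKTOP:
        case INTEL_FAM6_SKYLAKE_X:
                return true;
        default:
                return false;
        }
}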
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
new file mode 100644
index 000000000000..66094a0473a8
--- /dev/null
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_X86_NOSPEC_BRANCH_H_
+#define _ASM_X86_NOSPEC_BRANCH_H_
+
+#include <asm/alternative.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeature.h>
+
+/*
+ * Fill the CPU return stack buffer.
+ *
+ * Each entry in the RSB, if used for a speculative 'ret', contains an
+ * infinite 'pause; lfence; jmp' loop to capture speculative execution.
+ *
+ * This is required in various cases for retpoline and IBRS-based
+ * mitigations for the Spectre variant 2 vulnerability. Sometimes to
+ * eliminate potentially bogus entries from the RSB, and sometimes
+ * purely to ensure that it doesn't get empty, which on some CPUs would
+ * allow predictions from other (unwanted!) sources to be used.
+ *
+ * We define a CPP macro such that it can be used from both .S files and
+ * inline assembly. It's possible to do a .macro and then include that
+ * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
+ */
+
+#define RSB_CLEAR_LOOPS 32 /* To forcibly overwrite all entries */
+#define RSB_FILL_LOOPS 16 /* To avoid underflow */
+
+/*
+ * Google experimented with loop-unrolling and this turned out to be
+ * the optimal version - two calls, each with their own speculation
+ * trap should their return address end up getting used, in a loop.
+ */
+#define __FILL_RETURN_BUFFER(reg, nr, sp) \
+ mov $(nr/2), reg; \
+771: \
+ call 772f; \
+773: /* speculation trap */ \
+ pause; \
+ lfence; \
+ jmp 773b; \
+772: \
+ call 774f; \
+775: /* speculation trap */ \
+ pause; \
+ lfence; \
+ jmp 775b; \
+774: \
+ dec reg; \
+ jnz 771b; \
+ add $(BITS_PER_LONG/8) * nr, sp;
+
+#ifdef __ASSEMBLY__
+
+/*
+ * These are the bare retpoline primitives for indirect jmp and call.
+ * Do not use these directly; they only exist to make the ALTERNATIVE
+ * invocation below less ugly.
+ */
+.macro RETPOLINE_JMP reg:req
+ call .Ldo_rop_\@
+.Lspec_trap_\@:
+ pause
+ lfence
+ jmp .Lspec_trap_\@
+.Ldo_rop_\@:
+ mov \reg, (%_ASM_SP)
+ ret
+.endm
+
+/*
+ * This is a wrapper around RETPOLINE_JMP so the called function in reg
+ * returns to the instruction after the macro.
+ */
+.macro RETPOLINE_CALL reg:req
+ jmp .Ldo_call_\@
+.Ldo_retpoline_jmp_\@:
+ RETPOLINE_JMP \reg
+.Ldo_call_\@:
+ call .Ldo_retpoline_jmp_\@
+.endm
+
+/*
+ * JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
+ * indirect jmp/call which may be susceptible to the Spectre variant 2
+ * attack.
+ */
+.macro JMP_NOSPEC reg:req
+#ifdef CONFIG_RETPOLINE
+ ALTERNATIVE_2 __stringify(jmp *\reg), \
+ __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
+ __stringify(lfence; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
+#else
+ jmp *\reg
+#endif
+.endm
+
+.macro CALL_NOSPEC reg:req
+#ifdef CONFIG_RETPOLINE
+ ALTERNATIVE_2 __stringify(call *\reg), \
+ __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
+ __stringify(lfence; call *\reg), X86_FEATURE_RETPOLINE_AMD
+#else
+ call *\reg
+#endif
+.endm
+
+ /*
+ * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
+ * monstrosity above, manually.
+ */
+.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
+#ifdef CONFIG_RETPOLINE
+ ALTERNATIVE "jmp .Lskip_rsb_\@", \
+ __stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)) \
+ \ftr
+.Lskip_rsb_\@:
+#endif
+.endm
+
+#else /* __ASSEMBLY__ */
+
+#if defined(CONFIG_X86_64) && defined(RETPOLINE)
+
+/*
+ * Since the inline asm uses the %V modifier which is only in newer GCC,
+ * the 64-bit one is dependent on RETPOLINE not CONFIG_RETPOLINE.
+ */
+# define CALL_NOSPEC \
+ ALTERNATIVE( \
+ "call *%[thunk_target]\n", \
+ "call __x86_indirect_thunk_%V[thunk_target]\n", \
+ X86_FEATURE_RETPOLINE)
+# define THUNK_TARGET(addr) [thunk_target] "r" (addr)
+
+#elif defined(CONFIG_X86_32) && defined(CONFIG_RETPOLINE)
+/*
+ * For i386 we use the original ret-equivalent retpoline, because
+ * otherwise we'll run out of registers. We don't care about CET
+ * here, anyway.
+ */
+# define CALL_NOSPEC ALTERNATIVE("call *%[thunk_target]\n", \
+ " jmp 904f;\n" \
+ " .align 16\n" \
+ "901: call 903f;\n" \
+ "902: pause;\n" \
+ " lfence;\n" \
+ " jmp 902b;\n" \
+ " .align 16\n" \
+ "903: addl $4, %%esp;\n" \
+ " pushl %[thunk_target];\n" \
+ " ret;\n" \
+ " .align 16\n" \
+ "904: call 901b;\n", \
+ X86_FEATURE_RETPOLINE)
+
+# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+#else /* No retpoline for C / inline asm */
+# define CALL_NOSPEC "call *%[thunk_target]\n"
+# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+#endif
+
+/* The Spectre V2 mitigation variants */
+enum spectre_v2_mitigation {
+ SPECTRE_V2_NONE,
+ SPECTRE_V2_RETPOLINE_MINIMAL,
+ SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
+ SPECTRE_V2_RETPOLINE_GENERIC,
+ SPECTRE_V2_RETPOLINE_AMD,
+ SPECTRE_V2_IBRS,
+};
+
+extern char __indirect_thunk_start[];
+extern char __indirect_thunk_end[];
+
+/*
+ * On VMEXIT we must ensure that no RSB predictions learned in the guest
+ * can be followed in the host, by overwriting the RSB completely. Both
+ * retpoline and IBRS mitigations for Spectre v2 need this; only on future
+ * CPUs with IBRS_ALL *might* it be avoided.
+ */
+static inline void vmexit_fill_RSB(void)
+{
+#ifdef CONFIG_RETPOLINE
+ unsigned long loops;
+
+ asm volatile (ALTERNATIVE("jmp 910f",
+ __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
+ X86_FEATURE_RETPOLINE)
+ "910:"
+ : "=r" (loops), ASM_CALL_CONSTRAINT
+ : : "memory" );
+#endif
+}
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4a2f66843288..801b59a7e97d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -107,7 +107,7 @@ struct cpuinfo_x86 {
char x86_vendor_id[16];
char x86_model_id[64];
/* in KB - valid for CPUS which support this call: */
- int x86_cache_size;
+ unsigned int x86_cache_size;
int x86_cache_alignment; /* In bytes */
int x86_power;
unsigned long loops_per_jiffy;
@@ -147,8 +147,8 @@ extern struct cpuinfo_x86 boot_cpu_data;
extern struct cpuinfo_x86 new_cpu_data;

extern struct tss_struct doublefault_tss;
-extern __u32 cpu_caps_cleared[NCAPINTS];
-extern __u32 cpu_caps_set[NCAPINTS];
+extern __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
+extern __u32 cpu_caps_set[NCAPINTS + NBUGINTS];

#ifdef CONFIG_SMP
DECLARE_PER_CPU_SHARED_ALIGNED(struct cpuinfo_x86, cpu_info);
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index d7f3b3b78ac3..53ff351ded61 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -1,6 +1,8 @@
#ifndef _ASM_X86_SWITCH_TO_H
#define _ASM_X86_SWITCH_TO_H

+#include <asm/nospec-branch.h>
+
struct task_struct; /* one of the stranger aspects of C forward declarations */
__visible struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *next);
@@ -24,6 +26,23 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
#define __switch_canary_iparam
#endif /* CC_STACKPROTECTOR */

+#ifdef CONFIG_RETPOLINE
+ /*
+ * When switching from a shallower to a deeper call stack
+ * the RSB may either underflow or use entries populated
+ * with userspace addresses. On CPUs where those concerns
+ * exist, overwrite the RSB with entries which capture
+ * speculative execution to prevent attack.
+ */
+#define __retpoline_fill_return_buffer \
+ ALTERNATIVE("jmp 910f", \
+ __stringify(__FILL_RETURN_BUFFER(%%ebx, RSB_CLEAR_LOOPS, %%esp)),\
+ X86_FEATURE_RSB_CTXSW) \
+ "910:\n\t"
+#else
+#define __retpoline_fill_return_buffer
+#endif
+
/*
* Saving eflags is important. It switches not only IOPL between tasks,
* it also protects other tasks from NT leaking through sysenter etc.
@@ -46,6 +65,7 @@ do { \
"movl $1f,%[prev_ip]\n\t" /* save EIP */ \
"pushl %[next_ip]\n\t" /* restore EIP */ \
__switch_canary \
+ __retpoline_fill_return_buffer \
"jmp __switch_to\n" /* regparm call */ \
"1:\t" \
"popl %%ebp\n\t" /* restore EBP */ \
@@ -100,6 +120,23 @@ do { \
#define __switch_canary_iparam
#endif /* CC_STACKPROTECTOR */

+#ifdef CONFIG_RETPOLINE
+ /*
+ * When switching from a shallower to a deeper call stack
+ * the RSB may either underflow or use entries populated
+ * with userspace addresses. On CPUs where those concerns
+ * exist, overwrite the RSB with entries which capture
+ * speculative execution to prevent attack.
+ */
+#define __retpoline_fill_return_buffer \
+ ALTERNATIVE("jmp 910f", \
+ __stringify(__FILL_RETURN_BUFFER(%%r12, RSB_CLEAR_LOOPS, %%rsp)),\
+ X86_FEATURE_RSB_CTXSW) \
+ "910:\n\t"
+#else
+#define __retpoline_fill_return_buffer
+#endif
+
/* Save restore flags to clear handle leaking NT */
#define switch_to(prev, next, last) \
asm volatile(SAVE_CONTEXT \
@@ -108,6 +145,7 @@ do { \
"call __switch_to\n\t" \
"movq "__percpu_arg([current_task])",%%rsi\n\t" \
__switch_canary \
+ __retpoline_fill_return_buffer \
"movq %P[thread_info](%%rsi),%%r8\n\t" \
"movq %%rax,%%rdi\n\t" \
"testl %[_tif_fork],%P[ti_flags](%%r8)\n\t" \
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 887c0ea365a7..16eff1e17e96 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -143,6 +143,14 @@ extern int __get_user_4(void);
extern int __get_user_8(void);
extern int __get_user_bad(void);

+#define __uaccess_begin() stac()
+#define __uaccess_end() clac()
+#define __uaccess_begin_nospec() \
+({ \
+ stac(); \
+ barrier_nospec(); \
+})
+
/*
* This is a type: either unsigned long, if the argument fits into
* that type, or otherwise unsigned long long.
@@ -201,10 +209,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))

#ifdef CONFIG_X86_32
#define __put_user_asm_u64(x, addr, err, errret) \
- asm volatile(ASM_STAC "\n" \
+ asm volatile("\n" \
"1: movl %%eax,0(%2)\n" \
"2: movl %%edx,4(%2)\n" \
- "3: " ASM_CLAC "\n" \
+ "3:" \
".section .fixup,\"ax\"\n" \
"4: movl %3,%0\n" \
" jmp 3b\n" \
@@ -215,10 +223,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
: "A" (x), "r" (addr), "i" (errret), "0" (err))

#define __put_user_asm_ex_u64(x, addr) \
- asm volatile(ASM_STAC "\n" \
+ asm volatile("\n" \
"1: movl %%eax,0(%1)\n" \
"2: movl %%edx,4(%1)\n" \
- "3: " ASM_CLAC "\n" \
+ "3:" \
_ASM_EXTABLE_EX(1b, 2b) \
_ASM_EXTABLE_EX(2b, 3b) \
: : "A" (x), "r" (addr))
@@ -311,6 +319,10 @@ do { \
} \
} while (0)

+/*
+ * This doesn't do __uaccess_begin/end - the exception handling
+ * around it must do that.
+ */
#define __put_user_size_ex(x, ptr, size) \
do { \
__chk_user_ptr(ptr); \
@@ -365,9 +377,9 @@ do { \
} while (0)

#define __get_user_asm(x, addr, err, itype, rtype, ltype, errret) \
- asm volatile(ASM_STAC "\n" \
+ asm volatile("\n" \
"1: mov"itype" %2,%"rtype"1\n" \
- "2: " ASM_CLAC "\n" \
+ "2:\n" \
".section .fixup,\"ax\"\n" \
"3: mov %3,%0\n" \
" xor"itype" %"rtype"1,%"rtype"1\n" \
@@ -377,6 +389,10 @@ do { \
: "=r" (err), ltype(x) \
: "m" (__m(addr)), "i" (errret), "0" (err))

+/*
+ * This doesn't do __uaccess_begin/end - the exception handling
+ * around it must do that.
+ */
#define __get_user_size_ex(x, ptr, size) \
do { \
__chk_user_ptr(ptr); \
@@ -407,7 +423,9 @@ do { \
#define __put_user_nocheck(x, ptr, size) \
({ \
int __pu_err; \
+ __uaccess_begin(); \
__put_user_size((x), (ptr), (size), __pu_err, -EFAULT); \
+ __uaccess_end(); \
__pu_err; \
})

@@ -415,7 +433,9 @@ do { \
({ \
int __gu_err; \
unsigned long __gu_val; \
+ __uaccess_begin_nospec(); \
__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \
+ __uaccess_end(); \
(x) = (__force __typeof__(*(ptr)))__gu_val; \
__gu_err; \
})
@@ -430,9 +450,9 @@ struct __large_struct { unsigned long buf[100]; };
* aliasing issues.
*/
#define __put_user_asm(x, addr, err, itype, rtype, ltype, errret) \
- asm volatile(ASM_STAC "\n" \
+ asm volatile("\n" \
"1: mov"itype" %"rtype"1,%2\n" \
- "2: " ASM_CLAC "\n" \
+ "2:\n" \
".section .fixup,\"ax\"\n" \
"3: mov %3,%0\n" \
" jmp 2b\n" \
@@ -452,11 +472,15 @@ struct __large_struct { unsigned long buf[100]; };
*/
#define uaccess_try do { \
current_thread_info()->uaccess_err = 0; \
- stac(); \
+ __uaccess_begin(); \
barrier();

+#define uaccess_try_nospec do { \
+ current_thread_info()->uaccess_err = 0; \
+ __uaccess_begin_nospec(); \
+
#define uaccess_catch(err) \
- clac(); \
+ __uaccess_end(); \
(err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0); \
} while (0)

@@ -517,7 +541,7 @@ struct __large_struct { unsigned long buf[100]; };
* get_user_ex(...);
* } get_user_catch(err)
*/
-#define get_user_try uaccess_try
+#define get_user_try uaccess_try_nospec
#define get_user_catch(err) uaccess_catch(err)

#define get_user_ex(x, ptr) do { \
@@ -552,12 +576,13 @@ extern void __cmpxchg_wrong_size(void)
__typeof__(ptr) __uval = (uval); \
__typeof__(*(ptr)) __old = (old); \
__typeof__(*(ptr)) __new = (new); \
+ __uaccess_begin_nospec(); \
switch (size) { \
case 1: \
{ \
- asm volatile("\t" ASM_STAC "\n" \
+ asm volatile("\n" \
"1:\t" LOCK_PREFIX "cmpxchgb %4, %2\n" \
- "2:\t" ASM_CLAC "\n" \
+ "2:\n" \
"\t.section .fixup, \"ax\"\n" \
"3:\tmov %3, %0\n" \
"\tjmp 2b\n" \
@@ -571,9 +596,9 @@ extern void __cmpxchg_wrong_size(void)
} \
case 2: \
{ \
- asm volatile("\t" ASM_STAC "\n" \
+ asm volatile("\n" \
"1:\t" LOCK_PREFIX "cmpxchgw %4, %2\n" \
- "2:\t" ASM_CLAC "\n" \
+ "2:\n" \
"\t.section .fixup, \"ax\"\n" \
"3:\tmov %3, %0\n" \
"\tjmp 2b\n" \
@@ -587,9 +612,9 @@ extern void __cmpxchg_wrong_size(void)
} \
case 4: \
{ \
- asm volatile("\t" ASM_STAC "\n" \
+ asm volatile("\n" \
"1:\t" LOCK_PREFIX "cmpxchgl %4, %2\n" \
- "2:\t" ASM_CLAC "\n" \
+ "2:\n" \
"\t.section .fixup, \"ax\"\n" \
"3:\tmov %3, %0\n" \
"\tjmp 2b\n" \
@@ -606,9 +631,9 @@ extern void __cmpxchg_wrong_size(void)
if (!IS_ENABLED(CONFIG_X86_64)) \
__cmpxchg_wrong_size(); \
\
- asm volatile("\t" ASM_STAC "\n" \
+ asm volatile("\n" \
"1:\t" LOCK_PREFIX "cmpxchgq %4, %2\n" \
- "2:\t" ASM_CLAC "\n" \
+ "2:\n" \
"\t.section .fixup, \"ax\"\n" \
"3:\tmov %3, %0\n" \
"\tjmp 2b\n" \
@@ -623,6 +648,7 @@ extern void __cmpxchg_wrong_size(void)
default: \
__cmpxchg_wrong_size(); \
} \
+ __uaccess_end(); \
*__uval = __old; \
__ret; \
})
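
Taken together, the new helpers bracket each user access: __uaccess_begin_nospec()
opens the access window (stac()) and then stalls speculation with barrier_nospec(),
so a get_user() reached under a mispredicted access_ok() cannot speculatively
dereference an attacker-supplied kernel pointer, and __uaccess_end() (clac())
closes the window again. A sketch of the resulting pattern, using a hypothetical
wrapper in kernel context:

/*
 * Hypothetical wrapper (kernel context) showing the pattern this patch
 * introduces around individual user accesses: permission check, open the
 * window with a speculation barrier, perform the access, close the window.
 */
static inline int get_user_u32_nospec(u32 *dst, const u32 __user *src)
{
        int err = 0;

        if (!access_ok(VERIFY_READ, src, sizeof(*src)))
                return -EFAULT;

        __uaccess_begin_nospec();       /* stac() + barrier_nospec() */
        __get_user_size(*dst, src, sizeof(*src), err, -EFAULT);
        __uaccess_end();                /* clac() */

        return err;
}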
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 3c03a5de64d3..c803818cedfb 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -48,16 +48,22 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)

switch (n) {
case 1:
+ __uaccess_begin_nospec();
__put_user_size(*(u8 *)from, (u8 __user *)to,
1, ret, 1);
+ __uaccess_end();
return ret;
case 2:
+ __uaccess_begin_nospec();
__put_user_size(*(u16 *)from, (u16 __user *)to,
2, ret, 2);
+ __uaccess_end();
return ret;
case 4:
+ __uaccess_begin_nospec();
__put_user_size(*(u32 *)from, (u32 __user *)to,
4, ret, 4);
+ __uaccess_end();
return ret;
}
}
@@ -98,13 +104,19 @@ __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)

switch (n) {
case 1:
+ __uaccess_begin_nospec();
__get_user_size(*(u8 *)to, from, 1, ret, 1);
+ __uaccess_end();
return ret;
case 2:
+ __uaccess_begin_nospec();
__get_user_size(*(u16 *)to, from, 2, ret, 2);
+ __uaccess_end();
return ret;
case 4:
+ __uaccess_begin_nospec();
__get_user_size(*(u32 *)to, from, 4, ret, 4);
+ __uaccess_end();
return ret;
}
}
@@ -142,13 +154,19 @@ __copy_from_user(void *to, const void __user *from, unsigned long n)

switch (n) {
case 1:
+ __uaccess_begin_nospec();
__get_user_size(*(u8 *)to, from, 1, ret, 1);
+ __uaccess_end();
return ret;
case 2:
+ __uaccess_begin_nospec();
__get_user_size(*(u16 *)to, from, 2, ret, 2);
+ __uaccess_end();
return ret;
case 4:
+ __uaccess_begin_nospec();
__get_user_size(*(u32 *)to, from, 4, ret, 4);
+ __uaccess_end();
return ret;
}
}
@@ -164,13 +182,19 @@ static __always_inline unsigned long __copy_from_user_nocache(void *to,

switch (n) {
case 1:
+ __uaccess_begin_nospec();
__get_user_size(*(u8 *)to, from, 1, ret, 1);
+ __uaccess_end();
return ret;
case 2:
+ __uaccess_begin_nospec();
__get_user_size(*(u16 *)to, from, 2, ret, 2);
+ __uaccess_end();
return ret;
case 4:
+ __uaccess_begin_nospec();
__get_user_size(*(u32 *)to, from, 4, ret, 4);
+ __uaccess_end();
return ret;
}
}
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 12a26b979bf1..6415eb20cf97 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -56,35 +56,49 @@ int __copy_from_user_nocheck(void *dst, const void __user *src, unsigned size)
if (!__builtin_constant_p(size))
return copy_user_generic(dst, (__force void *)src, size);
switch (size) {
- case 1:__get_user_asm(*(u8 *)dst, (u8 __user *)src,
+ case 1:
+ __uaccess_begin_nospec();
+ __get_user_asm(*(u8 *)dst, (u8 __user *)src,
ret, "b", "b", "=q", 1);
+ __uaccess_end();
return ret;
- case 2:__get_user_asm(*(u16 *)dst, (u16 __user *)src,
+ case 2:
+ __uaccess_begin_nospec();
+ __get_user_asm(*(u16 *)dst, (u16 __user *)src,
ret, "w", "w", "=r", 2);
+ __uaccess_end();
return ret;
- case 4:__get_user_asm(*(u32 *)dst, (u32 __user *)src,
+ case 4:
+ __uaccess_begin_nospec();
+ __get_user_asm(*(u32 *)dst, (u32 __user *)src,
ret, "l", "k", "=r", 4);
+ __uaccess_end();
return ret;
- case 8:__get_user_asm(*(u64 *)dst, (u64 __user *)src,
+ case 8:
+ __uaccess_begin_nospec();
+ __get_user_asm(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 8);
+ __uaccess_end();
return ret;
case 10:
+ __uaccess_begin_nospec();
__get_user_asm(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 10);
- if (unlikely(ret))
- return ret;
- __get_user_asm(*(u16 *)(8 + (char *)dst),
- (u16 __user *)(8 + (char __user *)src),
- ret, "w", "w", "=r", 2);
+ if (likely(!ret))
+ __get_user_asm(*(u16 *)(8 + (char *)dst),
+ (u16 __user *)(8 + (char __user *)src),
+ ret, "w", "w", "=r", 2);
+ __uaccess_end();
return ret;
case 16:
+ __uaccess_begin_nospec();
__get_user_asm(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 16);
- if (unlikely(ret))
- return ret;
- __get_user_asm(*(u64 *)(8 + (char *)dst),
- (u64 __user *)(8 + (char __user *)src),
- ret, "q", "", "=r", 8);
+ if (likely(!ret))
+ __get_user_asm(*(u64 *)(8 + (char *)dst),
+ (u64 __user *)(8 + (char __user *)src),
+ ret, "q", "", "=r", 8);
+ __uaccess_end();
return ret;
default:
return copy_user_generic(dst, (__force void *)src, size);
@@ -106,35 +120,51 @@ int __copy_to_user_nocheck(void __user *dst, const void *src, unsigned size)
if (!__builtin_constant_p(size))
return copy_user_generic((__force void *)dst, src, size);
switch (size) {
- case 1:__put_user_asm(*(u8 *)src, (u8 __user *)dst,
+ case 1:
+ __uaccess_begin();
+ __put_user_asm(*(u8 *)src, (u8 __user *)dst,
ret, "b", "b", "iq", 1);
+ __uaccess_end();
return ret;
- case 2:__put_user_asm(*(u16 *)src, (u16 __user *)dst,
+ case 2:
+ __uaccess_begin();
+ __put_user_asm(*(u16 *)src, (u16 __user *)dst,
ret, "w", "w", "ir", 2);
+ __uaccess_end();
return ret;
- case 4:__put_user_asm(*(u32 *)src, (u32 __user *)dst,
+ case 4:
+ __uaccess_begin();
+ __put_user_asm(*(u32 *)src, (u32 __user *)dst,
ret, "l", "k", "ir", 4);
+ __uaccess_end();
return ret;
- case 8:__put_user_asm(*(u64 *)src, (u64 __user *)dst,
+ case 8:
+ __uaccess_begin();
+ __put_user_asm(*(u64 *)src, (u64 __user *)dst,
ret, "q", "", "er", 8);
+ __uaccess_end();
return ret;
case 10:
+ __uaccess_begin();
__put_user_asm(*(u64 *)src, (u64 __user *)dst,
ret, "q", "", "er", 10);
- if (unlikely(ret))
- return ret;
- asm("":::"memory");
- __put_user_asm(4[(u16 *)src], 4 + (u16 __user *)dst,
- ret, "w", "w", "ir", 2);
+ if (likely(!ret)) {
+ asm("":::"memory");
+ __put_user_asm(4[(u16 *)src], 4 + (u16 __user *)dst,
+ ret, "w", "w", "ir", 2);
+ }
+ __uaccess_end();
return ret;
case 16:
+ __uaccess_begin();
__put_user_asm(*(u64 *)src, (u64 __user *)dst,
ret, "q", "", "er", 16);
- if (unlikely(ret))
- return ret;
- asm("":::"memory");
- __put_user_asm(1[(u64 *)src], 1 + (u64 __user *)dst,
- ret, "q", "", "er", 8);
+ if (likely(!ret)) {
+ asm("":::"memory");
+ __put_user_asm(1[(u64 *)src], 1 + (u64 __user *)dst,
+ ret, "q", "", "er", 8);
+ }
+ __uaccess_end();
return ret;
default:
return copy_user_generic((__force void *)dst, src, size);
@@ -160,39 +190,47 @@ int __copy_in_user(void __user *dst, const void __user *src, unsigned size)
switch (size) {
case 1: {
u8 tmp;
+ __uaccess_begin_nospec();
__get_user_asm(tmp, (u8 __user *)src,
ret, "b", "b", "=q", 1);
if (likely(!ret))
__put_user_asm(tmp, (u8 __user *)dst,
ret, "b", "b", "iq", 1);
+ __uaccess_end();
return ret;
}
case 2: {
u16 tmp;
+ __uaccess_begin_nospec();
__get_user_asm(tmp, (u16 __user *)src,
ret, "w", "w", "=r", 2);
if (likely(!ret))
__put_user_asm(tmp, (u16 __user *)dst,
ret, "w", "w", "ir", 2);
+ __uaccess_end();
return ret;
}

case 4: {
u32 tmp;
+ __uaccess_begin_nospec();
__get_user_asm(tmp, (u32 __user *)src,
ret, "l", "k", "=r", 4);
if (likely(!ret))
__put_user_asm(tmp, (u32 __user *)dst,
ret, "l", "k", "ir", 4);
+ __uaccess_end();
return ret;
}
case 8: {
u64 tmp;
+ __uaccess_begin_nospec();
__get_user_asm(tmp, (u64 __user *)src,
ret, "q", "", "=r", 8);
if (likely(!ret))
__put_user_asm(tmp, (u64 __user *)dst,
ret, "q", "", "er", 8);
+ __uaccess_end();
return ret;
}
default:
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 4ad5a91aea79..da45f9fc1913 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -44,6 +44,7 @@
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/smap.h>
+#include <asm/nospec-branch.h>

#include <xen/interface/xen.h>
#include <xen/interface/sched.h>
@@ -215,9 +216,9 @@ privcmd_call(unsigned call,
__HYPERCALL_5ARG(a1, a2, a3, a4, a5);

stac();
- asm volatile("call *%[call]"
+ asm volatile(CALL_NOSPEC
: __HYPERCALL_5PARAM
- : [call] "a" (&hypercall_page[call])
+ : [thunk_target] "a" (&hypercall_page[call])
: __HYPERCALL_CLOBBER5);
clac();

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index 392ef72ea41c..1d3811d1506e 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -223,6 +223,9 @@
#define FAM10H_MMIO_CONF_BASE_MASK 0xfffffffULL
#define FAM10H_MMIO_CONF_BASE_SHIFT 20
#define MSR_FAM10H_NODE_ID 0xc001100c
+#define MSR_F10H_DECFG 0xc0011029
+#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
+#define MSR_F10H_DECFG_LFENCE_SERIALIZE BIT_ULL(MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT)

/* K8 MSRs */
#define MSR_K8_TOP_MEM1 0xc001001a
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index fa030d032eb7..b44d458e5118 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -41,17 +41,6 @@ static int __init setup_noreplace_smp(char *str)
}
__setup("noreplace-smp", setup_noreplace_smp);

-#ifdef CONFIG_PARAVIRT
-static int __initdata_or_module noreplace_paravirt = 0;
-
-static int __init setup_noreplace_paravirt(char *str)
-{
- noreplace_paravirt = 1;
- return 1;
-}
-__setup("noreplace-paravirt", setup_noreplace_paravirt);
-#endif
-
#define DPRINTK(fmt, args...) \
do { \
if (debug_alternative) \
@@ -325,7 +314,18 @@ recompute_jump(struct alt_instr *a, u8 *orig_insn, u8 *repl_insn, u8 *insnbuf)

static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
{
+ unsigned long flags;
+ int i;
+
+ for (i = 0; i < a->padlen; i++) {
+ if (instr[i] != 0x90)
+ return;
+ }
+
+ local_irq_save(flags);
add_nops(instr + (a->instrlen - a->padlen), a->padlen);
+ sync_core();
+ local_irq_restore(flags);

DUMP_BYTES(instr, a->instrlen, "%p: [%d:%d) optimized NOPs: ",
instr, a->instrlen - a->padlen, a->padlen);
@@ -369,11 +369,11 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
continue;
}

- DPRINTK("feat: %d*32+%d, old: (%p, len: %d), repl: (%p, len: %d)",
+ DPRINTK("feat: %d*32+%d, old: (%p, len: %d), repl: (%p, len: %d), pad: %d",
a->cpuid >> 5,
a->cpuid & 0x1f,
instr, a->instrlen,
- replacement, a->replacementlen);
+ replacement, a->replacementlen, a->padlen);

DUMP_BYTES(instr, a->instrlen, "%p: old_insn: ", instr);
DUMP_BYTES(replacement, a->replacementlen, "%p: rpl_insn: ", replacement);
@@ -563,9 +563,6 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
struct paravirt_patch_site *p;
char insnbuf[MAX_PATCH_LEN];

- if (noreplace_paravirt)
- return;
-
for (p = start; p < end; p++) {
unsigned int used;

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 7fd54f09b011..e96204e419c4 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -16,9 +16,7 @@ obj-y := intel_cacheinfo.o scattered.o topology.o
obj-y += proc.o capflags.o powerflags.o common.o
obj-y += rdrand.o
obj-y += match.o
-
-obj-$(CONFIG_X86_32) += bugs.o
-obj-$(CONFIG_X86_64) += bugs_64.o
+obj-y += bugs.o

obj-$(CONFIG_CPU_SUP_INTEL) += intel.o
obj-$(CONFIG_CPU_SUP_AMD) += amd.o
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6238499e51a2..2b597d03d5ed 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -673,8 +673,32 @@ static void init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_K8);

if (cpu_has_xmm2) {
- /* MFENCE stops RDTSC speculation */
- set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
+ unsigned long long val;
+ int ret;
+
+ /*
+ * A serializing LFENCE has less overhead than MFENCE, so
+ * use it for execution serialization. On families which
+ * don't have that MSR, LFENCE is already serializing.
+ * msr_set_bit() uses the safe accessors, too, even if the MSR
+ * is not present.
+ */
+ msr_set_bit(MSR_F10H_DECFG,
+ MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);
+
+ /*
+ * Verify that the MSR write was successful (could be running
+ * under a hypervisor) and only then assume that LFENCE is
+ * serializing.
+ */
+ ret = rdmsrl_safe(MSR_F10H_DECFG, &val);
+ if (!ret && (val & MSR_F10H_DECFG_LFENCE_SERIALIZE)) {
+ /* A serializing LFENCE stops RDTSC speculation */
+ set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
+ } else {
+ /* MFENCE stops RDTSC speculation */
+ set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
+ }
}

#ifdef CONFIG_X86_64
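
The init_amd() change above prefers a serializing LFENCE over MFENCE by setting bit 1 of MSR 0xc0011029 (MSR_F10H_DECFG) and reading it back. Purely as an illustration, and not part of the patch, the same bit can be inspected from userspace through the msr driver; this sketch assumes /dev/cpu/0/msr is available (modprobe msr) and root privileges:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_F10H_DECFG                  0xc0011029
#define MSR_F10H_DECFG_LFENCE_SERIALIZE (1ULL << 1)

int main(void)
{
        uint64_t val;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);

        /* the msr driver reads the MSR whose number equals the file offset */
        if (fd < 0 || pread(fd, &val, sizeof(val), MSR_F10H_DECFG) != sizeof(val)) {
                perror("rdmsr");        /* MSR absent or msr driver not loaded */
                return 1;
        }
        printf("LFENCE is %sserializing on CPU0\n",
               (val & MSR_F10H_DECFG_LFENCE_SERIALIZE) ? "" : "NOT ");
        close(fd);
        return 0;
}

If the write does not stick (for instance under a hypervisor that filters the MSR), the kernel keeps the MFENCE_RDTSC fallback, exactly as the else branch above does.
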
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 4c7dd836304a..db2fc61ba99a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -9,6 +9,11 @@
*/
#include <linux/init.h>
#include <linux/utsname.h>
+#include <linux/cpu.h>
+#include <linux/module.h>
+
+#include <asm/nospec-branch.h>
+#include <asm/cmdline.h>
#include <asm/bugs.h>
#include <asm/processor.h>
#include <asm/processor-flags.h>
@@ -16,6 +21,13 @@
#include <asm/msr.h>
#include <asm/paravirt.h>
#include <asm/alternative.h>
+#include <asm/pgtable.h>
+#include <asm/cacheflush.h>
+#include <asm/intel-family.h>
+
+static void __init spectre_v2_select_mitigation(void);
+
+#ifdef CONFIG_X86_32

static double __initdata x = 4195835.0;
static double __initdata y = 3145727.0;
@@ -63,6 +75,8 @@ static void __init check_fpu(void)
}
}

+#endif /* CONFIG_X86_32 */
+
void __init check_bugs(void)
{
#ifdef CONFIG_X86_32
@@ -74,11 +88,16 @@ void __init check_bugs(void)
#endif

identify_boot_cpu();
-#ifndef CONFIG_SMP
- pr_info("CPU: ");
- print_cpu_info(&boot_cpu_data);
-#endif

+ if (!IS_ENABLED(CONFIG_SMP)) {
+ pr_info("CPU: ");
+ print_cpu_info(&boot_cpu_data);
+ }
+
+ /* Select the proper spectre mitigation before patching alternatives */
+ spectre_v2_select_mitigation();
+
+#ifdef CONFIG_X86_32
/*
* Check whether we are able to run this kernel safely on SMP.
*
@@ -99,4 +118,276 @@ void __init check_bugs(void)
*/
if (cpu_has_fpu)
check_fpu();
+#else /* CONFIG_X86_64 */
+ alternative_instructions();
+
+ /*
+ * Make sure the first 2MB area is not mapped by huge pages
+ * There are typically fixed size MTRRs in there and overlapping
+ * MTRRs into large pages causes slow downs.
+ *
+ * Right now we don't do that with gbpages because there seems
+ * very little benefit for that case.
+ */
+ if (!direct_gbpages)
+ set_memory_4k((unsigned long)__va(0), 1);
+#endif
}
+
+/* The kernel command line selection */
+enum spectre_v2_mitigation_cmd {
+ SPECTRE_V2_CMD_NONE,
+ SPECTRE_V2_CMD_AUTO,
+ SPECTRE_V2_CMD_FORCE,
+ SPECTRE_V2_CMD_RETPOLINE,
+ SPECTRE_V2_CMD_RETPOLINE_GENERIC,
+ SPECTRE_V2_CMD_RETPOLINE_AMD,
+};
+
+static const char *spectre_v2_strings[] = {
+ [SPECTRE_V2_NONE] = "Vulnerable",
+ [SPECTRE_V2_RETPOLINE_MINIMAL] = "Vulnerable: Minimal generic ASM retpoline",
+ [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline",
+ [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline",
+ [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline",
+};
+
+#undef pr_fmt
+#define pr_fmt(fmt) "Spectre V2 : " fmt
+
+static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
+
+#ifdef RETPOLINE
+static bool spectre_v2_bad_module;
+
+bool retpoline_module_ok(bool has_retpoline)
+{
+ if (spectre_v2_enabled == SPECTRE_V2_NONE || has_retpoline)
+ return true;
+
+ pr_err("System may be vulnerable to spectre v2\n");
+ spectre_v2_bad_module = true;
+ return false;
+}
+
+static inline const char *spectre_v2_module_string(void)
+{
+ return spectre_v2_bad_module ? " - vulnerable module loaded" : "";
+}
+#else
+static inline const char *spectre_v2_module_string(void) { return ""; }
+#endif
+
+static void __init spec2_print_if_insecure(const char *reason)
+{
+ if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ pr_info("%s selected on command line.\n", reason);
+}
+
+static void __init spec2_print_if_secure(const char *reason)
+{
+ if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ pr_info("%s selected on command line.\n", reason);
+}
+
+static inline bool retp_compiler(void)
+{
+ return __is_defined(RETPOLINE);
+}
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static const struct {
+ const char *option;
+ enum spectre_v2_mitigation_cmd cmd;
+ bool secure;
+} mitigation_options[] = {
+ { "off", SPECTRE_V2_CMD_NONE, false },
+ { "on", SPECTRE_V2_CMD_FORCE, true },
+ { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false },
+ { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false },
+ { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false },
+ { "auto", SPECTRE_V2_CMD_AUTO, false },
+};
+
+static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
+{
+ char arg[20];
+ int ret, i;
+ enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO;
+
+ if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+ return SPECTRE_V2_CMD_NONE;
+ else {
+ ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
+ sizeof(arg));
+ if (ret < 0)
+ return SPECTRE_V2_CMD_AUTO;
+
+ for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) {
+ if (!match_option(arg, ret, mitigation_options[i].option))
+ continue;
+ cmd = mitigation_options[i].cmd;
+ break;
+ }
+
+ if (i >= ARRAY_SIZE(mitigation_options)) {
+ pr_err("unknown option (%s). Switching to AUTO select\n", arg);
+ return SPECTRE_V2_CMD_AUTO;
+ }
+ }
+
+ if ((cmd == SPECTRE_V2_CMD_RETPOLINE ||
+ cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
+ cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
+ !IS_ENABLED(CONFIG_RETPOLINE)) {
+ pr_err("%s selected but not compiled in. Switching to AUTO select\n",
+ mitigation_options[i].option);
+ return SPECTRE_V2_CMD_AUTO;
+ }
+
+ if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
+ boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
+ pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
+ return SPECTRE_V2_CMD_AUTO;
+ }
+
+ if (mitigation_options[i].secure)
+ spec2_print_if_secure(mitigation_options[i].option);
+ else
+ spec2_print_if_insecure(mitigation_options[i].option);
+
+ return cmd;
+}
+
+/* Check for Skylake-like CPUs (for RSB handling) */
+static bool __init is_skylake_era(void)
+{
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+ boot_cpu_data.x86 == 6) {
+ switch (boot_cpu_data.x86_model) {
+ case INTEL_FAM6_SKYLAKE_MOBILE:
+ case INTEL_FAM6_SKYLAKE_DESKTOP:
+ case INTEL_FAM6_SKYLAKE_X:
+ case INTEL_FAM6_KABYLAKE_MOBILE:
+ case INTEL_FAM6_KABYLAKE_DESKTOP:
+ return true;
+ }
+ }
+ return false;
+}
+
+static void __init spectre_v2_select_mitigation(void)
+{
+ enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
+ enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+
+ /*
+ * If the CPU is not affected and the command line mode is NONE or AUTO
+ * then nothing to do.
+ */
+ if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2) &&
+ (cmd == SPECTRE_V2_CMD_NONE || cmd == SPECTRE_V2_CMD_AUTO))
+ return;
+
+ switch (cmd) {
+ case SPECTRE_V2_CMD_NONE:
+ return;
+
+ case SPECTRE_V2_CMD_FORCE:
+ case SPECTRE_V2_CMD_AUTO:
+ if (IS_ENABLED(CONFIG_RETPOLINE))
+ goto retpoline_auto;
+ break;
+ case SPECTRE_V2_CMD_RETPOLINE_AMD:
+ if (IS_ENABLED(CONFIG_RETPOLINE))
+ goto retpoline_amd;
+ break;
+ case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
+ if (IS_ENABLED(CONFIG_RETPOLINE))
+ goto retpoline_generic;
+ break;
+ case SPECTRE_V2_CMD_RETPOLINE:
+ if (IS_ENABLED(CONFIG_RETPOLINE))
+ goto retpoline_auto;
+ break;
+ }
+ pr_err("kernel not compiled with retpoline; no mitigation available!");
+ return;
+
+retpoline_auto:
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+ retpoline_amd:
+ if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
+ pr_err("LFENCE not serializing. Switching to generic retpoline\n");
+ goto retpoline_generic;
+ }
+ mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
+ SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ } else {
+ retpoline_generic:
+ mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
+ SPECTRE_V2_RETPOLINE_MINIMAL;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ }
+
+ spectre_v2_enabled = mode;
+ pr_info("%s\n", spectre_v2_strings[mode]);
+
+ /*
+ * If neither SMEP or KPTI are available, there is a risk of
+ * hitting userspace addresses in the RSB after a context switch
+ * from a shallow call stack to a deeper one. To prevent this fill
+ * the entire RSB, even when using IBRS.
+ *
+ * Skylake era CPUs have a separate issue with *underflow* of the
+ * RSB, when they will predict 'ret' targets from the generic BTB.
+ * The proper mitigation for this is IBRS. If IBRS is not supported
+ * or deactivated in favour of retpolines the RSB fill on context
+ * switch is required.
+ */
+ if ((!boot_cpu_has(X86_FEATURE_KAISER) &&
+ !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
+ setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+ pr_info("Filling RSB on context switch\n");
+ }
+}
+
+#undef pr_fmt
+
+#ifdef CONFIG_SYSFS
+ssize_t cpu_show_meltdown(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
+ return sprintf(buf, "Not affected\n");
+ if (boot_cpu_has(X86_FEATURE_KAISER))
+ return sprintf(buf, "Mitigation: PTI\n");
+ return sprintf(buf, "Vulnerable\n");
+}
+
+ssize_t cpu_show_spectre_v1(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
+ return sprintf(buf, "Not affected\n");
+ return sprintf(buf, "Mitigation: __user pointer sanitization\n");
+}
+
+ssize_t cpu_show_spectre_v2(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ return sprintf(buf, "Not affected\n");
+
+ return sprintf(buf, "%s%s\n", spectre_v2_strings[spectre_v2_enabled],
+ spectre_v2_module_string());
+}
+#endif
diff --git a/arch/x86/kernel/cpu/bugs_64.c b/arch/x86/kernel/cpu/bugs_64.c
deleted file mode 100644
index 04f0fe5af83e..000000000000
--- a/arch/x86/kernel/cpu/bugs_64.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/*
- * Copyright (C) 1994 Linus Torvalds
- * Copyright (C) 2000 SuSE
- */
-
-#include <linux/kernel.h>
-#include <linux/init.h>
-#include <asm/alternative.h>
-#include <asm/bugs.h>
-#include <asm/processor.h>
-#include <asm/mtrr.h>
-#include <asm/cacheflush.h>
-
-void __init check_bugs(void)
-{
- identify_boot_cpu();
-#if !defined(CONFIG_SMP)
- printk(KERN_INFO "CPU: ");
- print_cpu_info(&boot_cpu_data);
-#endif
- alternative_instructions();
-
- /*
- * Make sure the first 2MB area is not mapped by huge pages
- * There are typically fixed size MTRRs in there and overlapping
- * MTRRs into large pages causes slow downs.
- *
- * Right now we don't do that with gbpages because there seems
- * very little benefit for that case.
- */
- if (!direct_gbpages)
- set_memory_4k((unsigned long)__va(0), 1);
-}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 274f5a7b27e7..44fd2ecb9859 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -441,8 +441,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
return NULL; /* Not found */
}

-__u32 cpu_caps_cleared[NCAPINTS];
-__u32 cpu_caps_set[NCAPINTS];
+__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
+__u32 cpu_caps_set[NCAPINTS + NBUGINTS];

void load_percpu_segment(int cpu)
{
@@ -670,6 +670,16 @@ void cpu_detect(struct cpuinfo_x86 *c)
}
}

+static void apply_forced_caps(struct cpuinfo_x86 *c)
+{
+ int i;
+
+ for (i = 0; i < NCAPINTS + NBUGINTS; i++) {
+ c->x86_capability[i] &= ~cpu_caps_cleared[i];
+ c->x86_capability[i] |= cpu_caps_set[i];
+ }
+}
+
void get_cpu_cap(struct cpuinfo_x86 *c)
{
u32 tfms, xlvl;
@@ -794,6 +804,12 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
this_cpu->c_bsp_init(c);

setup_force_cpu_cap(X86_FEATURE_ALWAYS);
+
+ if (c->x86_vendor != X86_VENDOR_AMD)
+ setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
+
+ setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
+ setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
}

void __init early_cpu_init(void)
@@ -889,7 +905,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
int i;

c->loops_per_jiffy = loops_per_jiffy;
- c->x86_cache_size = -1;
+ c->x86_cache_size = 0;
c->x86_vendor = X86_VENDOR_UNKNOWN;
c->x86_model = c->x86_mask = 0; /* So far unknown... */
c->x86_vendor_id[0] = '\0'; /* Unset */
@@ -915,10 +931,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
this_cpu->c_identify(c);

/* Clear/Set all flags overriden by options, after probe */
- for (i = 0; i < NCAPINTS; i++) {
- c->x86_capability[i] &= ~cpu_caps_cleared[i];
- c->x86_capability[i] |= cpu_caps_set[i];
- }
+ apply_forced_caps(c);

#ifdef CONFIG_X86_64
c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
@@ -978,10 +991,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
* Clear/Set all flags overriden by options, need do it
* before following smp all cpus cap AND.
*/
- for (i = 0; i < NCAPINTS; i++) {
- c->x86_capability[i] &= ~cpu_caps_cleared[i];
- c->x86_capability[i] |= cpu_caps_set[i];
- }
+ apply_forced_caps(c);

/*
* On SMP, boot_cpu_data holds the common feature set between
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 54299e585c3b..7a93397bcc35 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -352,7 +352,7 @@ static struct microcode_ops microcode_intel_ops = {

static int __init calc_llc_size_per_core(struct cpuinfo_x86 *c)
{
- u64 llc_size = c->x86_cache_size * 1024;
+ u64 llc_size = c->x86_cache_size * 1024ULL;

do_div(llc_size, c->x86_max_cores);

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 06fe3ed8b851..8d9c2afd16c6 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -86,8 +86,8 @@ static int show_cpuinfo(struct seq_file *m, void *v)
}

/* Cache size */
- if (c->x86_cache_size >= 0)
- seq_printf(m, "cache size\t: %d KB\n", c->x86_cache_size);
+ if (c->x86_cache_size)
+ seq_printf(m, "cache size\t: %u KB\n", c->x86_cache_size);

show_cpuinfo_core(m, c, cpu);
show_cpuinfo_misc(m, c);
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index bfcc300d8b3c..7e5dc491d619 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -58,6 +58,7 @@
#include <asm/alternative-asm.h>
#include <asm/asm.h>
#include <asm/smap.h>
+#include <asm/nospec-branch.h>

/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
#include <linux/elf-em.h>
@@ -308,7 +309,8 @@ ENTRY(ret_from_kernel_thread)
pushl_cfi $0x0202 # Reset kernel eflags
popfl_cfi
movl PT_EBP(%esp),%eax
- call *PT_EBX(%esp)
+ movl PT_EBX(%esp), %edx
+ CALL_NOSPEC %edx
movl $0,PT_EAX(%esp)
jmp syscall_exit
CFI_ENDPROC
@@ -424,7 +426,14 @@ ENTRY(ia32_sysenter_target)
sysenter_do_call:
cmpl $(NR_syscalls), %eax
jae sysenter_badsys
+ sbb %edx, %edx /* array_index_mask_nospec() */
+ and %edx, %eax
+#ifdef CONFIG_RETPOLINE
+ movl sys_call_table(,%eax,4),%eax
+ call __x86_indirect_thunk_eax
+#else
call *sys_call_table(,%eax,4)
+#endif
sysenter_after_call:
movl %eax,PT_EAX(%esp)
LOCKDEP_SYS_EXIT
@@ -501,7 +510,14 @@ ENTRY(system_call)
cmpl $(NR_syscalls), %eax
jae syscall_badsys
syscall_call:
+ sbb %edx, %edx /* array_index_mask_nospec() */
+ and %edx, %eax
+#ifdef CONFIG_RETPOLINE
+ movl sys_call_table(,%eax,4),%eax
+ call __x86_indirect_thunk_eax
+#else
call *sys_call_table(,%eax,4)
+#endif
syscall_after_call:
movl %eax,PT_EAX(%esp) # store the return value
syscall_exit:
@@ -1187,7 +1203,8 @@ ENTRY(mcount)
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax

- call *ftrace_trace_function
+ movl ftrace_trace_function, %ecx
+ CALL_NOSPEC %ecx

popl %edx
popl %ecx
@@ -1222,7 +1239,7 @@ END(ftrace_graph_caller)
movl %eax, %ecx
popl %edx
popl %eax
- jmp *%ecx
+ JMP_NOSPEC %ecx
#endif

#ifdef CONFIG_TRACING
@@ -1277,7 +1294,7 @@ ENTRY(page_fault)
movl %ecx, %es
TRACE_IRQS_OFF
movl %esp,%eax # pt_regs pointer
- call *%edi
+ CALL_NOSPEC %edi
jmp ret_from_exception
CFI_ENDPROC
END(page_fault)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0706553873e7..3ae017f16930 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -59,6 +59,7 @@
#include <asm/smap.h>
#include <asm/pgtable_types.h>
#include <asm/kaiser.h>
+#include <asm/nospec-branch.h>
#include <linux/err.h>

/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
@@ -375,7 +376,7 @@ ENTRY(ret_from_fork)
subq $REST_SKIP, %rsp # leave space for volatiles
CFI_ADJUST_CFA_OFFSET REST_SKIP
movq %rbp, %rdi
- call *%rbx
+ CALL_NOSPEC %rbx
movl $0, RAX(%rsp)
RESTORE_REST
jmp int_ret_from_sys_call
@@ -444,14 +445,21 @@ GLOBAL(system_call_after_swapgs)
jnz tracesys
system_call_fastpath:
#if __SYSCALL_MASK == ~0
- cmpq $__NR_syscall_max,%rax
+ cmpq $NR_syscalls, %rax
#else
andl $__SYSCALL_MASK,%eax
- cmpl $__NR_syscall_max,%eax
+ cmpl $NR_syscalls, %eax
#endif
- ja badsys
+ jae badsys
+ sbb %rcx, %rcx /* array_index_mask_nospec() */
+ and %rcx, %rax
movq %r10,%rcx
+#ifdef CONFIG_RETPOLINE
+ movq sys_call_table(, %rax, 8), %rax
+ call __x86_indirect_thunk_rax
+#else
call *sys_call_table(,%rax,8) # XXX: rip relative
+#endif
movq %rax,RAX-ARGOFFSET(%rsp)
/*
* Syscall return path ending with SYSRET (fast path)
@@ -571,14 +579,21 @@ GLOBAL(system_call_after_swapgs)
LOAD_ARGS ARGOFFSET, 1
RESTORE_REST
#if __SYSCALL_MASK == ~0
- cmpq $__NR_syscall_max,%rax
+ cmpq $NR_syscalls, %rax
#else
andl $__SYSCALL_MASK,%eax
- cmpl $__NR_syscall_max,%eax
+ cmpl $NR_syscalls, %eax
#endif
- ja int_ret_from_sys_call /* RAX(%rsp) set to -ENOSYS above */
+ jae int_ret_from_sys_call /* RAX(%rsp) set to -ENOSYS above */
+ sbb %rcx, %rcx /* array_index_mask_nospec() */
+ and %rcx, %rax
movq %r10,%rcx /* fixup for C */
+#ifdef CONFIG_RETPOLINE
+ movq sys_call_table(, %rax, 8), %rax
+ call __x86_indirect_thunk_rax
+#else
call *sys_call_table(,%rax,8)
+#endif
movq %rax,RAX-ARGOFFSET(%rsp)
/* Use IRET because user could have changed frame */

diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index 63ce838e5a54..d2cae431ef0e 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -20,6 +20,7 @@
#include <linux/mm.h>

#include <asm/apic.h>
+#include <asm/nospec-branch.h>

DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);
@@ -61,21 +62,14 @@ DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
static void call_on_stack(void *func, void *stack)
{
asm volatile("xchgl %%ebx,%%esp \n"
- "call *%%edi \n"
+ CALL_NOSPEC
"movl %%ebx,%%esp \n"
: "=b" (stack)
: "0" (stack),
- "D"(func)
+ [thunk_target] "D"(func)
: "memory", "cc", "edx", "ecx", "eax");
}

-/* how to get the current stack pointer from C */
-#define current_stack_pointer ({ \
- unsigned long sp; \
- asm("mov %%esp,%0" : "=g" (sp)); \
- sp; \
-})
-
static inline void *current_stack(void)
{
return (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
@@ -109,11 +103,11 @@ execute_on_irq_stack(int overflow, struct irq_desc *desc, int irq)
call_on_stack(print_stack_overflow, isp);

asm volatile("xchgl %%ebx,%%esp \n"
- "call *%%edi \n"
+ CALL_NOSPEC
"movl %%ebx,%%esp \n"
: "=a" (arg1), "=d" (arg2), "=b" (isp)
: "0" (irq), "1" (desc), "2" (isp),
- "D" (desc->handle_irq)
+ [thunk_target] "D" (desc->handle_irq)
: "memory", "cc", "ecx");
return 1;
}
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 7f412ed58932..fea0d33b04f7 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -36,6 +36,7 @@
#include <asm/alternative.h>
#include <asm/insn.h>
#include <asm/debugreg.h>
+#include <asm/nospec-branch.h>

#include "common.h"

@@ -191,7 +192,7 @@ static int copy_optimized_instructions(u8 *dest, u8 *src)
}

/* Check whether insn is indirect jump */
-static int insn_is_indirect_jump(struct insn *insn)
+static int __insn_is_indirect_jump(struct insn *insn)
{
return ((insn->opcode.bytes[0] == 0xff &&
(X86_MODRM_REG(insn->modrm.value) & 6) == 4) || /* Jump */
@@ -225,6 +226,26 @@ static int insn_jump_into_range(struct insn *insn, unsigned long start, int len)
return (start <= target && target <= start + len);
}

+static int insn_is_indirect_jump(struct insn *insn)
+{
+ int ret = __insn_is_indirect_jump(insn);
+
+#ifdef CONFIG_RETPOLINE
+ /*
+ * Jump to x86_indirect_thunk_* is treated as an indirect jump.
+ * Note that even with CONFIG_RETPOLINE=y, the kernel compiled with
+ * older gcc may use indirect jump. So we add this check instead of
+ * replace indirect-jump check.
+ */
+ if (!ret)
+ ret = insn_jump_into_range(insn,
+ (unsigned long)__indirect_thunk_start,
+ (unsigned long)__indirect_thunk_end -
+ (unsigned long)__indirect_thunk_start);
+#endif
+ return ret;
+}
+
/* Decode whole function to ensure any instructions don't jump into target */
static int can_optimize(unsigned long paddr)
{
diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S
index 0b15b5b3957c..78d9497fcff9 100644
--- a/arch/x86/kernel/mcount_64.S
+++ b/arch/x86/kernel/mcount_64.S
@@ -7,7 +7,7 @@
#include <linux/linkage.h>
#include <asm/ptrace.h>
#include <asm/ftrace.h>
-
+#include <asm/nospec-branch.h>

.code64
.section .entry.text, "ax"
@@ -210,8 +210,8 @@ GLOBAL(ftrace_stub)
#endif
subq $MCOUNT_INSN_SIZE, %rdi

- call *ftrace_trace_function
-
+ movq ftrace_trace_function, %r8
+ CALL_NOSPEC %r8
MCOUNT_RESTORE_FRAME

jmp ftrace_stub
@@ -254,5 +254,5 @@ GLOBAL(return_to_handler)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $24, %rsp
- jmp *%rdi
+ JMP_NOSPEC %rdi
#endif
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 49edf2dd3613..565e7dd0fe2e 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -104,6 +104,12 @@ SECTIONS
IRQENTRY_TEXT
*(.fixup)
*(.gnu.warning)
+#ifdef CONFIG_RETPOLINE
+ __indirect_thunk_start = .;
+ *(.text.__x86.indirect_thunk)
+ __indirect_thunk_end = .;
+#endif
+
/* End of text section */
_etext = .;
} :text = 0x9090
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3ef118df2547..98c567c98497 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -25,6 +25,7 @@
#include <linux/module.h>
#include <asm/kvm_emulate.h>
#include <linux/stringify.h>
+#include <asm/nospec-branch.h>

#include "x86.h"
#include "tss.h"
@@ -906,8 +907,8 @@ static u8 test_cc(unsigned int condition, unsigned long flags)
void (*fop)(void) = (void *)em_setcc + 4 * (condition & 0xf);

flags = (flags & EFLAGS_MASK) | X86_EFLAGS_IF;
- asm("push %[flags]; popf; call *%[fastop]"
- : "=a"(rc) : [fastop]"r"(fop), [flags]"r"(flags));
+ asm("push %[flags]; popf; " CALL_NOSPEC
+ : "=a"(rc) : [thunk_target]"r"(fop), [flags]"r"(flags));
return rc;
}

@@ -4622,9 +4623,9 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
if (!(ctxt->d & ByteOp))
fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
- asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
+ asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
: "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
- [fastop]"+S"(fop)
+ [thunk_target]"+S"(fop), ASM_CALL_CONSTRAINT
: "c"(ctxt->src2.val));
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
if (!fop) /* exception is returned in fop variable */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 962a5d37756d..80207eb90102 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -36,6 +36,7 @@
#include <asm/desc.h>
#include <asm/debugreg.h>
#include <asm/kvm_para.h>
+#include <asm/nospec-branch.h>

#include <asm/virtext.h>
#include "trace.h"
@@ -3914,6 +3915,25 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
"mov %%r13, %c[r13](%[svm]) \n\t"
"mov %%r14, %c[r14](%[svm]) \n\t"
"mov %%r15, %c[r15](%[svm]) \n\t"
+#endif
+ /*
+ * Clear host registers marked as clobbered to prevent
+ * speculative use.
+ */
+ "xor %%" _ASM_BX ", %%" _ASM_BX " \n\t"
+ "xor %%" _ASM_CX ", %%" _ASM_CX " \n\t"
+ "xor %%" _ASM_DX ", %%" _ASM_DX " \n\t"
+ "xor %%" _ASM_SI ", %%" _ASM_SI " \n\t"
+ "xor %%" _ASM_DI ", %%" _ASM_DI " \n\t"
+#ifdef CONFIG_X86_64
+ "xor %%r8, %%r8 \n\t"
+ "xor %%r9, %%r9 \n\t"
+ "xor %%r10, %%r10 \n\t"
+ "xor %%r11, %%r11 \n\t"
+ "xor %%r12, %%r12 \n\t"
+ "xor %%r13, %%r13 \n\t"
+ "xor %%r14, %%r14 \n\t"
+ "xor %%r15, %%r15 \n\t"
#endif
"pop %%" _ASM_BP
:
@@ -3944,6 +3964,9 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
#endif
);

+ /* Eliminate branch target predictions from guest mode */
+ vmexit_fill_RSB();
+
#ifdef CONFIG_X86_64
wrmsrl(MSR_GS_BASE, svm->host.gs_base);
#else
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 257b37b5ddc1..300ca8d07d9c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -32,6 +32,7 @@
#include <linux/slab.h>
#include <linux/tboot.h>
#include <linux/hrtimer.h>
+#include <linux/nospec.h>
#include "kvm_cache_regs.h"
#include "x86.h"

@@ -45,6 +46,7 @@
#include <asm/perf_event.h>
#include <asm/debugreg.h>
#include <asm/kexec.h>
+#include <asm/nospec-branch.h>

#include "trace.h"

@@ -694,23 +696,21 @@ static const unsigned short vmcs_field_to_offset_table[] = {
FIELD(HOST_RSP, host_rsp),
FIELD(HOST_RIP, host_rip),
};
-static const int max_vmcs_field = ARRAY_SIZE(vmcs_field_to_offset_table);

static inline short vmcs_field_to_offset(unsigned long field)
{
- if (field >= max_vmcs_field)
- return -1;
-
- /*
- * FIXME: Mitigation for CVE-2017-5753. To be replaced with a
- * generic mechanism.
- */
- asm("lfence");
+ const size_t size = ARRAY_SIZE(vmcs_field_to_offset_table);
+ unsigned short offset;

- if (vmcs_field_to_offset_table[field] == 0)
+ BUILD_BUG_ON(size > SHRT_MAX);
+ if (field >= size)
return -1;

- return vmcs_field_to_offset_table[field];
+ field = array_index_nospec(field, size);
+ offset = vmcs_field_to_offset_table[field];
+ if (offset == 0)
+ return -1;
+ return offset;
}

static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
@@ -7247,13 +7247,14 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
"pushf\n\t"
"orl $0x200, (%%" _ASM_SP ")\n\t"
__ASM_SIZE(push) " $%c[cs]\n\t"
- "call *%[entry]\n\t"
+ CALL_NOSPEC
:
#ifdef CONFIG_X86_64
- [sp]"=&r"(tmp)
+ [sp]"=&r"(tmp),
#endif
+ ASM_CALL_CONSTRAINT
:
- [entry]"r"(entry),
+ THUNK_TARGET(entry),
[ss]"i"(__KERNEL_DS),
[cs]"i"(__KERNEL_CS)
);
@@ -7487,6 +7488,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
/* Save guest registers, load host registers, keep flags */
"mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"
"pop %0 \n\t"
+ "setbe %c[fail](%0)\n\t"
"mov %%" _ASM_AX ", %c[rax](%0) \n\t"
"mov %%" _ASM_BX ", %c[rbx](%0) \n\t"
__ASM_SIZE(pop) " %c[rcx](%0) \n\t"
@@ -7503,12 +7505,23 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
"mov %%r13, %c[r13](%0) \n\t"
"mov %%r14, %c[r14](%0) \n\t"
"mov %%r15, %c[r15](%0) \n\t"
+ "xor %%r8d, %%r8d \n\t"
+ "xor %%r9d, %%r9d \n\t"
+ "xor %%r10d, %%r10d \n\t"
+ "xor %%r11d, %%r11d \n\t"
+ "xor %%r12d, %%r12d \n\t"
+ "xor %%r13d, %%r13d \n\t"
+ "xor %%r14d, %%r14d \n\t"
+ "xor %%r15d, %%r15d \n\t"
#endif
"mov %%cr2, %%" _ASM_AX " \n\t"
"mov %%" _ASM_AX ", %c[cr2](%0) \n\t"

+ "xor %%eax, %%eax \n\t"
+ "xor %%ebx, %%ebx \n\t"
+ "xor %%esi, %%esi \n\t"
+ "xor %%edi, %%edi \n\t"
"pop %%" _ASM_BP "; pop %%" _ASM_DX " \n\t"
- "setbe %c[fail](%0) \n\t"
".pushsection .rodata \n\t"
".global vmx_return \n\t"
"vmx_return: " _ASM_PTR " 2b \n\t"
@@ -7545,6 +7558,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
#endif
);

+ /* Eliminate branch target predictions from guest mode */
+ vmexit_fill_RSB();
+
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
if (debugctlmsr)
update_debugctlmsr(debugctlmsr);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 4d4f96a27638..7f120e68480e 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,6 +23,8 @@ lib-y += memcpy_$(BITS).o
lib-$(CONFIG_SMP) += rwlock.o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_RETPOLINE) += retpoline.o
+obj-$(CONFIG_RETPOLINE) += retpoline-export.o

obj-y += msr.o msr-reg.o msr-reg-export.o hash.o

diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index e78b8eee6615..94305f8cf31a 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -29,7 +29,8 @@
#include <asm/dwarf2.h>
#include <asm/errno.h>
#include <asm/asm.h>
-
+#include <asm/nospec-branch.h>
+
/*
* computes a partial checksum, e.g. for TCP/UDP fragments
*/
@@ -165,7 +166,7 @@ ENTRY(csum_partial)
negl %ebx
lea 45f(%ebx,%ebx,2), %ebx
testl %esi, %esi
- jmp *%ebx
+ JMP_NOSPEC %ebx

# Handle 2-byte-aligned regions
20: addw (%esi), %ax
@@ -463,7 +464,7 @@ ENTRY(csum_partial_copy_generic)
andl $-32,%edx
lea 3f(%ebx,%ebx), %ebx
testl %esi, %esi
- jmp *%ebx
+ JMP_NOSPEC %ebx
1: addl $64,%esi
addl $64,%edi
SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index a4512359656a..3917307fca99 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -40,6 +40,8 @@ ENTRY(__get_user_1)
GET_THREAD_INFO(%_ASM_DX)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
1: movzbl (%_ASM_AX),%edx
xor %eax,%eax
@@ -55,6 +57,8 @@ ENTRY(__get_user_2)
GET_THREAD_INFO(%_ASM_DX)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
2: movzwl -1(%_ASM_AX),%edx
xor %eax,%eax
@@ -70,6 +74,8 @@ ENTRY(__get_user_4)
GET_THREAD_INFO(%_ASM_DX)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
3: movl -3(%_ASM_AX),%edx
xor %eax,%eax
@@ -86,6 +92,8 @@ ENTRY(__get_user_8)
GET_THREAD_INFO(%_ASM_DX)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
4: movq -7(%_ASM_AX),%rdx
xor %eax,%eax
@@ -97,6 +105,8 @@ ENTRY(__get_user_8)
GET_THREAD_INFO(%_ASM_DX)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user_8
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
4: movl -7(%_ASM_AX),%edx
5: movl -3(%_ASM_AX),%ecx
diff --git a/arch/x86/lib/retpoline-export.c b/arch/x86/lib/retpoline-export.c
new file mode 100644
index 000000000000..3591683cb59c
--- /dev/null
+++ b/arch/x86/lib/retpoline-export.c
@@ -0,0 +1,24 @@
+#include <linux/linkage.h>
+
+#ifdef CONFIG_RETPOLINE
+#ifdef CONFIG_X86_32
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void); EXPORT_SYMBOL(__x86_indirect_thunk_e ## reg);
+#else
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void); EXPORT_SYMBOL(__x86_indirect_thunk_r ## reg);
+INDIRECT_THUNK(8)
+INDIRECT_THUNK(9)
+INDIRECT_THUNK(10)
+INDIRECT_THUNK(11)
+INDIRECT_THUNK(12)
+INDIRECT_THUNK(13)
+INDIRECT_THUNK(14)
+INDIRECT_THUNK(15)
+#endif
+INDIRECT_THUNK(ax)
+INDIRECT_THUNK(bx)
+INDIRECT_THUNK(cx)
+INDIRECT_THUNK(dx)
+INDIRECT_THUNK(si)
+INDIRECT_THUNK(di)
+INDIRECT_THUNK(bp)
+#endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
new file mode 100644
index 000000000000..e19f5476e218
--- /dev/null
+++ b/arch/x86/lib/retpoline.S
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/stringify.h>
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative-asm.h>
+#include <asm/nospec-branch.h>
+
+.macro THUNK reg
+ .section .text.__x86.indirect_thunk
+
+ENTRY(__x86_indirect_thunk_\reg)
+ CFI_STARTPROC
+ JMP_NOSPEC %\reg
+ CFI_ENDPROC
+ENDPROC(__x86_indirect_thunk_\reg)
+.endm
+
+/*
+ * Despite being an assembler file we can't just use .irp here
+ * because __KSYM_DEPS__ only uses the C preprocessor and would
+ * only see one instance of "__x86_indirect_thunk_\reg" rather
+ * than one per register with the correct names. So we do it
+ * the simple and nasty way...
+ */
+#define __EXPORT_THUNK(sym) _ASM_NOKPROBE(sym)
+#define EXPORT_THUNK(reg) __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
+#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+
+GENERATE_THUNK(_ASM_AX)
+GENERATE_THUNK(_ASM_BX)
+GENERATE_THUNK(_ASM_CX)
+GENERATE_THUNK(_ASM_DX)
+GENERATE_THUNK(_ASM_SI)
+GENERATE_THUNK(_ASM_DI)
+GENERATE_THUNK(_ASM_BP)
+#ifdef CONFIG_64BIT
+GENERATE_THUNK(r8)
+GENERATE_THUNK(r9)
+GENERATE_THUNK(r10)
+GENERATE_THUNK(r11)
+GENERATE_THUNK(r12)
+GENERATE_THUNK(r13)
+GENERATE_THUNK(r14)
+GENERATE_THUNK(r15)
+#endif
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e2f5e21c03b3..de848bc95a9f 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -570,12 +570,12 @@ do { \
unsigned long __copy_to_user_ll(void __user *to, const void *from,
unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
if (movsl_is_ok(to, from, n))
__copy_user(to, from, n);
else
n = __copy_user_intel(to, from, n);
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_to_user_ll);
@@ -583,12 +583,12 @@ EXPORT_SYMBOL(__copy_to_user_ll);
unsigned long __copy_from_user_ll(void *to, const void __user *from,
unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
if (movsl_is_ok(to, from, n))
__copy_user_zeroing(to, from, n);
else
n = __copy_user_zeroing_intel(to, from, n);
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_from_user_ll);
@@ -596,13 +596,13 @@ EXPORT_SYMBOL(__copy_from_user_ll);
unsigned long __copy_from_user_ll_nozero(void *to, const void __user *from,
unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
if (movsl_is_ok(to, from, n))
__copy_user(to, from, n);
else
n = __copy_user_intel((void __user *)to,
(const void *)from, n);
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_from_user_ll_nozero);
@@ -610,7 +610,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nozero);
unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from,
unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
#ifdef CONFIG_X86_INTEL_USERCOPY
if (n > 64 && cpu_has_xmm2)
n = __copy_user_zeroing_intel_nocache(to, from, n);
@@ -619,7 +619,7 @@ unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from,
#else
__copy_user_zeroing(to, from, n);
#endif
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_from_user_ll_nocache);
@@ -627,7 +627,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocache);
unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
#ifdef CONFIG_X86_INTEL_USERCOPY
if (n > 64 && cpu_has_xmm2)
n = __copy_user_intel_nocache(to, from, n);
@@ -636,7 +636,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr
#else
__copy_user(to, from, n);
#endif
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 23b8726962af..209eea947568 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -193,6 +193,9 @@ config GENERIC_CPU_DEVICES
config GENERIC_CPU_AUTOPROBE
bool

+config GENERIC_CPU_VULNERABILITIES
+ bool
+
config SOC_BUS
bool

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc5297d..9175c84161ec 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -418,10 +418,58 @@ static void __init cpu_dev_register_generic(void)
#endif
}

+#ifdef CONFIG_GENERIC_CPU_VULNERABILITIES
+
+ssize_t __weak cpu_show_meltdown(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "Not affected\n");
+}
+
+ssize_t __weak cpu_show_spectre_v1(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "Not affected\n");
+}
+
+ssize_t __weak cpu_show_spectre_v2(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "Not affected\n");
+}
+
+static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
+static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
+static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
+
+static struct attribute *cpu_root_vulnerabilities_attrs[] = {
+ &dev_attr_meltdown.attr,
+ &dev_attr_spectre_v1.attr,
+ &dev_attr_spectre_v2.attr,
+ NULL
+};
+
+static const struct attribute_group cpu_root_vulnerabilities_group = {
+ .name = "vulnerabilities",
+ .attrs = cpu_root_vulnerabilities_attrs,
+};
+
+static void __init cpu_register_vulnerabilities(void)
+{
+ if (sysfs_create_group(&cpu_subsys.dev_root->kobj,
+ &cpu_root_vulnerabilities_group))
+ pr_err("Unable to register CPU vulnerabilities\n");
+}
+
+#else
+static inline void cpu_register_vulnerabilities(void) { }
+#endif
+
void __init cpu_dev_init(void)
{
if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
panic("Failed to register CPU subsystem");

cpu_dev_register_generic();
+ cpu_register_vulnerabilities();
}
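
With cpu_register_vulnerabilities() wired up, the mitigation state becomes visible under /sys/devices/system/cpu/vulnerabilities/. A minimal userspace sketch (illustrative only, not part of the patch) that prints the three new files:

#include <stdio.h>

int main(void)
{
        static const char *files[] = { "meltdown", "spectre_v1", "spectre_v2" };
        char path[128], line[128];
        unsigned int i;

        for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
                FILE *f;

                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/vulnerabilities/%s", files[i]);
                f = fopen(path, "r");
                if (f && fgets(line, sizeof(line), f))
                        printf("%-10s: %s", files[i], line);
                if (f)
                        fclose(f);
        }
        return 0;
}

On an affected CPU running this kernel, spectre_v2 reports one of the spectre_v2_strings[] values from bugs.c, e.g. "Mitigation: Full generic retpoline".
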
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index a8b9eea5c4fc..47979d05031c 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -29,6 +29,7 @@
#include <linux/version.h>
#include <linux/interrupt.h>
#include <asm/hyperv.h>
+#include <asm/nospec-branch.h>
#include "hyperv_vmbus.h"

/* The one and only */
@@ -93,10 +94,13 @@ static u64 do_hypercall(u64 control, void *input, void *output)
u64 output_address = (output) ? virt_to_phys(output) : 0;
void *hypercall_page = hv_context.hypercall_page;

- __asm__ __volatile__("mov %0, %%r8" : : "r" (output_address) : "r8");
- __asm__ __volatile__("call *%3" : "=a" (hv_status) :
- "c" (control), "d" (input_address),
- "m" (hypercall_page));
+ __asm__ __volatile__("mov %4, %%r8\n"
+ CALL_NOSPEC
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input_address)
+ : "r" (output_address),
+ THUNK_TARGET(hypercall_page)
+ : "cc", "memory", "r8", "r9", "r10", "r11");

return hv_status;

@@ -114,11 +118,14 @@ static u64 do_hypercall(u64 control, void *input, void *output)
u32 output_address_lo = output_address & 0xFFFFFFFF;
void *hypercall_page = hv_context.hypercall_page;

- __asm__ __volatile__ ("call *%8" : "=d"(hv_status_hi),
- "=a"(hv_status_lo) : "d" (control_hi),
- "a" (control_lo), "b" (input_address_hi),
- "c" (input_address_lo), "D"(output_address_hi),
- "S"(output_address_lo), "m" (hypercall_page));
+ __asm__ __volatile__(CALL_NOSPEC
+ : "=d" (hv_status_hi), "=a" (hv_status_lo),
+ "+c" (input_address_lo), ASM_CALL_CONSTRAINT
+ : "d" (control_hi), "a" (control_lo),
+ "b" (input_address_hi),
+ "D"(output_address_hi), "S"(output_address_lo),
+ THUNK_TARGET(hypercall_page)
+ : "cc", "memory");

return hv_status_lo | ((u64)hv_status_hi << 32);
#endif /* !x86_64 */
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 3b73e762b2f5..0441fa7c20b1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -39,6 +39,13 @@ extern void cpu_remove_dev_attr(struct device_attribute *attr);
extern int cpu_add_dev_attr_group(struct attribute_group *attrs);
extern void cpu_remove_dev_attr_group(struct attribute_group *attrs);

+extern ssize_t cpu_show_meltdown(struct device *dev,
+ struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_spectre_v1(struct device *dev,
+ struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_spectre_v2(struct device *dev,
+ struct device_attribute *attr, char *buf);
+
#ifdef CONFIG_HOTPLUG_CPU
extern void unregister_cpu(struct cpu *cpu);
extern ssize_t arch_cpu_probe(const char *, size_t);
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 230f87bdf5ad..2c084871e833 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -9,6 +9,7 @@
#include <linux/compiler.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/nospec.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/fs.h>
@@ -76,8 +77,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
{
struct fdtable *fdt = rcu_dereference_raw(files->fdt);

- if (fd < fdt->max_fds)
+ if (fd < fdt->max_fds) {
+ fd = array_index_nospec(fd, fdt->max_fds);
return rcu_dereference_raw(fdt->fd[fd]);
+ }
return NULL;
}

diff --git a/include/linux/init.h b/include/linux/init.h
index 2df8e8dd10a4..487febe50917 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -4,6 +4,13 @@
#include <linux/compiler.h>
#include <linux/types.h>

+/* Built-in __init functions needn't be compiled with retpoline */
+#if defined(RETPOLINE) && !defined(MODULE)
+#define __noretpoline __attribute__((indirect_branch("keep")))
+#else
+#define __noretpoline
+#endif
+
/* These macros are used to mark some functions or
* initialized data (doesn't apply to uninitialized data)
* as `initialization' functions. The kernel can take this
@@ -39,7 +46,7 @@

/* These are for everybody (although not all archs will actually
discard it in modules) */
-#define __init __section(.init.text) __cold notrace
+#define __init __section(.init.text) __cold notrace __noretpoline
#define __initdata __section(.init.data)
#define __initconst __constsection(.init.rodata)
#define __exitdata __section(.exit.data)
diff --git a/include/linux/kaiser.h b/include/linux/kaiser.h
index 58c55b1589d0..b56c19010480 100644
--- a/include/linux/kaiser.h
+++ b/include/linux/kaiser.h
@@ -32,7 +32,7 @@ static inline void kaiser_init(void)
{
}
static inline int kaiser_add_mapping(unsigned long addr,
- unsigned long size, unsigned long flags)
+ unsigned long size, u64 flags)
{
return 0;
}
diff --git a/include/linux/kconfig.h b/include/linux/kconfig.h
index be342b94c640..4143edbce275 100644
--- a/include/linux/kconfig.h
+++ b/include/linux/kconfig.h
@@ -17,10 +17,11 @@
* the last step cherry picks the 2nd arg, we get a zero.
*/
#define __ARG_PLACEHOLDER_1 0,
-#define config_enabled(cfg) _config_enabled(cfg)
-#define _config_enabled(value) __config_enabled(__ARG_PLACEHOLDER_##value)
-#define __config_enabled(arg1_or_junk) ___config_enabled(arg1_or_junk 1, 0)
-#define ___config_enabled(__ignored, val, ...) val
+#define config_enabled(cfg) ___is_defined(cfg)
+#define __is_defined(x) ___is_defined(x)
+#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
+#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
+#define __take_second_arg(__ignored, val, ...) val

/*
* IS_ENABLED(CONFIG_FOO) evaluates to 1 if CONFIG_FOO is set to 'y' or 'm',
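
The reworked kconfig helpers above are what allow bugs.c to test __is_defined(RETPOLINE) in retp_compiler(). A self-contained sketch of the token-pasting trick, with RETPOLINE defined locally purely for the demo (in a real build the kernel Makefile is expected to define it when the compiler supports retpolines):

#include <stdio.h>

#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define __is_defined(x) ___is_defined(x)

#define RETPOLINE 1     /* defined here only for the demo */

int main(void)
{
        /* defined to 1: the placeholder pastes to "0," and 1 is the 2nd arg */
        printf("__is_defined(RETPOLINE)  = %d\n", __is_defined(RETPOLINE));
        /* undefined: the paste yields junk and the trailing 0 is picked */
        printf("__is_defined(NOT_A_FLAG) = %d\n", __is_defined(NOT_A_FLAG));
        return 0;
}

This prints 1 and 0 respectively; retp_compiler() uses the same machinery to distinguish a full retpoline build from the minimal ASM-only thunks.
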
diff --git a/include/linux/module.h b/include/linux/module.h
index ac1e65afc5a1..e18bd2ff1346 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -656,4 +656,13 @@ static inline void module_bug_finalize(const Elf_Ehdr *hdr,
static inline void module_bug_cleanup(struct module *mod) {}
#endif /* CONFIG_GENERIC_BUG */

+#ifdef RETPOLINE
+extern bool retpoline_module_ok(bool has_retpoline);
+#else
+static inline bool retpoline_module_ok(bool has_retpoline)
+{
+ return true;
+}
+#endif
+
#endif /* _LINUX_MODULE_H */
diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..3b1d69c52beb
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Linus Torvalds. All rights reserved.
+// Copyright(c) 2018 Alexei Starovoitov. All rights reserved.
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef _LINUX_NOSPEC_H
+#define _LINUX_NOSPEC_H
+#include <asm/barrier.h>
+
+/**
+ * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise
+ * @index: array element index
+ * @size: number of elements in array
+ *
+ * When @index is out of bounds (@index >= @size), the sign bit will be
+ * set. Extend the sign bit to all bits and invert, giving a result of
+ * zero for an out of bounds index, or ~0 if within bounds [0, @size).
+ */
+#ifndef array_index_mask_nospec
+static inline unsigned long array_index_mask_nospec(unsigned long index,
+ unsigned long size)
+{
+ /*
+ * Always calculate and emit the mask even if the compiler
+ * thinks the mask is not needed. The compiler does not take
+ * into account the value of @index under speculation.
+ */
+ OPTIMIZER_HIDE_VAR(index);
+ return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/*
+ * array_index_nospec - sanitize an array index after a bounds check
+ *
+ * For a code sequence like:
+ *
+ * if (index < size) {
+ * index = array_index_nospec(index, size);
+ * val = array[index];
+ * }
+ *
+ * ...if the CPU speculates past the bounds check then
+ * array_index_nospec() will clamp the index within the range of [0,
+ * size).
+ */
+#define array_index_nospec(index, size) \
+({ \
+ typeof(index) _i = (index); \
+ typeof(size) _s = (size); \
+ unsigned long _mask = array_index_mask_nospec(_i, _s); \
+ \
+ BUILD_BUG_ON(sizeof(_i) > sizeof(long)); \
+ BUILD_BUG_ON(sizeof(_s) > sizeof(long)); \
+ \
+ _i &= _mask; \
+ _i; \
+})
+#endif /* _LINUX_NOSPEC_H */
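
For reference, the masking above can be exercised outside the kernel. A minimal standalone sketch (not part of the patch; BITS_PER_LONG and OPTIMIZER_HIDE_VAR are approximated for a userspace build) of what array_index_nospec() does after a bounds check:

#include <assert.h>
#include <stdio.h>

#define BITS_PER_LONG           (8 * sizeof(long))
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "+r" (var))

static unsigned long mask_nospec(unsigned long index, unsigned long size)
{
        OPTIMIZER_HIDE_VAR(index);
        /* ~0UL when index < size, 0 otherwise (relies on arithmetic shift) */
        return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

int main(void)
{
        static const int table[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
        unsigned long idx = 5;

        assert(mask_nospec(5, 8) == ~0UL);      /* in bounds: index preserved */
        assert(mask_nospec(9, 8) == 0);         /* out of bounds: index becomes 0 */

        if (idx < 8) {
                idx &= mask_nospec(idx, 8);     /* what array_index_nospec() does */
                printf("table[%lu] = %d\n", idx, table[idx]);
        }
        return 0;
}

The sbb/and pairs added to entry_32.S, entry_64.S and getuser.S are the assembly form of the same mask-and-clamp step.
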
diff --git a/kernel/module.c b/kernel/module.c
index 8c3baf05f7bd..186c2648c667 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2494,6 +2494,15 @@ static int elf_header_check(struct load_info *info)
return 0;
}

+static void check_modinfo_retpoline(struct module *mod, struct load_info *info)
+{
+ if (retpoline_module_ok(get_modinfo(info, "retpoline")))
+ return;
+
+ pr_warn("%s: loading module not compiled with retpoline compiler.\n",
+ mod->name);
+}
+
/* Sets info->hdr and info->len. */
static int copy_module_from_user(const void __user *umod, unsigned long len,
struct load_info *info)
@@ -2698,6 +2707,8 @@ static int check_modinfo(struct module *mod, struct load_info *info, int flags)
if (!get_modinfo(info, "intree"))
add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);

+ check_modinfo_retpoline(mod, info);
+
if (get_modinfo(info, "staging")) {
add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
pr_warn("%s: module is from the staging directory, the quality "
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 81addba0aa34..591d7772e99b 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -14,6 +14,7 @@
#include <linux/nl80211.h>
#include <linux/rtnetlink.h>
#include <linux/netlink.h>
+#include <linux/nospec.h>
#include <linux/etherdevice.h>
#include <net/net_namespace.h>
#include <net/genetlink.h>
@@ -1829,20 +1830,22 @@ static const struct nla_policy txq_params_policy[NL80211_TXQ_ATTR_MAX + 1] = {
static int parse_txq_params(struct nlattr *tb[],
struct ieee80211_txq_params *txq_params)
{
+ u8 ac;
+
if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
!tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
!tb[NL80211_TXQ_ATTR_AIFS])
return -EINVAL;

- txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
+ ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);

- if (txq_params->ac >= NL80211_NUM_ACS)
+ if (ac >= NL80211_NUM_ACS)
return -EINVAL;
-
+ txq_params->ac = array_index_nospec(ac, NL80211_NUM_ACS);
return 0;
}

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 2241d036b63a..f86b4f77cd76 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1952,6 +1952,14 @@ static void add_intree_flag(struct buffer *b, int is_intree)
buf_printf(b, "\nMODULE_INFO(intree, \"Y\");\n");
}

+/* Cannot check for assembler */
+static void add_retpoline(struct buffer *b)
+{
+ buf_printf(b, "\n#ifdef RETPOLINE\n");
+ buf_printf(b, "MODULE_INFO(retpoline, \"Y\");\n");
+ buf_printf(b, "#endif\n");
+}
+
static void add_staging_flag(struct buffer *b, const char *name)
{
static const char *staging_dir = "drivers/staging";
@@ -2282,6 +2290,7 @@ int main(int argc, char **argv)

add_header(&buf, mod);
add_intree_flag(&buf, !external_module);
+ add_retpoline(&buf);
add_staging_flag(&buf, mod->name);
err |= add_versions(&buf, mod);
add_depends(&buf, mod, modules);
