Re: [tip:x86/asm] x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32

From: Denys Vlasenko
Date: Mon Mar 09 2015 - 09:05:16 EST

On Sat, Mar 7, 2015 at 9:37 AM, tip-bot for Andy Lutomirski
<tipbot@xxxxxxxxx> wrote:
> Commit-ID: a7fcf28d431ef70afaa91496e64e16dc51dccec4
> Gitweb:
> Author: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> AuthorDate: Fri, 6 Mar 2015 17:50:19 -0800
> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> CommitDate: Sat, 7 Mar 2015 09:34:03 +0100
> x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32
> I broke 32-bit kernels. The implementation of sp0 was correct
> as far as I can tell, but sp0 was much weirder on x86_32 than I
> realized. It has the following issues:
> - Init's sp0 is inconsistent with everything else's: non-init tasks
> are offset by 8 bytes. (I have no idea why, and the comment is unhelpful.)
> - vm86 does crazy things to sp0.
> Fix it up by replacing this_cpu_sp0() with
> current_top_of_stack() and using a new percpu variable to track
> the top of the stack on x86_32.

Looks like the hope that tss.sp0 is a reliable variable
which points to top of stack didn't really play out :(

Recent relevant commits in x86/entry were:

x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu
- added accessor to tss.sp0
"We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset). The latter is defined by
hardware and is a clean canonical way to find the top of the
stack. Add an accessor so we can start using it."

x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()
- equivalent change, no win/no loss

x86/asm/entry/64/compat: Change the 32-bit sysenter code to use sp0
- It did remove one insn, but we would get the same saving
if KERNEL_STACK_OFFSET were eliminated

x86: Delay loading sp0 slightly on task switch
- simple fix, nothing needed to be added

x86: Replace this_cpu_sp0 with current_top_of_stack and fix it on x86_32
- added a percpu var cpu_current_top_of_stack
- needs to set it in do_boot_cpu()
- added ifdef forest:
+#ifdef CONFIG_X86_64
 	return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
+#else
+	/* sp0 on x86_32 is special in and around vm86 mode. */
+	return this_cpu_read_stable(cpu_current_top_of_stack);
+#endif

End result is that the 32-bit kernel now has two per-cpu variables,
cpu_current_top_of_stack and kernel_stack.

cpu_current_top_of_stack is essentially "real top of stack",
and kernel_stack is "real top of stack - KERNEL_STACK_OFFSET".

When/if we get rid of KERNEL_STACK_OFFSET,
we can also get rid of kernel_stack, since it will be the same as
cpu_current_top_of_stack (which is a better name anyway).
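
To make the relationship concrete, here is a rough user-space model of the
two variables. This is not kernel code: struct percpu_model and the helper
functions are hypothetical names made up for illustration, and the value
given to KERNEL_STACK_OFFSET is a placeholder, not the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value, assumed for illustration; the real constant is
 * defined in the kernel headers. */
#define KERNEL_STACK_OFFSET 40u

/* User-space stand-ins for the two per-cpu variables (hypothetical struct). */
struct percpu_model {
	uintptr_t cpu_current_top_of_stack; /* "real top of stack" */
	uintptr_t kernel_stack;             /* top of stack - KERNEL_STACK_OFFSET */
};

/* Model of a task switch: both variables must be updated together. */
static void model_switch_to(struct percpu_model *cpu, uintptr_t stack_top)
{
	cpu->cpu_current_top_of_stack = stack_top;
	cpu->kernel_stack = stack_top - KERNEL_STACK_OFFSET;
}

/* Recover the top of stack from kernel_stack by undoing the offset. */
static uintptr_t top_from_kernel_stack(const struct percpu_model *cpu)
{
	return cpu->kernel_stack + KERNEL_STACK_OFFSET;
}

/* The invariant: both variables describe the same stack top. */
static void model_check(const struct percpu_model *cpu)
{
	assert(top_from_kernel_stack(cpu) == cpu->cpu_current_top_of_stack);
}
```

In this model, if KERNEL_STACK_OFFSET were 0 the two fields would always
hold the same value, which is why kernel_stack would become redundant.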
