Re: [PATCH 3/4] x86: Rewrite switch_to() code

From: Brian Gerst
Date: Sun May 22 2016 - 15:31:23 EST


On Sun, May 22, 2016 at 1:59 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> cc: Josh Poimboeuf: do you care about the exact stack layout of the
> bottom of the stack of an inactive task?
>
> On May 21, 2016 9:05 AM, "Brian Gerst" <brgerst@xxxxxxxxx> wrote:
>>
>> Move the low-level context switch code to an out-of-line asm stub instead of
>> using complex inline asm. This allows constructing a new stack frame for the
>> child process to make it seamlessly flow to ret_from_fork without an extra
>> test and branch in __switch_to(). It also improves code generation for
>> __schedule() by using the C calling convention instead of clobbering all
>> registers.
>
> I like the concept a lot.
>
>>
>> Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>
>> ---
>> arch/x86/entry/entry_32.S | 38 ++++++++++
>> arch/x86/entry/entry_64.S | 42 +++++++++++-
>> arch/x86/include/asm/processor.h | 3 -
>> arch/x86/include/asm/switch_to.h | 137 ++++++-------------------------------
>> arch/x86/include/asm/thread_info.h | 2 -
>> arch/x86/kernel/asm-offsets.c | 6 ++
>> arch/x86/kernel/asm-offsets_32.c | 5 ++
>> arch/x86/kernel/asm-offsets_64.c | 5 ++
>> arch/x86/kernel/process_32.c | 8 ++-
>> arch/x86/kernel/process_64.c | 7 +-
>> arch/x86/kernel/smpboot.c | 1 -
>> 11 files changed, 124 insertions(+), 130 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>> index ee6fea0..05e5340 100644
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -204,6 +204,44 @@
>> POP_GS_EX
>> .endm
>>
>> +/*
>> + * %eax: prev task
>> + * %edx: next task
>> + */
>> +ENTRY(__switch_to_asm)
>> + /*
>> + * Save callee-saved registers
>> + * This must match the order in struct fork_frame
>> + * Frame pointer must be last for get_wchan
>> + */
>> + pushl %ebx
>> + pushl %edi
>> + pushl %esi
>> + pushl %ebp
>> +
>> + /* switch stack */
>> + movl %esp, TASK_threadsp(%eax)
>> + movl TASK_threadsp(%edx), %esp
>> +
>> +#ifdef CONFIG_CC_STACKPROTECTOR
>> + movl TASK_stack_canary(%edx), %ebx
>> + movl %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
>> +#endif
>> +
>> + /* restore callee-saved registers */
>> + popl %ebp
>> + popl %esi
>> + popl %edi
>> + popl %ebx
>
> This is highly, highly magical. eax and edx are prev and next, and:

What is so magical about the standard C calling convention (regarm(3)
in the 32-bit case)? This just passes them right though to
__switch_to().

>> +
>> + jmp __switch_to
>
> leaves prev in eax. This works, but it might be worth a comment.

Not quite, __switch_to() returns 'last', not 'prev'. The previous
task when this is called is not the same task when the thread wakes
up.

>> +END(__switch_to_asm)
>
>> /*
>> + * %rdi: prev task
>> + * %rsi: next task
>> + */
>> +ENTRY(__switch_to_asm)
>> + /*
>> + * Save callee-saved registers
>> + * This must match the order in struct fork_frame
>> + * Frame pointer must be last for get_wchan
>> + */
>> + pushq %rbx
>> + pushq %r12
>> + pushq %r13
>> + pushq %r14
>> + pushq %r15
>> + pushq %rbp
>> +
>> + /* switch stack */
>> + movq %rsp, TASK_threadsp(%rdi)
>> + movq TASK_threadsp(%rsi), %rsp
>> +
>> +#ifdef CONFIG_CC_STACKPROTECTOR
>> + movq TASK_stack_canary(%rsi), %rbx
>> + movq %rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
>> +#endif
>> +
>> + /* restore callee-saved registers */
>> + popq %rbp
>> + popq %r15
>> + popq %r14
>> + popq %r13
>> + popq %r12
>> + popq %rbx
>> +
>> + jmp __switch_to
>
> Ditto with the magic here.
>
>> +struct fork_frame {
>> + unsigned long bp;
>> +#ifdef CONFIG_X86_64
>> + unsigned long r15;
>> + unsigned long r14;
>> + unsigned long r13;
>> + unsigned long r12;
>> +#else
>> + unsigned long si;
>> + unsigned long di;
>> +#endif
>> + unsigned long bx;
>> + unsigned long ret_addr;
>> + struct pt_regs regs;
>> +};
>
> This, like the old implementation, is very much geared to the current
> implementation of fork. Can you split it up:
>
> struct inactive_task_frame {
> unsigned long bp;
> ...
> unsigned long ret_addr;
> };
>
> /* fork works by setting up the child stack so that switch_to will
> land at ret_from_fork with sp pointing at pt_regs */
> struct fork_frame {
> struct inactive_task_frame switch_frame;
> struct pt_regs regs;
> };
>
> Then, if and when someone wants to fork into a different type of
> context, they can reuse this. Also, a future improved unwinder can
> use inactive_task_frame directly to kick off its unwind.

Sounds reasonable.

--
Brian Gerst