Re: [PATCH 4/4] x86: Pass kernel thread parameters in fork_frame

From: Andy Lutomirski
Date: Mon May 23 2016 - 11:37:05 EST


On Mon, May 23, 2016 at 8:23 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Sat, May 21, 2016 at 12:04:51PM -0400, Brian Gerst wrote:
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -405,37 +405,29 @@ END(__switch_to_asm)
>> * A newly forked process directly context switches into this address.
>> *
>> * rax: prev task we switched from
>> + * rbx: kernel thread func
>> + * r12: kernel thread arg
>> */
>> ENTRY(ret_from_fork)
>> movq %rax, %rdi
>> call schedule_tail /* rdi: 'prev' task parameter */
>>
>> - testb $3, CS(%rsp) /* from kernel_thread? */
>> + testq %rbx, %rbx /* from kernel_thread? */
>> jnz 1f
>>
>> - /*
>> - * We came from kernel_thread. This code path is quite twisted, and
>> - * someone should clean it up.
>> - *
>> - * copy_thread_tls stashes the function pointer in RBX and the
>> - * parameter to be passed in RBP. The called function is permitted
>> - * to call do_execve and thereby jump to user mode.
>> - */
>> - movq RBP(%rsp), %rdi
>> - call *RBX(%rsp)
>> - movq %rax, RAX(%rsp)
>> -
>> - /*
>> - * Fall through as though we're exiting a syscall. This makes a
>> - * twisted sort of sense if we just called do_execve.
>> - */
>> -
>> -1:
>> +2:
>> movq %rsp, %rdi
>> call syscall_return_slowpath /* returns with IRQs disabled */
>> TRACE_IRQS_ON /* user mode is traced as IRQS on */
>> SWAPGS
>> jmp restore_regs_and_iret
>> +
>> +1:
>> + /* kernel thread */
>> + movq %r12, %rdi
>> + call *%rbx
>> + movq %rax, RAX(%rsp)
>> + jmp 2b
>> END(ret_from_fork)
>
> It seems really surprising that a kernel thread would be returning to
> user space. It would probably be a good idea to preserve the existing
> comments about that.
>

Agreed.

Which reminds me: at some point, on top of this series, we should
consider either having multiple variants of ret_from_fork or otherwise
generalizing the code. If and when we implement CPL3 for *kernel*
code (SGX and UEFI come to mind as possible use cases), we probably
won't want to go through syscall_return_slowpath.
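
To make the control flow concrete, here is a rough userspace model in C of the patched ret_from_fork, with the exit path passed in as a parameter. This is only a sketch of one way the code might be generalized, not a proposed implementation. Every name in it (fork_frame_model, exit_path_fn, the model_* functions) is hypothetical, and the schedule_tail call, IRQ tracing and SWAPGS are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the state ret_from_fork sees after the patch. */
struct fork_frame_model {
	int (*fn)(void *);          /* models rbx: kernel thread func, NULL for a user fork */
	void *arg;                  /* models r12: kernel thread arg */
	long rax;                   /* models the RAX slot in pt_regs */
	int went_through_slowpath;  /* set when the modelled user-return path runs */
};

/* Hypothetical hook: the final step a given ret_from_fork variant takes. */
typedef void (*exit_path_fn)(struct fork_frame_model *);

/* Models the current exit path, which treats the return like a syscall exit. */
static void model_syscall_return_slowpath(struct fork_frame_model *f)
{
	f->went_through_slowpath = 1;
}

/* Example thread function used only to exercise the model. */
static int model_thread_fn(void *arg)
{
	return *(int *)arg + 1;
}

/*
 * Models the patched flow: a non-NULL fn means we came from kernel_thread.
 * If fn returns, it is assumed to have called do_execve, so the model falls
 * through to the exit path just like a user fork would.
 */
static void model_ret_from_fork(struct fork_frame_model *f, exit_path_fn exit_path)
{
	assert(f != NULL);
	assert(exit_path != NULL);

	if (f->fn)
		f->rax = f->fn(f->arg);  /* mirrors: movq %rax, RAX(%rsp) */

	exit_path(f);
}
```

In this model, a CPL3-kernel variant would pass a different exit_path instead of model_syscall_return_slowpath. Whether that is better than separate ret_from_fork entry points is an open question.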