Re: [PATCH 4/4] x86: Pass kernel thread parameters in fork_frame

From: Brian Gerst
Date: Mon May 23 2016 - 17:04:42 EST


On Mon, May 23, 2016 at 11:36 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Mon, May 23, 2016 at 8:23 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> On Sat, May 21, 2016 at 12:04:51PM -0400, Brian Gerst wrote:
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -405,37 +405,29 @@ END(__switch_to_asm)
>>>   * A newly forked process directly context switches into this address.
>>>   *
>>>   * rax: prev task we switched from
>>> + * rbx: kernel thread func
>>> + * r12: kernel thread arg
>>>   */
>>>  ENTRY(ret_from_fork)
>>>  	movq %rax, %rdi
>>>  	call schedule_tail /* rdi: 'prev' task parameter */
>>>
>>> -	testb $3, CS(%rsp) /* from kernel_thread? */
>>> +	testq %rbx, %rbx /* from kernel_thread? */
>>>  	jnz 1f
>>>
>>> -	/*
>>> -	 * We came from kernel_thread. This code path is quite twisted, and
>>> -	 * someone should clean it up.
>>> -	 *
>>> -	 * copy_thread_tls stashes the function pointer in RBX and the
>>> -	 * parameter to be passed in RBP. The called function is permitted
>>> -	 * to call do_execve and thereby jump to user mode.
>>> -	 */
>>> -	movq RBP(%rsp), %rdi
>>> -	call *RBX(%rsp)
>>> -	movq %rax, RAX(%rsp)
>>> -
>>> -	/*
>>> -	 * Fall through as though we're exiting a syscall. This makes a
>>> -	 * twisted sort of sense if we just called do_execve.
>>> -	 */
>>> -
>>> -1:
>>> +2:
>>>  	movq %rsp, %rdi
>>>  	call syscall_return_slowpath /* returns with IRQs disabled */
>>>  	TRACE_IRQS_ON /* user mode is traced as IRQS on */
>>>  	SWAPGS
>>>  	jmp restore_regs_and_iret
>>> +
>>> +1:
>>> +	/* kernel thread */
>>> +	movq %r12, %rdi
>>> +	call *%rbx
>>> +	movq %rax, RAX(%rsp)
>>> +	jmp 2b
>>>  END(ret_from_fork)
>>
>> It seems really surprising that a kernel thread would be returning to
>> user space. It would probably be a good idea to preserve the existing
>> comments about that.
>>
>
> Agreed.
>
> Which reminds me: at some point, on top of this series, we should
> consider either having multiple variants of ret_from_fork or otherwise
> generalizing the code. If and when we implement CPL3 for *kernel*
> code (SGX and UEFI come to mind as possible use cases), we probably
> won't want to go through syscall_return_slowpath.

I don't understand what you mean by CPL3 kernel code. Do you mean
something like the vDSO, where the kernel maps its own code into
userspace? Why would you want to do this?
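
For anyone following along, here is a rough C model of the control flow
ret_from_fork has after this patch. It is only a userspace sketch:
struct fake_regs, return_to_user(), ret_from_fork_model() and
demo_thread() are made-up names standing in for pt_regs, the
syscall_return_slowpath exit path, the asm entry point, and a kernel
thread function. None of this is real kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved user register frame (pt_regs). */
struct fake_regs {
	long ax;		/* models the RAX(%rsp) slot */
};

/* Same shape as the function pointer handed to kernel_thread(). */
typedef int (*thread_fn)(void *);

/* Counts how often the common exit path ran (illustration only). */
static int user_returns;

/* Models label 2: the shared exit-to-user-mode path. */
static void return_to_user(struct fake_regs *regs)
{
	assert(regs != NULL);
	user_returns++;
}

/*
 * Models ret_from_fork after the patch:
 *   fn  plays the role of %rbx (NULL for an ordinary user fork),
 *   arg plays the role of %r12.
 */
static void ret_from_fork_model(thread_fn fn, void *arg,
				struct fake_regs *regs)
{
	if (fn != NULL) {
		/* Label 1: kernel thread. Call fn(arg), save the result. */
		regs->ax = fn(arg);
	}
	/* Label 2: both cases leave through the same exit path. */
	return_to_user(regs);
}

/* Made-up thread function used to exercise the model. */
static int demo_thread(void *arg)
{
	return *(int *)arg + 1;
}
```

Calling ret_from_fork_model(NULL, NULL, &regs) leaves regs.ax alone and
goes straight to the exit path, while passing demo_thread stores its
return value in regs.ax first. Either way the exit path runs, which is
the behaviour Josh found surprising for a kernel thread.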

--
Brian Gerst