Re: [PATCH 11/34] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack

From: Joerg Roedel
Date: Mon Mar 05 2018 - 13:25:32 EST


Hi Brian,

thanks for your review and helpful input.

On Mon, Mar 05, 2018 at 11:41:01AM -0500, Brian Gerst wrote:
> On Mon, Mar 5, 2018 at 5:25 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
> > +.Lentry_from_kernel_\@:
> > +
> > + /*
> > + * This handles the case when we enter the kernel from
> > + * kernel-mode and %esp points to the entry-stack. When this
> > + * happens we need to switch to the task-stack to run C code,
> > + * but switch back to the entry-stack again when we approach
> > + * iret and return to the interrupted code-path. This usually
> > + * happens when we hit an exception while restoring user-space
> > + * segment registers on the way back to user-space.
> > + *
> > + * When we switch to the task-stack here, we can't trust the
> > + * contents of the entry-stack anymore, as the exception handler
> > + * might be scheduled out or moved to another CPU. Therefore we
> > + * copy the complete entry-stack to the task-stack and set a
> > + * marker in the iret-frame (bit 31 of the CS dword) to detect
> > + * what we've done on the iret path.
>
> We don't need to worry about preemption changing the entry stack. The
> faults that IRET or segment loads can generate just run the exception
> fixup handler and return. Interrupts were disabled when the fault
> occurred, so the kernel cannot be preempted. The other case to watch
> is #DB on SYSENTER, but that simply returns and doesn't sleep either.
>
> We can keep the same process as the existing debug/NMI handlers -
> leave the current exception pt_regs on the entry stack and just switch
> to the task stack for the call to the handler. Then switch back to
> the entry stack and continue. No copying needed.

Okay, I'll look into that. Does this also hold for fully preemptible
and RT kernels, i.e. can these handlers really never be preempted there?

> > + /* Mark stackframe as coming from entry stack */
> > + orl $CS_FROM_ENTRY_STACK, PT_CS(%esp)
>
> Not all 32-bit processors will zero-extend segment pushes. You will
> need to explicitly clear the bit in the case where we didn't switch
> CR3.

Okay, thanks, will add that.
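
To spell out the problem: if the upper 16 bits of the saved CS dword are
not guaranteed to be zero, stale data there can look like the marker on
the iret path. Below is a rough user-space C model of the masking, not
the actual entry code. Only CS_FROM_ENTRY_STACK (bit 31 of the CS dword)
comes from the patch; CS_SELECTOR_MASK and the helper names are made up
for illustration.

```c
#include <stdint.h>

/* Marker from the patch description: bit 31 of the CS dword. */
#define CS_FROM_ENTRY_STACK (UINT32_C(1) << 31)

/* Hypothetical name: the selector itself only uses the low 16 bits. */
#define CS_SELECTOR_MASK UINT32_C(0x0000ffff)

/*
 * Drop whatever the CPU left in the upper half of the saved CS slot.
 * This is the step needed on the path that does not set the marker.
 */
static inline uint32_t cs_clear_upper(uint32_t cs_slot)
{
	return cs_slot & CS_SELECTOR_MASK;
}

/* Clear the upper half first, then set the entry-stack marker. */
static inline uint32_t cs_mark_entry_stack(uint32_t cs_slot)
{
	return cs_clear_upper(cs_slot) | CS_FROM_ENTRY_STACK;
}

/* Check done on the iret path: was the marker set on entry? */
static inline int cs_from_entry_stack(uint32_t cs_slot)
{
	return (cs_slot & CS_FROM_ENTRY_STACK) != 0;
}
```

Without the explicit clear, a slot like 0xdead0060 would be mistaken for
a marked frame; after cs_clear_upper() it is just the selector 0x0060.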


Regards,

Joerg