Re: [PATCH v4 09/17] x86/entry: Add new, comprehensible entry and exit hooks

From: Borislav Petkov
Date: Thu Jul 02 2015 - 05:49:17 EST


On Mon, Jun 29, 2015 at 12:33:41PM -0700, Andy Lutomirski wrote:
> The current entry and exit code is incomprehensible, appears to work
> primarily by luck, and is very difficult to incrementally improve. Add
> new code in preparation for simply deleting the old code.
>
> prepare_exit_to_usermode is a new function that will handle all slow
> path exits to user mode. It is called with IRQs disabled and it
> leaves us in a state in which it is safe to immediately return to
> user mode. IRQs must not be re-enabled at any point between
> prepare_exit_to_usermode returning and user mode actually being entered.
> (We can, of course, fail to enter user mode and treat that failure
> as a fresh entry to kernel mode.) All callers of do_notify_resume
> will be migrated to call prepare_exit_to_usermode instead;
> prepare_exit_to_usermode needs to do everything that
> do_notify_resume does, but it also takes care of scheduling and
> context tracking. Unlike do_notify_resume, it does not need to be
> called in a loop.
>
> syscall_return_slowpath is exactly what it sounds like. It will be
> called on any syscall exit slow path. It will replace
> syscall_trace_leave and it calls prepare_exit_to_usermode on the way
> out.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
> arch/x86/entry/common.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 111 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 8a7e35af7164..55530d6dd1bd 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -207,6 +207,7 @@ long syscall_trace_enter(struct pt_regs *regs)
> return syscall_trace_enter_phase2(regs, arch, phase1_result);
> }
>
> +/* Deprecated. */
> void syscall_trace_leave(struct pt_regs *regs)

Ah yes, this will get replaced later with syscall_return_slowpath below.

> {
> bool step;
> @@ -237,8 +238,117 @@ void syscall_trace_leave(struct pt_regs *regs)
> user_enter();
> }
>
> +static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
> +{
> +	unsigned long top_of_stack =
> +		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
> +	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
> +}
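
As I read it, this helper assumes that pt_regs sits at the very top of
the kernel stack (below the padding) and that thread_info lives at the
base of the same THREAD_SIZE block. A minimal userspace sketch of just
that arithmetic, in case it helps other reviewers. Everything prefixed
toy_/TOY_ is hypothetical, and the size and padding values are
assumptions rather than the real x86 constants:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_THREAD_SIZE   16384u  /* assumption: stands in for THREAD_SIZE */
#define TOY_STACK_PADDING 8u      /* assumption: stands in for TOP_OF_KERNEL_STACK_PADDING */

/* Placeholder register frame; the real struct pt_regs layout is not modelled. */
struct toy_regs {
	unsigned long r[21];
};

/* One "kernel stack"; the thread_info equivalent would sit at its base. */
static _Alignas(16) unsigned char toy_stack[TOY_THREAD_SIZE];

/* Where the register frame sits: just below the padding at the top of the stack. */
static struct toy_regs *toy_regs_slot(void)
{
	return (struct toy_regs *)(toy_stack + TOY_THREAD_SIZE - TOY_STACK_PADDING) - 1;
}

/* Same arithmetic as the helper above: step over regs and padding, then subtract the stack size. */
static void *toy_regs_to_stack_base(struct toy_regs *regs)
{
	uintptr_t top_of_stack = (uintptr_t)(regs + 1) + TOY_STACK_PADDING;

	return (void *)(top_of_stack - TOY_THREAD_SIZE);
}
```

With these definitions, toy_regs_to_stack_base(toy_regs_slot()) gives
back the address of toy_stack. The sketch only holds as long as regs
really is the topmost frame on the stack.
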
> +
> +/* Called with IRQs disabled. */
> +__visible void prepare_exit_to_usermode(struct pt_regs *regs)
> +{
> +	if (WARN_ON(!irqs_disabled()))
> +		local_irq_disable();
> +
> +	/*
> +	 * In order to return to user mode, we need to have IRQs off with
> +	 * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
> +	 * _TIF_UPROBE, or _TIF_NEED_RESCHED set. Several of these flags
> +	 * can be set at any time on preemptable kernels if we have IRQs on,
> +	 * so we need to loop. Disabling preemption wouldn't help: doing the
> +	 * work to clear some of the flags can sleep.
> +	 */
> +	while (true) {
> +		u32 cached_flags =
> +			READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> +
> +		if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> +				      _TIF_UPROBE | _TIF_NEED_RESCHED)))
> +			break;
> +
> +		/* We have work to do. */
> +		local_irq_enable();
> +
> +		if (cached_flags & _TIF_NEED_RESCHED)
> +			schedule();
> +
> +		if (cached_flags & _TIF_UPROBE)
> +			uprobe_notify_resume(regs);
> +
> +		/* deal with pending signal delivery */
> +		if (cached_flags & _TIF_SIGPENDING)
> +			do_signal(regs);
> +
> +		if (cached_flags & _TIF_NOTIFY_RESUME) {
> +			clear_thread_flag(TIF_NOTIFY_RESUME);
> +			tracehook_notify_resume(regs);
> +		}
> +
> +		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> +			fire_user_return_notifiers();
> +
> +		/* Disable IRQs and retry */
> +		local_irq_disable();
> +	}
> + }

Stupid question: what assures us that we'll break out of this loop
at some point? I.e., isn't it possible for something to keep setting
bits in ->flags while we're handling stuff in the IRQs-on section?

OTOH, this is what int_ret_from_sys_call() does now anyway so we should
be fine.

Yeah, it looks that way.
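
To convince myself, here is a toy userspace model of the loop. It is
only a sketch: the WORK_* bits, struct sim and both functions are
hypothetical names, and it assumes that each handler clears the flag it
was called for and that new work shows up only a bounded number of
times (pending_events) while "IRQs are on":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the _TIF_* work bits; the values are arbitrary. */
#define WORK_RESCHED (1u << 0)
#define WORK_SIGNAL  (1u << 1)
#define WORK_MASK    (WORK_RESCHED | WORK_SIGNAL)

struct sim {
	uint32_t flags;      /* models thread_info->flags */
	int pending_events;  /* how many more times an "IRQ" will flag new work */
};

/* Models the IRQs-on window: handle the cached work, then let one "IRQ" fire. */
static void do_work(struct sim *s, uint32_t cached)
{
	assert(cached & WORK_MASK);          /* only called when there is work */
	s->flags &= ~(cached & WORK_MASK);   /* handlers clear what they handled */
	if (s->pending_events > 0) {         /* new work arrives while IRQs are on */
		s->pending_events--;
		s->flags |= WORK_SIGNAL;
	}
}

/* Returns the number of passes through the work section before the loop exits. */
static int exit_loop(struct sim *s)
{
	int passes = 0;

	while (true) {
		uint32_t cached = s->flags;  /* sampled with "IRQs off" */

		if (!(cached & WORK_MASK))
			break;               /* nothing pending: safe to return */

		do_work(s, cached);
		passes++;
	}
	return passes;
}
```

In this model the loop runs once per batch of newly flagged work and
exits as soon as a sample taken with IRQs off shows no work bits. It
spins forever only if pending_events never runs out, i.e. if something
flags new work on every single pass.
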

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.