Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: Sean Christopherson
Date: Fri Nov 22 2019 - 13:45:01 EST


On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2019 at 11:51:41AM +0100, Peter Zijlstra wrote:
>
> > A non-lethal default enabled variant would be even better for them :-)
>
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index d779366ce3f8..d23638a0525e 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
> #define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
> #define TIF_NOTSC 16 /* TSC is not accessible in userland */
> #define TIF_IA32 17 /* IA32 compatibility process */
> +#define TIF_SLD 18 /* split_lock_detect */

Maybe use SLAC (Split-Lock AC) as the acronym? I can't help but read
SLD as "split-lock disabled". And name this TIF_NOSLAC (or TIF_NOSLD if
you don't like SLAC) since it's set when the task is running without #AC?

> #define TIF_NOHZ 19 /* in adaptive nohz mode */
> #define TIF_MEMDIE 20 /* is terminating due to OOM killer */
> #define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
> #define _TIF_NOCPUID (1 << TIF_NOCPUID)
> #define _TIF_NOTSC (1 << TIF_NOTSC)
> #define _TIF_IA32 (1 << TIF_IA32)
> +#define _TIF_SLD (1 << TIF_SLD)
> #define _TIF_NOHZ (1 << TIF_NOHZ)
> #define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
> #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)

...

> +bool handle_split_lock(void)
> +{
> + return sld_state != sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + if (sld_state == sld_fatal)
> + return false;
> +
> + pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> + current->comm, current->pid, regs->ip);
> +
> + __sld_set_msr(false);
> + set_tsk_thread_flag(current, TIF_SLD);
> + return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)
> +{
> + __sld_set_msr(true);
> + clear_tsk_thread_flag(prev, TIF_SLD);
> +}

...

> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index bd2a11ca5dd6..c04476a1f970 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> /* Enforce MSR update to ensure consistent state */
> __speculation_ctrl_update(~tifn, tifn);
> }
> +
> + if (tifp & _TIF_SLD)
> + switch_sld(prev_p);
> }

Re-enabling #AC when scheduling out the misbehaving task would also work
well for KVM, e.g. call a variant of handle_user_split_lock() on an
unhandled #AC in the guest. We can also reuse KVM's existing code to
restore the MSR on return to userspace so that an #AC in the guest doesn't
disable detection in the userspace VMM.
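To make the re-arm-on-sched-out flow concrete, here is a user-space toy model (not kernel code) of the policy in the quoted sketch: in warn mode a user split lock prints a warning, #AC is turned off and the task is flagged; when the flagged task is scheduled out, #AC is re-armed and the flag cleared. The MSR is modeled as a plain bool, and struct task, model_user_split_lock() and model_switch_out() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Mirrors the sld_state values used in the quoted patch. */
enum sld_state { sld_off, sld_warn, sld_fatal };

/* Hypothetical stand-in for a task; no_sld models TIF_SLD. */
struct task {
	const char *comm;
	bool no_sld;	/* set while the task runs with split-lock #AC off */
};

static enum sld_state sld_state = sld_warn;
static bool msr_ac_enabled = true;	/* models the MSR enable bit */
static int warnings;			/* number of warnings printed */

/* Models handle_user_split_lock(): true means "tolerated, retry". */
static bool model_user_split_lock(struct task *t)
{
	/* A split-lock #AC can only be raised while detection is on. */
	assert(msr_ac_enabled);

	if (sld_state == sld_fatal)
		return false;		/* caller falls through to SIGBUS */

	printf("#AC: %s took a split_lock trap\n", t->comm);
	warnings++;
	msr_ac_enabled = false;		/* let the instruction complete */
	t->no_sld = true;
	return true;
}

/* Models switch_sld(): runs when @prev is scheduled out. */
static void model_switch_out(struct task *prev)
{
	if (!prev->no_sld)
		return;
	msr_ac_enabled = true;		/* re-arm detection */
	prev->no_sld = false;
}
```

Under this model the flag is cleared on every switch-out, so a task that keeps splitting locks warns again each time it runs and traps; an unhandled #AC in a guest could reuse the same re-arm step via the vCPU task being scheduled out.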

Alternatively, KVM could manually do its own thing and context switch
the MSR on VM-Enter/VM-Exit (after an unhandled #AC), but I'd rather keep
this out of the VM-Enter path and also avoid thrashing the MSR on an SMT
CPU. The only downside is that KVM itself would occasionally run with #AC
disabled, but that doesn't seem like a big deal since split locks should
not be magically appearing in KVM.

Last thought, KVM should only expose split lock #AC to the guest if SMT=n
or the host is in "force" mode so that split lock #AC is always enabled
in hardware (for the guest) when the guest wants it enabled. KVM would
obviously not actually disable #AC in hardware when running in force mode,
regardless of the guest's wishes.
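The exposure rule above reduces to two small predicates. This is a sketch, assuming "force" mode corresponds to sld_fatal in the quoted code and that the split-lock #AC enable bit is shared between SMT siblings (which is why SMT matters here); both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the quoted patch; sld_fatal stands in for "force" mode. */
enum sld_state { sld_off, sld_warn, sld_fatal };

/*
 * Hypothetical: may KVM advertise split-lock #AC to a guest?  Only if no
 * SMT sibling shares the enable bit, or the host keeps it on regardless.
 */
static bool kvm_may_expose_split_lock_ac(bool smt_active, enum sld_state host)
{
	return !smt_active || host == sld_fatal;
}

/*
 * Hypothetical: should the enable bit be set while the guest runs?
 * In force mode KVM never clears it, whatever the guest asks for.
 */
static bool kvm_guest_hw_ac_enabled(enum sld_state host, bool guest_wants_ac)
{
	return host == sld_fatal || guest_wants_ac;
}
```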

> /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3451a004e162..3cba28c9c4d9 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -242,7 +242,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> {
> struct task_struct *tsk = current;
>
> -
> if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> return;
>
> @@ -288,9 +287,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
> DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
> DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
> DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
> -DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
> #undef IP
>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + unsigned int trapnr = X86_TRAP_AC;
> + char str[] = "alignment check";
> + int signr = SIGBUS;
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> + return;
> +
> + if (!handle_split_lock())

Pretty sure this should be omitted entirely. For an #AC in the kernel,
simply restarting the instruction will fault indefinitely, so dying is
probably the best course of action if a (completely unexpected) #AC occurs
in "off" mode. Dropping this check also lets handle_user_split_lock() do
the right thing for #AC due to EFLAGS.AC=1 (pointed out by Tony).
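With the handle_split_lock() check dropped, the handler's decision flow reduces to the user-space model below. It assumes handle_user_split_lock() is taught to return false when EFLAGS.AC is set (the case Tony raised), so a legacy alignment fault still gets SIGBUS instead of being swallowed; the quoted sketch does not check EFLAGS.AC yet. enum ac_outcome and alignment_check_outcome() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the sld_state values used in the quoted patch. */
enum sld_state { sld_off, sld_warn, sld_fatal };

/* Possible results of the #AC handler in this model. */
enum ac_outcome {
	AC_DIE,		/* kernel-mode #AC: die() */
	AC_RESUME,	/* user split lock tolerated: #AC off, retry */
	AC_SIGBUS,	/* deliver SIGBUS via do_trap() */
};

static enum ac_outcome alignment_check_outcome(enum sld_state state,
					       bool user_mode, bool eflags_ac)
{
	/* Restarting a kernel instruction would just fault again. */
	if (!user_mode)
		return AC_DIE;

	/*
	 * Models an EFLAGS.AC-aware handle_user_split_lock(): an #AC with
	 * EFLAGS.AC=1 is a legacy alignment fault, never swallowed, and
	 * fatal mode never tolerates a split lock.  (With state == sld_off
	 * and EFLAGS.AC=0 no #AC is raised in the first place.)
	 */
	if (!eflags_ac && state != sld_fatal)
		return AC_RESUME;

	return AC_SIGBUS;
}
```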

> + return;
> +
> + if (!user_mode(regs))
> + die("Split lock detected\n", regs, error_code);
> +
> + cond_local_irq_enable(regs);
> +
> + if (handle_user_split_lock(regs, error_code))
> + return;
> +
> + do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> + error_code, BUS_ADRALN, NULL);
> +}
> +
> #ifdef CONFIG_VMAP_STACK
> __visible void __noreturn handle_stack_overflow(const char *message,
> struct pt_regs *regs,