Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode

From: Peter Zijlstra
Date: Wed Apr 20 2016 - 07:04:28 EST


On Thu, Apr 14, 2016 at 11:27:35AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 13, 2016 at 9:55 AM, Dmitry Safonov <dsafonov@xxxxxxxxxxxxx> wrote:
> > On 04/08/2016 11:44 PM, Andy Lutomirski wrote:
> >>
> >> Feel free to ask for help on some of these details. user_64bit_mode
> >> will be helpful too.
> >
> > Hello again,
> >
> > here are some questions on TIF_IA32 removal:
> > - in function intel_pmu_pebs_fixup_ip: there is a need to
> > know whether the process was in native or compat mode, for the
> > instruction interpreter used by the IP + one instruction fixup. There are
> > registers, but they are from PEBS, which does not contain
> > segment descriptors (even for PEBSv3). Other values
> > are from interrupt regs (look at setup_pebs_sample_data).
> > So, I guess, we may use user_64bit_mode on interrupt
> > register set, which will be racy with changing task's mode,
> > but quite ok?
>
> Here's my understanding:
>
> We don't actually know the mode, and there's no way we could get it
> exactly. User code could have changed the mode between when the PEBS
> event was written and when we got the interrupt, and there's no way
> for us to tell.
>
> The regs passed to the interrupt aren't particularly helpful -- if we
> get the overflow event from kernel mode, the regs will be kernel regs,
> not user regs.
>
> What we can do is use the regs returned by perf_get_regs_user,
> which I imagine perf is already doing. Peter, is this the case?

*confused*, how is perf_get_regs_user() connected to the PEBS fixup?

Ah, you want to use perf_get_regs_user() instead of task_pt_regs()
because of how an NMI during interrupt entry would mess up the
task_pt_regs() contents.

At that point you can use regs_user->abi, right?
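
A minimal userspace sketch of that idea, assuming the decoder only needs
a 64-bit/not-64-bit flag. The constants mirror the PERF_SAMPLE_REGS_ABI_*
values from the perf uapi header; struct regs_user_sketch and
decode_as_64bit() are made-up names for illustration, not existing
kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror PERF_SAMPLE_REGS_ABI_{NONE,32,64} in the perf uapi header. */
enum sketch_regs_abi {
	SKETCH_REGS_ABI_NONE = 0,	/* no user regs available */
	SKETCH_REGS_ABI_32   = 1,
	SKETCH_REGS_ABI_64   = 2,
};

/* Hypothetical stand-in for the user-regs snapshot; only abi matters here. */
struct regs_user_sketch {
	uint64_t abi;
};

/*
 * Pick the decoder mode from the sampled user-regs ABI.
 * If no user regs are available, use the caller's fallback; the result
 * is a best guess and the decode may still occasionally be wrong.
 */
static bool decode_as_64bit(const struct regs_user_sketch *ru,
			    bool fallback_64bit)
{
	assert(ru != NULL);

	switch (ru->abi) {
	case SKETCH_REGS_ABI_64:
		return true;
	case SKETCH_REGS_ABI_32:
		return false;
	default:
		return fallback_64bit;
	}
}
```

The fallback argument is there because the ABI can be "none"; what the
right default would be in the PEBS fixup is left open here.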

> If necessary, starting in 4.6, I could make the regs->cs part of
> perf_get_regs_user be correct no matter what -- the only funny cases
> left are NMI-in-system-call-prologue (there can't be intervening
> interrupts any more other than MCE, and I don't think we really care
> if we report correct PEBS results if we take a machine check in the
> middle).
>
> > - the same with LBR branching: I may get the cs value for
> > user_64bit_mode, or the whole register set, from intel_pmu_handle_irq
> > and pass it through intel_pmu_lbr_read => intel_pmu_lbr_filter
> > to branch_type for the instruction decoder, which may
> > misinterpret an opcode for the same racy-mode-switching app.
> > Is it also fine?
>
> Same thing, I think.

Yep, whatever works for PEBS should also work for the LBR case. Both can
handle an occasional failed decode. Esp. if userspace is doing daft
things like changing the mode, you get to keep whatever pieces result
from that.
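
As a rough illustration of "handle an occasional failed decode", here is
a userspace sketch that drops branch records whose type could not be
determined and compacts the rest. struct lbr_entry_sketch,
SKETCH_BR_NONE and drop_failed_decodes() are hypothetical names; this is
a sketch of the pattern, not the actual filter code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "the decoder could not classify this branch". */
enum { SKETCH_BR_NONE = 0 };

/* Hypothetical branch record: addresses plus the decoded branch type. */
struct lbr_entry_sketch {
	uint64_t from;
	uint64_t to;
	int type;	/* SKETCH_BR_NONE when the decode failed */
};

/*
 * Compact the array in place, keeping only entries with a known type.
 * Returns the number of entries kept; their relative order is preserved.
 */
static size_t drop_failed_decodes(struct lbr_entry_sketch *e, size_t nr)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++) {
		if (e[i].type == SKETCH_BR_NONE)
			continue;	/* wrong-mode decode: lose this one record */
		e[kept++] = e[i];
	}
	return kept;
}
```

A decode done in the wrong mode then costs one record, nothing more.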