Re: x86 entry perf unwinding failure (missing IRET_REGS annotation on stack switch?)

From: Peter Zijlstra
Date: Tue Apr 28 2020 - 08:46:47 EST


On Tue, Apr 28, 2020 at 02:04:50AM -0500, Josh Poimboeuf wrote:
> On Mon, Mar 02, 2020 at 09:52:40AM -0600, Josh Poimboeuf wrote:
> > On Mon, Mar 02, 2020 at 09:18:29AM -0600, Josh Poimboeuf wrote:
> > > > So I think on machines without X86_FEATURE_SMAP, trying to unwind from
> > > > the two NOPs at f41 and f42 will cause the unwinder to report an
> > > > error? Looking at unwind_next_frame(), "sp:(und)" without the "end:1"
> > > > marker seems to be reserved for errors.
> >
> > I think we can blame this one on Peter ;-)
> >
> > 764eef4b109a ("objtool: Rewrite alt->skip_orig")
> >
> > With X86_FEATURE_SMAP, alt->skip_orig gets set, which tells objtool to
> > skip validation of the NOPs. That has the side effect of not
> > propagating the ORC state to the NOPs as well.
>
> I almost forgot about this one, until I rediscovered it just now...
> Peter, I just realized you weren't CCed on the original email, oops.

Nah, I got them (through x86@); but I too lost track of it :/

> I'm thinking something like this should fix it. Peter, does this look
> ok?

Unfortunate. But also, I fear, insufficient. Specifically consider
things like:

ALTERNATIVE "jmp 1f",
"alt...
"..."
"...insn", X86_FEAT_foo
1:

This results in something like:


.text .altinstr_replacement
e8 xx ...
90
90
...
90

Where all our normal single byte nops (0x90) are unreachable with
undefined CFI, but the alternative might have CFI, which is never
propagated.

We ran into this with the validate_alternative stuff from Alexandre.

> @@ -773,12 +772,26 @@ static int handle_group_alt(struct objtool_file *file,
> struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
> unsigned long dest_off;
>
> + /*
> + * For uaccess checking, propagate the STAC/CLAC from the alternative
> + * to the original insn to avoid paths where we see the STAC but then
> + * take the NOP instead of CLAC (and vice versa).
> + */
> + if (!orig_insn->ignore_alts && orig_insn->type == INSN_NOP &&
> + *new_insn &&
> + ((*new_insn)->type == INSN_STAC ||
> + (*new_insn)->type == INSN_CLAC))
> + orig_insn->type = (*new_insn)->type;

Also, this generates a mis-match between actual instruction text and
type. We now have a single byte instruction (0x90) with the type of a 3
byte (SLAC/CLAC). Which currently isn't a problem, but I'm looking at
adding infrastructure for having objtool rewrite instructions.

So rather than hacking around this issue, should we not make
create_orc() smarter?

I'm trying to come up with something, but so far I'm just making a mess.