Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

From: Masami Hiramatsu
Date: Fri Apr 03 2020 - 23:08:24 EST


On Fri, 3 Apr 2020 13:21:13 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Fri, Apr 03, 2020 at 02:28:37PM +0900, Masami Hiramatsu wrote:
> > On Thu, 2 Apr 2020 16:13:08 +0200
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
> > > While digging through various opcode manuals is of course forever fun, I
> > > do feel like it might not be the best way.
> >
> > Yes, it is possible to add INAT_FPU and insn_is_fpu().
> > But it seems that the below patch needs more classification based on
> > nmemonic or opcodes.
>
> I went with opcode, and I think I did a fairly decent job, but I did
> find a few problems on a second look at things.
>
> I don't think nmemonic are going to help, the x86 nmemonics are a mess
> (much like its opcode tables), there's no way to sanely detect what
> registers are effected by an instruction based on name.
>
> The best I came up with is operand class, see below.

Yeah, so we need another map, current inat map is optimized for
decoding, and lack of some information for reducing size.
E.g. it mixed up the VEX prefix instruction with non-VEX one.

>
> > IMHO, it is the time to expand gen-insn-attr.awk or clone it to
> > generate another opcode map, so that user will easily extend the
> > insn infrastructure.
> > (e.g. I had made an in-kernel disassembler, which generates a mnemonic
> > maps from x86-opcode-map.txt)
> > https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414
>
> Cute, and I'm thinking we might want that eventually, people have been
> asking for a kernel specific objdump, one that knows about and shows all
> the magical things the kernel does, like alternative, jump-labels and
> soon the static_call stuff, but also things like the exception handling.
>
> Objtool actually knows about much of that, and pairing it with your
> disassembler could print it.
>
> > > + if (insn.vex_prefix.nbytes) {
> > > + *type = INSN_FPU;
> > > return 0;
> > > + }
>
> So that's the AVX nonsense dealt with; right until they stick an integer
> instruction in the AVX space I suppose :/ Please tell me they didn't
> already do that..

I'm not so sure.
Theoretically, x86 instruction can be encoded with VEX prefix instead of
REX prefix (most compiler may not output such inefficient code.)

> > > op1 = insn.opcode.bytes[0];
> > > op2 = insn.opcode.bytes[1];
> > > @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
> > >
> > > case 0x0f:
> > >
> > > + switch (op2) {
>
> > > + case 0xae:
> > > + /* insane!! */
> > > + if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != 3 && !insn.prefixes.nbytes)
> > > + *type = INSN_FPU;
> > > + break;
>
> This is crazy, but I was trying to get at the x86 FPU control
> instructions:
>
> FXSAVE, FXRSTOR, LDMXCSR and STMXCSR
>
> Which are in Grp15

Yes, that is a complex part.

> Now arguably, I could skip them, the compiler should never emit those,
> and the newer, fancier, XSAV family isn't marked as FPU either, even
> though it will save/restore the FPU/MMX/SSE/AVX states too.
>
> So I think I'll remove this part, it'll also make the fpu_safe
> annotations easier.
>
> > > + case 0x10 ... 0x17:
> > > + case 0x28 ... 0x2f:
> > > + case 0x3a:
> > > + case 0x50 ... 0x77:
> > > + case 0x7a ... 0x7f:
> > > + case 0xc2:
> > > + case 0xc4 ... 0xc6:
> > > + case 0xd0 ... 0xff:
> > > + /* MMX, SSE, VMX */
>
> So afaict these are the MMX and SSE instruction (clearly the VMX is my
> brain loosing it).
>
> I went with the coder64 opcode tables, but our x86-opcode-map.txt seems
> to agree, mostly.
>
> I now see that 0f 3a is not all mmx/sse, it also includes RORX which is
> an integer instruction. Also, may I state that the opcode map is a
> sodding disgrace? Why is an integer instruction stuck in the middle of
> SSE instructions like that ?!?!
>
> And I should shorten the last range to 0xd0 ... 0xfe, as 0f ff is UD0.
>
> Other than that I think this is pretty accurate.
>
> > > + *type = INSN_FPU;
> > > + break;
> > > +
> > > + default:
> > > + break;
> > > + }
> > > break;
> > >
> > > case 0xc9:
> > > @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
> > >
> > > break;
> > >
> > > + case 0xd8 ... 0xdf: /* x87 FPU range */
> > > + *type = INSN_FPU;
> > > + break;
>
> Our x86-opcode-map.txt lists that as ESC, but doesn't have an escape
> table for it. Per:
>
> http://ref.x86asm.net/coder64.html
>
> these are all the traditional x87 FPU ops.

Yes, for decoding, we don't need those tables.

> > > +
> > > case 0xe3:
> > > /* jecxz/jrcxz */
> > > *type = INSN_JUMP_CONDITIONAL;
>
>
> Now; I suppose I need our x86-opcode-map.txt extended in at least two
> ways:
>
> - all those x87 FPU instructions need adding
> - a way of detecting the affected register set
>
> Now, I suspect we can do that latter by the instruction operands that
> are already there, although I've not managed to untangle them fully
> (hint, we really should improve the comments on top). Operands seem to
> have one capital that denotes the class:
>
> - I: immediate
> - G: general purpose
> - E
> - P,Q: MMX
> - V,M,W,H: SSE
>
> So if we can extend the awk magic to provide operand classes for each
> decoded instruction, then that would simplify this lots.

Hmm, it requires to generate another tables. Instead, what about below?
I've added INAT_FPU (and INAT_FPUIFVEX*) flag to find FPU related code.

*) actually, current inat tables have variant tables for the last prefix
variations. But it doesn't have vex variations which doubles the size
of table, that is too much just for FPU opcode.