Re: [PATCH 09/22] KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)

From: Sean Christopherson
Date: Mon Sep 09 2024 - 11:49:55 EST

Next message: Sakari Ailus: "Re: [PATCH v4 3/4] media: raspberrypi: Add support for RP1-CFE"
Previous message: Paul Elder: "[PATCH] media: platform: video-mux: Fix mutex locking"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Sep 06, 2024, James Houghton wrote:
> On Fri, Sep 6, 2024 at 5:53 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > #ifdef __x86_64__
> > - asm volatile(".byte 0xc6,0x40,0x0,0x0" :: "a" (gpa) : "memory"); /* MOV RAX, [RAX] */
> > + asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
>
> FWIW I much prefer the trailing comment you have ended up with vs. the
> one you had before. (To me, the older one _seems_ like it's Intel
> syntax, in which case the comment says it's a load..? The comment you
> have now is, to me, obviously indicating a store. Though... perhaps
> "movq"?)

TL;DR: "movq" is arguably a worse mnemonic than simply "mov" because MOV *and*
MOVQ are absurdly overloaded mnemonics, and because x86-64 is wonky.

Heh, "movq" is technically a different instruction (MMX/SSE instruction). For
ambiguous mnemonics, the assembler infers the exact instructions from the operands.
When a register is the source or destination, appending the size to a vanilla MOV
is 100% optional, as the width of the register communicates the desired size
without any ambiguity.

When there is no register operand, e.g. storing an immediate to memory, the size
becomes necessary, sort of. The assembler will still happily accept an inferred
size, but the size is simply the default operand size for the current mode.

E.g.

mov $0xffff, (%0)

will generate a 4-byte MOV

c7 00 ff ff 00 00

so if you actually wanted a 2-byte MOV, the mnemonic needs to be:

movw $0xffff, (%0)

There is still value in specifying an explicit operand size in assembly, as it
disambiguates the size for human readers, and also generates an error if the
operands mismatch.

E.g.

movw $0xffff, %%eax

will fail with

incorrect register `%eax' used with `w' suffix
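
To make the suffix rules concrete, here is a minimal sketch in C of how the
suffix selects the encoding when storing an immediate to (%rax). The helper
name encode_mov_imm_to_rax_mem() is made up for illustration; it isn't part of
the patch or of any assembler, and it only handles the (%rax) addressing form
used in the examples above. "No suffix" is treated like 'l' to mirror the
default operand size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode "mov{b,w,l,q} $imm, (%rax)".
 * 'suffix' is 'b', 'w', 'l', 'q', or 0 for "no suffix" (treated like 'l').
 * Returns the number of bytes written to 'out' (at most 7), 0 on a bad suffix.
 */
static size_t encode_mov_imm_to_rax_mem(char suffix, uint32_t imm, uint8_t *out)
{
	size_t n = 0, imm_bytes;

	switch (suffix) {
	case 'b':
		out[n++] = 0xc6;	/* MOV r/m8, imm8 */
		imm_bytes = 1;
		break;
	case 'w':
		out[n++] = 0x66;	/* operand size prefix => 16-bit */
		out[n++] = 0xc7;	/* MOV r/m16, imm16 */
		imm_bytes = 2;
		break;
	case 0:
	case 'l':
		out[n++] = 0xc7;	/* MOV r/m32, imm32 */
		imm_bytes = 4;
		break;
	case 'q':
		out[n++] = 0x48;	/* REX.W */
		out[n++] = 0xc7;	/* MOV r/m64, imm32 (sign-extended) */
		imm_bytes = 4;
		break;
	default:
		return 0;
	}

	out[n++] = 0x00;		/* ModR/M: mod=00, reg=/0, rm=000 => (%rax) */

	/* Immediates are stored little-endian. */
	for (size_t i = 0; i < imm_bytes; i++)
		out[n++] = (uint8_t)(imm >> (8 * i));

	return n;
}
```

With no suffix and an immediate of 0xffff this produces the 6-byte
"c7 00 ff ff 00 00" from above, and with 'w' it produces the 5-byte
"66 c7 00 ff ff".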

The really fun one is if you want to load a 64-bit gpr with an immediate. All
else being equal, the assembler will generally optimize for code size, and so
if the desired value can be generated by sign-extension, the assembler will opt
for opcode 0xc7 with a 32-bit immediate instead of opcode 0xb8 with a full
64-bit immediate.

E.g.

mov $0xffffffffffffffff, %%rax

generates

48 c7 c0 ff ff ff ff

whereas, somewhat counter-intuitively, this

mov $0xffffffff, %%rax

generates the more gnarly

48 b8 ff ff ff ff 00 00 00 00

But wait, there's more! If the developer were a wee bit smarter, they could/should
actually write

mov $0xffffffff, %%eax

to generate

b8 ff ff ff ff

because in x86-64, writing the lower 32 bits of a 64-bit register architecturally
clears the upper 32 bits. I mention this because you'll actually see the compiler
take advantage of this behavior.

E.g. if you were to load RAX through an inline asm constraint

asm volatile(".byte 0xcc" :: "a"(0xffffffff) : "memory");

the generated code will indeed be:

b8 ff ff ff ff mov $0xffffffff,%eax

or if you explicitly load a register with '0'

31 c0 xor %eax,%eax

Lastly, because "%0" in 64-bit mode refers to RAX, not EAX, this:

asm volatile("mov $0xffffffff, %0" :: "a"(gpa) : "memory");

generates

48 b8 ff ff ff ff 00 00 00 00

i.e. is equivalent to "mov .., %%rax".
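
The three encodings above can be summarized with a small sketch in C that
picks the shortest way to get a 64-bit value into RAX. The helper name
encode_load_rax() is hypothetical. Note that this models the choice a
size-minded developer or compiler would make, not what the assembler does when
handed an explicit %rax operand (as shown above, that yields the 10-byte form
for 0xffffffff).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the shortest encoding that loads 'val' into RAX.
 * Returns the number of bytes written to 'out' (at most 10).
 */
static size_t encode_load_rax(uint64_t val, uint8_t *out)
{
	size_t n = 0, imm_bytes;

	if (val <= UINT32_MAX) {
		/* mov $imm32, %eax: the 32-bit write clears bits 63:32. */
		out[n++] = 0xb8;
		imm_bytes = 4;
	} else if (val >= 0xffffffff80000000ULL) {
		/* mov $imm32, %rax: REX.W + C7 /0, imm32 is sign-extended. */
		out[n++] = 0x48;
		out[n++] = 0xc7;
		out[n++] = 0xc0;	/* ModR/M: mod=11, reg=/0, rm=000 => %rax */
		imm_bytes = 4;
	} else {
		/* mov $imm64, %rax: REX.W + B8 with a full 64-bit immediate. */
		out[n++] = 0x48;
		out[n++] = 0xb8;
		imm_bytes = 8;
	}

	/* Immediates are stored little-endian. */
	for (size_t i = 0; i < imm_bytes; i++)
		out[n++] = (uint8_t)(val >> (8 * i));

	return n;
}
```

E.g. 0xffffffffffffffff comes out as "48 c7 c0 ff ff ff ff" and 0xffffffff as
"b8 ff ff ff ff"; only values that fit neither pattern need all 10 bytes.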

Jumping back to "movq", it's perfectly fine in this case, but also fully
redundant. And so I would prefer to document it simply as "mov", because "movq"
would be more appropriate to document something like this:

asm volatile("movq %0, %%xmm0" :: "a"(gpa) : "memory");

66 48 0f 6e c0 movq %rax,%xmm0

LOL, which brings up more quirks/warts with x86-64. Many instructions in x86,
especially SIMD instructions, have mandatory "prefixes" in order to squeeze more
instructions out of the available opcodes. E.g. the operand size prefix, 0x66,
is reserved for MMX instructions, which allows the architecture to usurp the
reserved combination for XMM instructions. Table 9-3. Effect of Prefixes on MMX
Instructions says this

Operand Size (66H): Reserved and may result in unpredictable behavior.

and specifically says "unpredictable behavior" instead of #UD, because prefixing
most MMX instructions with 0x66 "promotes" the instruction to operate on XMM
registers.

And then there's the REX prefix, which is actually four prefixes built into one.
The "base" prefix is 0x40, with the lower 4 bits encoding the four "real" prefixes.
From Table 2-4. REX Prefix Fields [BITS: 0100WRXB]

  Field Name  Bit Position  Definition
  -           7:4           0100
  W           3             0 = Operand size determined by CS.D, 1 = 64 Bit Operand Size
  R           2             Extension of the ModR/M reg field
  X           1             Extension of the SIB index field
  B           0             Extension of the ModR/M r/m field, SIB base field, or Opcode reg field

e.g. 0x48 is REX.W, 0x49 is REX.W+REX.B, etc.
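
As a rough illustration of the bit layout, a REX byte can be picked apart in C
like so. The struct and function names are hypothetical, chosen only for this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four "real" prefixes packed into REX. */
struct rex_fields {
	bool w;		/* bit 3: 64-bit operand size */
	bool r;		/* bit 2: extends ModR/M reg */
	bool x;		/* bit 1: extends SIB index */
	bool b;		/* bit 0: extends ModR/M r/m, SIB base, or opcode reg */
};

/* Returns true and fills 'rex' if 'byte' is in the REX range 0x40-0x4f. */
static bool decode_rex(uint8_t byte, struct rex_fields *rex)
{
	if ((byte & 0xf0) != 0x40)
		return false;

	rex->w = !!(byte & 0x08);
	rex->r = !!(byte & 0x04);
	rex->x = !!(byte & 0x02);
	rex->b = !!(byte & 0x01);
	return true;
}
```

Feeding it 0x48 sets only 'w', and 0x49 sets 'w' and 'b', matching the two
examples above.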

The first quirky thing with REX, and REX.W (0x48) + the legacy operand size
prefix (0x66) in particular, is that the legacy prefix is ignored in most cases
if REX.W=1.

For non-byte operations: if a 66H prefix is used with prefix (REX.W = 1), 66H is ignored.

But because 0x66 is a mandatory prefix for MOVQ, it's not ignored (at least, I
don't _think_ it's ignored; objdump and gdb both seem to be happy decoding MOVQ
without the prefix).

Anyways, the second quirky thing with REX is that, because REX usurps single-byte
opcodes for DEC and INC

In 64-bit mode, DEC r16 and DEC r32 are not encodable (because opcodes 48H through
4FH are REX prefixes).

In 64-bit mode, INC r16 and INC r32 are not encodable (because opcodes 40H through
47H are REX prefixes).

i.e. uses opcodes that are actual instructions outside of 64-bit mode, the REX
prefix _must_ be the last byte before the non-prefix opcode, otherwise it's
ignored (presumably this avoids extra complexity in the instruction decoder).

Only one REX prefix is allowed per instruction. If used, the REX prefix byte
must immediately precede the opcode byte or the escape opcode byte (0FH). When
a REX prefix is used in conjunction with an instruction containing a mandatory
prefix, the mandatory prefix must come before the REX so the REX prefix can be
immediately preceding the opcode or the escape byte. For example, CVTDQ2PD with
a REX prefix should have REX placed between F3 and 0F E6. Other placements are
ignored. The instruction-size limit of 15 bytes still applies to instructions
with a REX prefix.

So even though the "opcode" for MOVQ is "66 0F 6E", when encoding with REX.W to
address RAX instead of EAX, the full encoding needs to be "66 48 0F 6E", otherwise
REX.W will be ignored, e.g. objdump will interpret the misordered sequence as
this, even though the CPU will decode REX.W as part of the MOVD.

4024d1: 48 rex.W
4024d2: 66 0f 6e c0 movd %eax,%xmm0
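
The placement rule can be sketched in C as a tiny prefix scanner. This is only
an approximation of the rule quoted from the SDM, not a real decoder, and the
helper names are made up for illustration. It assumes 64-bit mode, where bytes
0x40-0x4f are always REX prefixes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Operand/address size, LOCK, REPNE/REP, and segment override prefixes. */
static bool is_legacy_prefix(uint8_t byte)
{
	switch (byte) {
	case 0x66: case 0x67: case 0xf0: case 0xf2: case 0xf3:
	case 0x26: case 0x2e: case 0x36: case 0x3e: case 0x64: case 0x65:
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical helper: return the REX byte that actually takes effect for the
 * instruction at 'insn', or 0 if there is none. A REX byte that is followed
 * by another prefix is dropped, i.e. "Other placements are ignored".
 */
static uint8_t effective_rex(const uint8_t *insn, size_t len)
{
	uint8_t rex = 0;

	for (size_t i = 0; i < len; i++) {
		if ((insn[i] & 0xf0) == 0x40)
			rex = insn[i];	/* candidate, counts only if the opcode is next */
		else if (is_legacy_prefix(insn[i]))
			rex = 0;	/* a prefix after REX cancels the REX */
		else
			break;		/* first opcode byte, e.g. the 0x0f escape */
	}
	return rex;
}
```

For "66 48 0f 6e c0" this returns 0x48, i.e. REX.W applies and the instruction
is the MOVQ from earlier, whereas for "48 66 0f 6e c0" it returns 0, matching
the objdump output above where the stray rex.W is followed by a plain MOVD.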

And because _that's_ just not confusing enough, there are actually _six_ distinct
opcodes for MOVQ (I think; might be more): two which are REX.W promotions of MOVD
(6E and 7E, ignoring mandatory prefixes and the escape opcode 0F), and four that
are straight quadword moves that can't target general purpose registers, i.e. can
be encoded even in 32-bit mode (6F, 7E, 7F, and D6).

So yeah, MOVQ in particular is a disaster, especially in 64-bit mode, so I'd much
prefer to just say "mov %rax, (%rax)" and leave it to the reader to understand
that it's a 64-bit store.
