Re: [RFC PATCH 2/2] KVM: emulate: clean up initializations in init_decode_cache

From: Paolo Bonzini
Date: Fri Apr 04 2014 - 05:47:49 EST


Hi Bandan.

> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 8e2b866..eac488b 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1072,6 +1072,9 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
> ctxt->modrm_reg = (ctxt->rex_prefix & 4) << 1; /* REX.R */
> index_reg = (ctxt->rex_prefix & 2) << 2; /* REX.X */
> ctxt->modrm_rm = base_reg = (ctxt->rex_prefix & 1) << 3; /* REG.B */
> + } else {
> + ctxt->modrm_reg = 0;
> + ctxt->modrm_rm = 0;
> }

You can drop the if altogether here (and also the initialization
in "int index_reg = 0, base_reg = 0, scale;").

Also, getting down to micro-micro-optimization, the following will give
slightly better code:

ctxt->modrm_reg = (ctxt->rex_prefix << 1) & 8; /* REX.R */
index_reg = (ctxt->rex_prefix << 2) & 8; /* REX.X */
ctxt->modrm_rm = base_reg = (ctxt->rex_prefix << 3) & 8; /* REG.B */

because x86 can do a three-operand shift (using the LEA instruction), but
not three-operand AND.

> ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6;
> @@ -4357,6 +4360,8 @@ done_prefixes:
>
> if (ctxt->d & ModRM)
> ctxt->modrm = insn_fetch(u8, ctxt);
> + else
> + ctxt->modrm = 0;

I think in the "else" case modrm won't be read at all, but it is indeed a bit
safer this way. You can just leave it in the "zeroed" part of ctxt. There is
padding in the struct, so you get the initialization for free.


> while (ctxt->d & GroupMask) {
> switch (ctxt->d & GroupMask) {
> @@ -4435,10 +4440,14 @@ done_prefixes:
> ctxt->op_bytes = 16;
> else if (ctxt->d & Mmx)
> ctxt->op_bytes = 8;
> + } else {
> + ctxt->intercept = 0;

You can add a preparatory patch that checks (ctxt->d & Intercept) in
x86_emulate_insn instead of ctxt->intercept and skip this initialization.

> + ctxt->check_perm = NULL;

Similarly you can check (ctxt->d & CheckPerm) and skip this initialization.

> }

Same here: the common case will zero them, might as well leave it to the memset.

>
> /* ModRM and SIB bytes. */
> if (ctxt->d & ModRM) {
> + ctxt->modrm_mod = 0;

What about instead changing

ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6;

from "|=" to "="?

> rc = decode_modrm(ctxt, &ctxt->memop);
> if (!ctxt->has_seg_override)
> set_seg_override(ctxt, ctxt->modrm_seg);
> @@ -4552,14 +4561,41 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
>
> void init_decode_cache(struct x86_emulate_ctxt *ctxt)
> {
> - memset(&ctxt->opcode_len, 0,
> - (void *)&ctxt->_regs - (void *)&ctxt->opcode_len);
>
> - ctxt->fetch.start = 0;
> - ctxt->fetch.end = 0;
> + /*
> + * Variables that don't require initializing to 0
> + * opcode_len - set in x86_decode_insn
> + * b - set in x86_decode_insn
> + * intercept - conditionally set in x86_decode_insn, added
> + * else set to 0
> + * op_bytes - initialized in x86_decode_insn
> + * ad_bytes - initialized in x86_decode_insn
> + * rex_prefix - conditionally set in x86_decode_isn

I think rex_prefix must be in the zeroed area. Again, with careful field
placement you get that for free.

> + * struct operands src,src2,dst - set by calling decode_operand
> + * in x86_decode_insn,
> + * default.type = OP_NONE
> + * (*execute) - set in x86_decode_insn
> + * (*check_perm) - conditionally set in x86_decode_insn, added
> + * else set to 0
> + * d - set in x86_decode_insn
> + * modrm - conditionally set in x86_decode_insn, added else set to 0
> + * modrm_mod - or'ed in decode_modrm which is conditionally called in
> + * in x86_decode_insn, added initialization to 0 before call
> + * modrm_reg - set in decode_modrm or else decode_register_operand
> + * modrm_rm - set in decode_modrm, added else set to 0
> + * modrm_seg - set in decode_modrm
> + * _eip - set in x86_decode_insn
> + * memop - .type set to OP_NONE in x86_decode_insn
> + * ctxt->fetch.start - set in x86_decode_insn
> + * ctxt->fetch.end
> + * ctxt->mem_read.pos - set in x86_emulate_insn

Apart from the above suggestions, I agree with this patch.

Also, memopp is initialized to NULL and can be moved to the zeroed area.
But perhaps we can remove memopp, see below.

You now have:

+ u8 lock_prefix;
+ u8 rep_prefix;
bool has_seg_override;
u8 seg_override;
u64 d;
+ bool rip_relative;
+ /* bitmaps of registers in _regs[] that can be read */
+ u32 regs_valid;
+ /* bitmaps of registers in _regs[] that have been written */
+ u32 regs_dirty;

Even if we add back a couple of fields, that's pretty good! It's 32
bytes (four stores). rip_relative introduces a lot of padding; move
it together with other u8 and bool fields, and that makes 24.
And ctxt->d is in the zeroed area by mistake, which makes it 16.

There is a little more that you can do in this field.

seg_override can be moved out of the zeroed area, and the seg_override()
calls can be changed to direct accesses of the field. Please double check
though. :) Actually, if the above is true, has_seg_override can also be
eliminated and moved to x86_decode_insn as a local variable.

We could also get rid completely of rip_relative and move it to a local
variable in decode_modrm. Currently it is in x86_decode_insn because
of this earlier check:

if (ctxt->memop.type == OP_MEM && ctxt->ad_bytes != 8)
ctxt->memop.addr.mem.ea = (u32)ctxt->memop.addr.mem.ea;

...

if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative)
ctxt->memopp->addr.mem.ea += ctxt->_eip;


but I think the earlier check can be moved to decode_modrm as well.
This is my reasoning:

- the first "if" can only trigger for modrm (decode_abs will not set
mem.ea to an out-of-range value if ctxt->ad_bytes != 8)

- ctxt->memopp != NULL means that *ctxt->memopp was copied from ctxt->memop
(so it must be either ModRM or MemAbs), but rip_relative is set to 1
only if the instruction is ModRM, never for MemAbs.

And once you do this, memopp is never read anymore and can be dropped.


This gives the following series:

- patch 1 from here

- replace ->intercept and ->check_perm checks with ctxt->d checks

- cleanups and code changes from this patch, except struct reorganization and
memset change

- struct reorganization and memset change; instead of the large comment in
init_decode_cache, please add comments in front of each field or group of
fields

- removing has_seg_override and moving seg_override out of the zeroed area

- removing memopp and rip_relative

(BTW, I found a bug in my second series while studying your patch. It
ignores the mem_read cache in prepare_memory_operand, but it shouldn't).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/