Re: [RFC] Have insn decoder functions return success/failure

From: Masami Hiramatsu
Date: Thu Oct 22 2020 - 03:31:08 EST


On Wed, 21 Oct 2020 18:45:58 +0200
Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Wed, Oct 21, 2020 at 11:26:13PM +0900, Masami Hiramatsu wrote:
> > Hmm, I meant someone might think it can be used for filtering the
> > instruction something like,
> >
> > insn_init(insn, buf, buflen, 1);
> > ret = insn_get_length(insn);
> > if (!ret) {
> > /* OK, this is safe */
> > patch_text(buf, trampoline);
> > }
> >
> > No, we need another validator for such usage.
>
> Well, I think calling insn_get_length() should give you only the
> *length* of the insn and nothing else - I mean that is what the function
> is called. And I believe current use is wrong.
>
> Examples:
>
> arch/x86/kernel/kprobes/core.c:
> insn_get_length(&insn);
>
> /*
> * Another debugging subsystem might insert this breakpoint.
> * In that case, we can't recover it.
> */
> if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
>
> So this has called get_length but it is far from clear that after that
> call, the opcode bytes in insn.opcode.bytes are there.

No, insn_get_length() implies it decodes whole of the instruction.
(yeah, we need an alias of that, something like insn_get_complete())

>
> What that should do instead IMO is this:
>
> insn_get_opcode(&insn);
>

No, you've cut the last lines of that loop.

/*
* Another debugging subsystem might insert this breakpoint.
* In that case, we can't recover it.
*/
if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
return 0;
addr += insn.length;
}

I need insn.length too. Of course we can split it into 2 calls. But
as I said, since the insn_get_length() implies it decodes all other
parts, I just called it once.

> and *then* the return value can tell you whether the opcode bytes were
> parsed properly or not. See what I mean?

I agreed to check the return value of insn_get_length() at that point
only for checking whether the instruction parsing was failed or not.

>
> That's even documented that way:
>
> /**
> * insn_get_opcode - collect opcode(s)
> * @insn: &struct insn containing instruction
> *
> * Populates @insn->opcode, updates @insn->next_byte to point past the
> * opcode byte(s), and set @insn->attr (except for groups).
>
>
> Similarly here:
>
> static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
>
> ...
>
> insn_get_length(&ctxt->insn);
>
> ret = ctxt->insn.immediate.got ? ES_OK : ES_DECODE_FAILED;
>
> that thing wants to decode the insn but it is looking whether it parsed
> an *immediate*?!

Hm, it is better to call insn_get_immediate() if it doesn't use length later.

>
> I'm not saying this is necessarily wrong - just the naming nomenclature
> and the API should be properly defined when you call a function of the
> insn decoder, what you are guaranteed to get and what a caller can
> assume after that. And then the proper functions be called.

Would you mean we'd better have something like insn_get_until_immediate() ?

Since the x86 instruction is CISC, we can not decode intermediate
parts. The APIs follows that. If you are confused, I'm sorry about that.

>
> In the kprobes/core.c example above, it does a little further:
>
> ddr += insn.length;
>
> which, IMO, it should be either preceeded by a call to insn_get_length()
> - yes, this time we want the insn length or, the code should call a
> decoding function which gives you *both* length* and opcode bytes.
> insn_decode_insn() or whatever. And *that* should be documented in that
> function's kernel-doc section. And so on...

Actually, there is a historical reason too. INT3 check was added afterwards.
At first, I just calculated the instruction length in the loop...

Thank you,

>
> Does that make more sense?
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette


--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>