Re: [RFC PATCH 0/5] make use of gcc 9's "asm inline()"
From: Linus Torvalds
Date: Thu Aug 29 2019 - 14:15:27 EST
On Thu, Aug 29, 2019 at 10:36 AM Nick Desaulniers
<ndesaulniers@xxxxxxxxxx> wrote:
>
> I'm curious what "the size of the asm" means, and how it differs
> precisely from "how many instructions GCC thinks it is." I would
> think those are one and the same? Or maybe "the size of the asm"
> means the size in bytes when assembled to machine code, as opposed to
> the count of assembly instructions?
The problem is that we do different sections in the inline asm, and
the instruction counts are completely bogus as a result.
The actual instruction in the code stream may be just a single
instruction. But the out-of-line sections can be multiple instructions
and/or a data section that contains exception information.
So we want the asm inlined, because the _inline_ part (and the hot
instruction) is small, even though the asm technically may generate
many more bytes of additional data.
The worst offenders for this tend to be
 - various exception tables for user accesses etc
 - "alternatives" where we list two or more different asm alternatives
   and then pick the right one at boot time depending on CPU ID flags
 - "BUG_ON()" instructions where there's a "ud2" instruction and
   various data annotations going with it
so gcc may be "technically correct" that the inline asm statement
contains ten instructions or more, but the actual instruction _code_
footprint in the asm is likely just a single instruction or two.
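To illustrate the mismatch, here is a rough sketch of the kind of counting GCC's documentation describes for estimating asm size: every newline or ';' in the template is assumed to end an instruction. The helper count_asm_statements() and the template demo_template are hypothetical; the template only imitates the shape of a BUG-style asm and is not the kernel's actual macro.

```c
#include <stddef.h>

/*
 * Hypothetical helper: estimate the "instruction count" of an asm
 * template as one plus the number of newline or ';' separators.
 * This approximates the documented heuristic; it is not GCC's code.
 */
static size_t count_asm_statements(const char *tmpl)
{
    size_t count = 1;

    for (; *tmpl; tmpl++)
        if (*tmpl == '\n' || *tmpl == ';')
            count++;
    return count;
}

/*
 * Illustrative template (an assumption, not kernel source): one real
 * instruction in the code stream, everything else is directives and
 * data that land in a different section.
 */
static const char demo_template[] =
    "1:\tud2\n"
    ".pushsection __bug_table,\"aw\"\n"
    "2:\t.long 1b - 2b\n"
    "\t.word 0\n"
    ".popsection";
```

With this template the helper reports five "statements", although only the ud2 ends up in the function's code stream.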
The statement counting is also thrown off by the fact that some of
the "statements" are assembler directives (ie the
".pushsection"/".popsection" lines etc). So some of it is that the
instruction counting is off, but the largest part is that it's just
not relevant to the code footprint in that function.
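A minimal sketch of the pattern, assuming an ELF target and the GNU assembler. The names annotated_add() and .demo_table are made up for illustration, and the asm_inline fallback macro here is a simplified stand-in chosen by compiler version, not the kernel's definition. The asm puts a single nop in the code stream and pushes two words of data into a separate section, so the template has several "statements" but a one-instruction code footprint.

```c
/*
 * Use the "inline" asm qualifier where the compiler is known to
 * support it (GCC 9 and later); otherwise fall back to plain asm.
 * Assumption: a simple version check is good enough for a demo.
 */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 9
#define asm_inline __asm__ __inline
#else
#define asm_inline __asm__
#endif

/*
 * Hypothetical example: add two ints and attach an out-of-line
 * table entry that points back at the annotated instruction.
 */
static inline int annotated_add(int a, int b)
{
    int sum = a + b;

    asm_inline volatile(
        "1:\n\t"
        "nop\n\t"                           /* the only real instruction */
        ".pushsection .demo_table, \"a\"\n\t" /* switch to a data section */
        ".long 1b - .\n\t"                  /* relative offset to label 1 */
        ".long 0\n\t"                       /* placeholder payload */
        ".popsection\n\t"                   /* back to the code section */
        : "+r"(sum));

    return sum;
}
```

With the inline qualifier the compiler is told to treat the asm as being as small as possible for inlining decisions, which matches the real code footprint here far better than a count of template lines would.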
Un-inlining a function because it contains a single inline asm
instruction is not productive. Yes, it might result in a smaller
binary overall (because all those other non-code sections do take up
some space), but it actually results in a bigger code footprint.
Linus