Re: PROPOSAL: Extend inline asm syntax with size spec

From: Nadav Amit
Date: Sun Oct 07 2018 - 12:09:45 EST


at 2:18 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:

> Hi people,
>
> this is an attempt to see whether gcc's inline asm heuristic when
> estimating inline asm statements' cost for better inlining can be
> improved.
>
> AFAIU, the problematic arises when one ends up using a lot of inline
> asm statements in the kernel but due to the inline asm cost estimation
> heuristic which counts lines, I think, for example like in this here
> macro:
>
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C6db1258c65ea45bbe4ea08d62c35ceec%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745007006838299&amp;sdata=iehl%2Fb8h%2BZE%2Frqb4qjac19WekSgOObc9%2BM1Jto1VgF4%3D&amp;reserved=0
>
> the resulting code ends up not inlining the functions themselves which
> use this macro. I.e., you see a CALL <function> instead of its body
> getting inlined directly.
>
> Even though it should be because the actual instructions are only a
> couple in most cases and all those other directives end up in another
> section anyway.
>
> The issue is explained below in the forwarded mail in a larger detail
> too.
>
> Now, Richard suggested doing something like:
>
> 1) inline asm ("...")
> 2) asm ("..." : : : : <size-expr>)
> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>
> with which user can tell gcc what the size of that inline asm statement
> is and thus allow for more precise cost estimation and in the end better
> inlining.
>
> And FWIW 3) looks pretty straight-forward to me because attributes are
> pretty common anyways.
>
> But I'm sure there are other options and I'm sure people will have
> better/different ideas so feel free to chime in.

Thanks for taking care of it. I would like to mention a second issue, since
you may want to resolve both with a single solution: not inlining
conditional __builtin_constant_p(), in which there are two code-paths - one
for constants and one for variables.

Consider for example the Linux kernel ilog2 macro, which has a condition
based on __builtin_constant_p() (
https://elixir.bootlin.com/linux/v4.19-rc7/source/include/linux/log2.h#L160
). The compiler mistakenly considers the âheavyâ code-path that is supposed
to be evaluated only in compilation time to evaluate the code size. This
causes the kernel to consider functions such as kmalloc() as âbigâ.
kmalloc() is marked with always_inline attribute, so instead the calling
functions, such as kzalloc() are not inlined.

When I thought about hacking gcc to solve this issue, I considered an
intrinsic that would override the cost of a given statement. This solution
is not too nice, but may solve both issues.

In addition, note that AFAIU the impact of a wrong cost of code estimation
can also impact loop and other optimizations.

Regards,
Nadav