Re: PROPOSAL: Extend inline asm syntax with size spec

From: Nadav Amit
Date: Sun Oct 07 2018 - 15:06:30 EST


at 9:46 AM, Richard Biener <rguenther@xxxxxxx> wrote:

> On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <namit@xxxxxxxxxx> wrote:
>> at 2:18 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>>
>>> Hi people,
>>>
>>> this is an attempt to see whether gcc's inline asm heuristic when
>>> estimating inline asm statements' cost for better inlining can be
>>> improved.
>>>
>>> AFAIU, the problematic arises when one ends up using a lot of inline
>>> asm statements in the kernel but due to the inline asm cost
>> estimation
>>> heuristic which counts lines, I think, for example like in this here
>>> macro:
>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&amp;sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&amp;reserved=0
>>> the resulting code ends up not inlining the functions themselves
>> which
>>> use this macro. I.e., you see a CALL <function> instead of its body
>>> getting inlined directly.
>>>
>>> Even though it should be because the actual instructions are only a
>>> couple in most cases and all those other directives end up in another
>>> section anyway.
>>>
>>> The issue is explained below in the forwarded mail in a larger detail
>>> too.
>>>
>>> Now, Richard suggested doing something like:
>>>
>>> 1) inline asm ("...")
>>> 2) asm ("..." : : : : <size-expr>)
>>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>>>
>>> with which user can tell gcc what the size of that inline asm
>> statement
>>> is and thus allow for more precise cost estimation and in the end
>> better
>>> inlining.
>>>
>>> And FWIW 3) looks pretty straight-forward to me because attributes
>> are
>>> pretty common anyways.
>>>
>>> But I'm sure there are other options and I'm sure people will have
>>> better/different ideas so feel free to chime in.
>>
>> Thanks for taking care of it. I would like to mention a second issue,
>> since
>> you may want to resolve both with a single solution: not inlining
>> conditional __builtin_constant_p(), in which there are two code-paths -
>> one
>> for constants and one for variables.
>>
>> Consider for example the Linux kernel ilog2 macro, which has a
>> condition
>> based on __builtin_constant_p() (
>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&amp;sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&amp;reserved=0
>> ). The compiler mistakenly considers the âheavyâ code-path that is
>> supposed
>> to be evaluated only in compilation time to evaluate the code size.
>
> But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do).
>
> Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course.

I understand that this is might not be the right way to implement macros
such as ilog2() and test_bit(), but this code is around for some time.

I thought of using __builtin_choose_expr() instead of ternary operator, but
this introduces a different problem, as the variable version is used instead
of the constant one in many cases. From my brief experiments with llvm, it
appears that llvm does not have both of these issues (wrong cost attributed
to inline asm and conditions based on __builtin_constant_p()).

So what alternative do you propose to implement ilog2() like behavior? I was
digging through the gcc code to find a workaround with no success.

Thanks,
Nadav