Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels

From: Linus Torvalds
Date: Fri Aug 25 2023 - 19:36:45 EST


On Fri, 25 Aug 2023 at 15:57, Bill Wendling <morbo@xxxxxxxxxx> wrote:
> >
> Another idea is that there are __builtin_* functions for a lot of
> functions that are currently in inline asm

No. We've been through this before. The builtins are almost entirely
untested, and often undocumented and buggy.

> The major issue with the
> `__builtin_ia32_readeflags_*` was its inability to take unrelated MSRs
> into account during code motion. That may not be the same worry here?

No, the problem with __builtin_ia32_readeflags_*() was that it was
literally completely buggy and generated entirely broken code:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

but that's really more of a symptom than anything else.

It's a symptom of the fact that unlike inline asm's, those builtins
are often undocumented in what compiler version they appeared, and are
of very questionable quality. They often don't have many users, and
the test suites are non-existent.

For example, we *do* use __builtin_ffs() on x86 for constant values,
because the compiler does the right thing.

But for non-constant ones, the inline asm actually generates better
code: gcc generatea some disgusting mess with a 'bsf' followed by a
'cmov' for the zero case, when we know better.

See for example

https://godbolt.org/z/jKKf48Wsf

I don't understand why compiler people prefer a builtin that is an
untested special case that assumes that the compiler knows what is
going on (and often doesn't), over a generic escape facility that is
supported and needed anyway (inline asm).

In other words: the statement "builtins generate better code" is
simply PROVABLY NOT TRUE.

Builtins have often generated *worse* code than using inline asms, to
the point where "worse" is actively buggy crap.

At least inline asms are reliable. That's a *big* deal.

Linus