Re: [PATCH] Reduce the number of expensive division instructions doneby _parse_integer()

From: Linus Torvalds
Date: Thu Feb 09 2012 - 12:57:08 EST


On Thu, Feb 9, 2012 at 9:46 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> On fixed-size instruction arches, the runtime shift is probably the better
> option, as simply loading 64-bit large constant would take likely take at
> least four instructions - and might involve a shift anyway.

Bah. The constant is hoisted out of the loop on anything that has
enough registers - and "cannot generate constants" is almost
universally associated with "fair amount of registers".

The exception to that would be architectures with small register sets,
and particularly 32-bit ones. There, the 64-bit value and the multiply
likely not only takes more registers, but also kills registers due to
a function call. On the other hand "few 32-bit registers" is mainly
32-bit 386, which has no problems with a 32-bit constant.

And even if you have "out-of-line 64-bit multiply killing registers so
that you can't hoist the constant", even then gcc knows all about
those "big constants can be expensive" and likely knows how to
optimize them.

In particular, this big constant takes two instructions to generate
even in the absolute *worst* case: one to generate the constant 15,
one to shift it left by 60 (or 28, on a 32-bit setup).

But I don't have access to some broken "I can't do constants well"
architecture to actually test. Gcc could generate crazy bad code, but
almost certainly won't.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/