Re: [PATCH] common implementation of iterative div/mod

From: Jeremy Fitzhardinge
Date: Thu May 08 2008 - 17:57:55 EST


Segher Boessenkool wrote:
The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.

It's not a "dummy" asm, it actually does something: it tells the
compiler that it has to iterate the loop exactly as written, and
not do something else. I.e., exactly the behaviour we want here.

No, it has no function of its own. It's bullying gcc into not performing an optimisation by giving the impression its doing something.

+ ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);

What a terrible function name.

It's consistent with the other functions defined here. I agree it isn't pretty. If you have a better suggestion, I'm all ears.

static inline void timespec_add_ns(struct timespec *a, u64 ns)
{
- ns += a->tv_nsec;
- while(unlikely(ns >= NSEC_PER_SEC)) {
- /* The following asm() prevents the compiler from
- * optimising this loop into a modulo operation. */
- asm("" : "+r"(ns));
-
- ns -= NSEC_PER_SEC;
- a->tv_sec++;
- }
+ a->tv_sec += iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
a->tv_nsec = ns;
}

...and now the "meat" of this function isn't inline anymore. If we
cared about not doing a divide here, you'll have to explain why
taking this trivial loop out of line is a good idea.

On x86-32 it compiles to 26 instructions and 47 bytes of code. Admittedly it might be smaller inline - or on a 64-bit machine - but I seriously doubt its suffering a huge performance hit from being out of line. These days the inline threshold is very small - probably under 10 instructions. A direct call/return is essentially free, since it can be trivially prefetched.

+unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
+{
+ unsigned ret = 0;
+
+ while(dividend >= divisor) {

You removed the unlikely() here. Why?

Because it didn't seem all that unlikely. Besides, it makes not one bit of difference to the code generated by my compiler.

+ /* The following asm() prevents the compiler from
+ optimising this loop into a modulo operation. */
+ asm("" : "+rm"(dividend));

You changed "+r" to "+rm" here. Why? Also, "rm" is an x86-ism,
and this is generic code (it does work here, but why is it better
than "r"?)

"rm" isn't x86-specific. I just wanted to give the compiler the freedom to put the value in either register or memory if it wanted to.

+EXPORT_SYMBOL(iter_div_u64_rem);

Does this need to be exported?

Everything else in the file is exported.

Can I suggest an alternative approach? Define a macro (with a
good, descriptive name!) for just the asm("" : "+r"(x)), and use
that. Much smaller patch, much clearer code, and doesn't result
in different (and worse) code generation, so it's a much safer
change.

Uh, could you suggest a name? Something along the lines of prevent_gcc_from_strength_reducing_this_subtraction_loop_into_a_modulo_operation_thanks_oh_and_remember_to_use_it_in_all_the_right_places() springs to mind.

Rather than putting this unsightly (though with a smear of lipstick) hack into every open-coded iterative div-mod loop, we may as well implement it once and just look out for places which should be using it.

I don't think the "worse" code generation is much of an issue, since it isn't used anywhere performance critical, and moving the code out of line means there should be an overall reduction in code size.

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/