if() body folding, #pragma rarely_used

Janos Farkas (Janos.Farkas-#vLfPkj5DWutEBLg.lmr2zlKFecy@shadow.banki.hu)
Thu, 11 Sep 1997 14:22:17 +0200


[Linus stripped from the Cc: I'm not sure he wants to learn C from me..
:)]

On 1997-09-10 at 10:34:07, Rogier Wolff wrote:
> I hate this type of code, but for better cache-use, it is the faster
> way to write the code.....
>
> Would someone be interested in implementing
>
> #pragma rarely_used
>
> which would go inside an "if" statement, and cause gcc to do exactly
> what Bill is doing up there by hand....
>
> The new code would be:
>
> if (minix_set_bit(j,bh->b_data)) {
> #pragma rarely_used
> printk("new_inode: bit already set");
> iput(inode);
> return NULL;
> }
[...to put the body of the function out-of-line to not trash the cache..]

You seem to misunderstand part of the issue. At least, I like to use
similar goto constructs for a different reason. It's not just that
the if body would be in-line and so must be jumped over. Anyway, the
branch cache may perform better if the jump is taken, so it might work
out better if the rarely used but small part stays in-line (just guessing).

My problem is that at least the GCC I have here compiles such code
with the return in place every time; i.e. for every if (cond)
{ dosomething(); return; } GCC emits a stack frame cleanup and
return at that spot. So that #pragma would solve only half of the
issue. If GCC were intelligent enough, it could determine the
common parts of the { dosomething(); return; } fragments and place them
all at the end of the function. That's not done currently, so it's
probably not so easy.. :)

So, in case that was too terse, my problem can be seen with the following code:

	ret = 0;
	x = kmalloc(whatever);
	if (!x) {
		printk("no x\n");
		return ret;		/* common */
	}

	bh = bread(whatever);
	if (!bh) {
		printk("no bh\n");
		kfree(x);		/* common */
		return ret;		/* common */
	}

	ret = dosomething();

	brelse(bh);
	kfree(x);			/* common */
	return ret;			/* common */

It's not really a good example, since nesting if's is quite easy in the
above case; but if somehow I'm forced to the above, GCC won't notice
that the parts marked /* common */ are the same and can be folded to
the end of the function. GCC simply places a return sequence (pop used
registers, restore stack frame, return, etc) for every return. So,
because I know my code, and what I'm trying to do, and that GCC doesn't
really understand it, I use some gotos to save some bytes.
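
For illustration, here is a minimal sketch of what the goto version of
the above might look like. It is only a user-space approximation:
fake_alloc() and fake_free() are hypothetical stand-ins for
kmalloc()/bread() and kfree()/brelse(), the live_allocs counter exists
only to check the cleanup, and the value 42 stands in for the result
of dosomething(). The labels out_free_x and out are likewise names
made up for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel calls; not real kernel APIs. */
static int live_allocs;		/* outstanding resources, for checking cleanup */

static void *fake_alloc(int fail)
{
	void *p;

	if (fail)
		return NULL;
	p = malloc(16);
	if (p)
		live_allocs++;
	return p;
}

static void fake_free(void *p)
{
	live_allocs--;
	free(p);
}

/* Same flow as the example above, with the common tail written once. */
int do_work(int fail_x, int fail_bh)
{
	int ret = 0;
	void *x, *bh;

	x = fake_alloc(fail_x);		/* stands in for kmalloc() */
	if (!x) {
		fprintf(stderr, "no x\n");
		goto out;
	}

	bh = fake_alloc(fail_bh);	/* stands in for bread() */
	if (!bh) {
		fprintf(stderr, "no bh\n");
		goto out_free_x;
	}

	ret = 42;			/* stands in for dosomething() */

	fake_free(bh);			/* stands in for brelse() */
out_free_x:
	fake_free(x);			/* stands in for kfree() */
out:
	return ret;			/* the only return in the function */
}
```

With this layout there is exactly one return statement in the source,
so the cleanup and return sequence only has to be written once, no
matter which failure path is taken. Whether a given compiler then
produces smaller code is something to check in its assembly output.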

Note also that this code is already "optimized" to make it
"foldable", since in the failure cases one would naturally write
return 0; rather than return a preset variable.

However, although I said "my problem", it does not really bother me; I
know what I'm doing in this case. It could be that GCC for other
architectures behaves better, or that gotos are more expensive there,
but I hope it's a nice solution everywhere. :)

-- 
Janos - Don't worry, my address works.  I'm just bored of spam.