Re: [PATCH] no need to check for NULL before calling kfree()-fs/ext2/

From: Arjan van de Ven
Date: Sun Mar 27 2005 - 03:48:02 EST


On Sat, 2005-03-26 at 18:21 -0500, linux-os wrote:
> On Sat, 26 Mar 2005, Arjan van de Ven wrote:
>
> > On Fri, 2005-03-25 at 17:29 -0500, linux-os wrote:
> >> Isn't it expensive of CPU time to call kfree() even though the
> >> pointer may have already been freed?
> >
> > nope
> >
> > a call instruction is effectively half a cycle or less, the branch
>
> Wrong!

oh? a call is "push eip + a new eip" effectively. the new eip is
entirely free, the push eip takes half a cycle (or 1 full cycle but only
one of the two/three pipelines).

>
> > predictor of the cpu can predict perfectly where the next instruction is
> > from. The extra if() you do in front is a different matter, that can
> > easily cost 100 cycles+. (And those are redundant cycles because kfree
> > will do the if again anyway). So what you propose is to spend 100+
> > cycles to save half a cycle. Not a good tradeoff ;)
> >
>
> Wrong!

Is it wrong that the cpu can predict the target perfectly? No. Unless
you use function pointers (then it's a whole different ballgame).

>
> Pure unmitigated bull-shit. I measure (with hardware devices)
> the execution time of real code in modern CPUs. I do this for
> a living so you don't have to stand in line for a couple of
> hours to have your baggage scanned at the airport.

Ok I used to do this kind of performance work for a living too and
measuring it to death as well.

> Always, always, a call will be more expensive than a branch
> on condition.

It is not on modern Out of order cpus.

> It's impossible to be otherwise. A call requires
> that the return address be written to memory (the stack),
> using register indirection (the stack-pointer).

and it's a so common pattern that it's optimized to death. Internally a
call gets transformed to 2 uops or so, one is push eip, the other is the
jmp (which gets then just absorbed by the "what is the next eip" logic,
just as a "jmp"s are 0 cycles)

> If somebody said; "I think that the code will look better
> and the few cycles lost will not be a consequence with modern
> CPUs...", then there is a point. But coming up with this
> disingenuous bullshit is something else.

I don't have to take this from you and I don't. You're calling me a liar
with zero evidence. Lets get some facts straight
1) On a modern cpu, a miss of the branch predictor is quite expensive.
The entire pipeline needs flushing if this happens, and on a p4 this
will be in the order of 50 to 100 cycles at minimum.
2) absolute "jmp" is free on modern OOO cpus. Instead of taking an
actual execution slot, all that happens is that the "what is the next
EIP" logic gets a different value. (you can argue what happens if you
have a sequence of jmps and that it's not free then, and I'll grant
you that, but that corner case is not relevant here)
3) a "call" instruction gets translated into what basically is
"push EIP" and "jmp" uops.
4) modern processors have special logic to optimize push/pop
instructions; for example a "push eax ; push ebx" sequence will
execute in parallel in the same cycle even though there is a data
dependency on esp, the cpu can perfectly predict the esp effect and
will do so.
5) modern processors have a call/ret fifo cache they use to do branch
prediction for the target of "ret" instructions. Unless you do
misbalanced call/ret pairs the prediction will be perfect.

Based on this the conclusion "a function call is really cheap versus a
conditional branch" is justified imo. Now you better come with proof
about which of the 5 things above I'm totally lying to you or you better
come with an apology.




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/