On Wed, 26 Sep 2001, Alan Cox wrote:
> >
> > Your Athlons may handle exclusive cache line acquisition more
> > efficiently (due to memory subsystem performance) but it still
> > does cost something.
>
> On an exclusive line on Athlon a lock cycle is near enough free, it's
> just an ordering constraint. Since the line is in E state no other bus
> master can hold a copy in cache so the atomicity is there. Ditto for newer
> Intel processors.

You misunderstood the problem, I think: when the line moves from one CPU
to the other (the exclusive state moves along with it), that is
_expensive_.

Even when you have a backside bus (or cache pushout content snooping) to
allow the cacheline to move directly from one CPU to the other without
having to go through memory, that's a really expensive thing to do.

So re-acquiring the lock on the same CPU is pretty much free (18 cycles on
Intel, if I remember correctly, and that's _entirely_ due to the pipeline
flush to ensure in-order execution around it).

[ Oh, just for interest I checked my P4, which has a much longer pipeline:
  the cost of an exclusive locked access is a whopping 104 cycles. But we
  already knew that the first-generation P4 does badly on many things.

  Just reading the cycle counter is apparently around 80 cycles on a P4,
  it's 32 cycles on a PIII. Looks like that also stalls the pipeline or
  something. But cpuid is _really_ horrible. Test out the attached
  program.

  PIII:
	nothing:     32 cycles
	locked add:  50 cycles
	cpuid:      170 cycles

  P4:
	nothing:     80 cycles
	locked add: 184 cycles
	cpuid:      652 cycles

  Remember: these are for the already-exclusive-cache cases. ]
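
The program attached to the original mail is not part of this archive.
As a rough illustration only, numbers of this kind could be collected
with something like the sketch below. It assumes an x86 CPU and
GCC/Clang-style inline asm, and every name in it (read_tsc, locked_add,
do_cpuid, MEASURE, net_cost, report) is made up for the example. Each
measurement includes the cost of reading the cycle counter itself, which
is what the "nothing" row stands for, so the net cost of an operation is
its figure minus the "nothing" figure.

```c
/* Sketch only: NOT the program attached to the original mail.
 * Assumes x86 and GCC/Clang inline asm; all names are hypothetical. */
#include <stdint.h>
#include <stdio.h>

/* Read the CPU cycle counter (rdtsc returns it in edx:eax). */
static inline uint64_t read_tsc(void)
{
	uint32_t lo, hi;
	__asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Only ever touched by one thread, so after the first access the
 * cache line is already held by this CPU. */
static int counter;

static inline void locked_add(void)
{
	__asm__ volatile("lock; addl $1,%0" : "+m"(counter) : : "memory", "cc");
}

static inline void do_cpuid(void)
{
	uint32_t a = 0, b, c = 0, d;
	__asm__ volatile("cpuid" : "+a"(a), "=b"(b), "+c"(c), "=d"(d));
	(void)b;
	(void)d;
}

/* Define a function that times `stmt` between two rdtsc reads and
 * returns the minimum over many runs, so that interrupts and cold
 * caches do not inflate the result. */
#define MEASURE(name, stmt)					\
	static uint64_t name(void)				\
	{							\
		uint64_t best = UINT64_MAX;			\
		for (int i = 0; i < 1000; i++) {		\
			uint64_t start = read_tsc();		\
			stmt;					\
			uint64_t elapsed = read_tsc() - start;	\
			if (elapsed < best)			\
				best = elapsed;			\
		}						\
		return best;					\
	}

MEASURE(time_nothing, (void)0)
MEASURE(time_locked_add, locked_add())
MEASURE(time_cpuid, do_cpuid())

/* Cost of an operation once the rdtsc overhead is subtracted. */
static uint64_t net_cost(uint64_t measured, uint64_t baseline)
{
	return measured > baseline ? measured - baseline : 0;
}

/* Print the three rows in the same layout as the tables above. */
void report(void)
{
	uint64_t base = time_nothing();
	uint64_t lock = time_locked_add();
	uint64_t id = time_cpuid();

	printf("nothing: %llu cycles\n", (unsigned long long)base);
	printf("locked add: %llu cycles (net %llu)\n",
	       (unsigned long long)lock,
	       (unsigned long long)net_cost(lock, base));
	printf("cpuid: %llu cycles (net %llu)\n",
	       (unsigned long long)id,
	       (unsigned long long)net_cost(id, base));
}
```

Calling report() from main() prints one set of rows for the machine it
runs on. Applied to the figures above, net_cost(50, 32) gives the 18
cycles quoted for the PIII and net_cost(184, 80) gives the 104 cycles
quoted for the P4.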

What are the Athlon numbers?

		Linus