On Tue, 18 Mar 2003, Brian Gerst wrote:
>
> Here's a few more data points:

Ok, this shows the behaviour I was trying to explain:

> vendor_id : AuthenticAMD
> cpu family : 5
> model : 8
> model name : AMD-K6(tm) 3D processor
> stepping : 12
> cpu MHz : 451.037
> empty overhead=105 cycles
> load overhead=-2 cycles
> I$ load overhead=30 cycles
> I$ load overhead=90 cycles
> I$ store overhead=95 cycles

i.e. loading from the same cacheline shows bad behaviour, most likely due to
cache line exclusion. Does anybody have an original Pentium to see if I
remember that one right?
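
For readers without the test program at hand, here is a rough sketch of how
numbers like "empty overhead" and "load overhead" can be produced. It is not
the program that generated the figures quoted in this thread; every name in
it (read_ticks, min_ticks, overhead, the probe_* functions) is made up for
illustration. It assumes GCC or Clang, and on x86 it uses the __rdtsc()
intrinsic; elsewhere it falls back to nanoseconds. It also assumes code pages
are readable, makes no attempt to control which cache line the load hits, and
leaves out the store-to-text case entirely, since that needs writable code
pages.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* x86: read the time-stamp counter. */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 builds: nanoseconds, not cycles. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

typedef void (*probe_fn)(void);

/* Ordinary data word, placed in .data and not in the text area. */
static volatile int data_word = 1;

/* Baseline: nothing between the two timer reads except the call. */
static void probe_empty(void) { }

/* Load from normal data. */
static void probe_data_load(void) { (void)data_word; }

/* Load one byte of machine code (the start of probe_empty).
 * Assumption: the text area is readable as data on this system. */
static void probe_text_load(void)
{
    const volatile unsigned char *code =
        (const volatile unsigned char *)(uintptr_t)&probe_empty;
    (void)code[0];
}

/* Smallest tick count seen for one call of fn over `runs` attempts. */
static uint64_t min_ticks(probe_fn fn, int runs)
{
    uint64_t best = UINT64_MAX;
    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = read_ticks();
        fn();
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Cost of fn relative to the empty baseline. Measurement noise can
 * make this slightly negative. */
static int64_t overhead(probe_fn fn, int runs)
{
    return (int64_t)min_ticks(fn, runs)
         - (int64_t)min_ticks(probe_empty, runs);
}
```

Calling overhead(probe_data_load, 1000) and overhead(probe_text_load, 1000)
gives numbers comparable in spirit to the "load overhead" and "I$ load
overhead" lines, though not to their exact values. Because each result is a
difference against the empty baseline, a small negative value like the -2
above is most likely just noise.
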
> vendor_id : AuthenticAMD
> cpu family : 6
> model : 6
> model name : AMD Athlon(tm) Processor
> stepping : 2
> cpu MHz : 1409.946
> empty overhead=11 cycles
> load overhead=5 cycles
> I$ load overhead=5 cycles
> I$ load overhead=5 cycles
> I$ store overhead=826 cycles
>
> The Athlon XP shows really bad behavior when you store to the text area.

Wow. There aren't many cases where AMD shows the P4-like "big latency in
rare cases" behaviour.

But quite honestly, I think they made the right call, and I _expect_ that
of modern CPUs. The fact is, modern CPUs tend to need to pre-decode the
instruction stream in some way, and storing to it while running from it is
just a really really bad idea. And since it's so easy to avoid, you
really just shouldn't do it.

Linus