Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)

From: Linus Torvalds (torvalds@transmeta.com)
Date: Tue Mar 18 2003 - 14:21:24 EST


On Tue, 18 Mar 2003, Brian Gerst wrote:
>
> Here's a few more data points:

Ok, this shows the behaviour I was trying to explain:

> vendor_id : AuthenticAMD
> cpu family : 5
> model : 8
> model name : AMD-K6(tm) 3D processor
> stepping : 12
> cpu MHz : 451.037
> empty overhead=105 cycles
> load overhead=-2 cycles
> I$ load overhead=30 cycles
> I$ load overhead=90 cycles
> I$ store overhead=95 cycles

i.e. loading from the same cacheline shows bad behaviour, most likely due to
cache line exclusion. Does anybody have an original Pentium to see if I
remember that one right?

> vendor_id : AuthenticAMD
> cpu family : 6
> model : 6
> model name : AMD Athlon(tm) Processor
> stepping : 2
> cpu MHz : 1409.946
> empty overhead=11 cycles
> load overhead=5 cycles
> I$ load overhead=5 cycles
> I$ load overhead=5 cycles
> I$ store overhead=826 cycles
>
> The Athlon XP shows really bad behavior when you store to the text area.

Wow. There aren't many cases where AMD shows the P4-like "big latency in
rare cases" behaviour.
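
To put the quoted cycle counts in wall-clock terms, here is a small sketch that
converts cycles to nanoseconds using the reported "cpu MHz" values. The helper
name cycles_to_ns is made up for illustration; the inputs are the "I$ store
overhead" figures from the two quoted reports.

```python
def cycles_to_ns(cycles: float, mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock rate (MHz)."""
    # One cycle lasts 1000 / mhz nanoseconds.
    return cycles * 1000.0 / mhz


# Figures taken from the quoted /proc/cpuinfo and benchmark output.
k6_store_ns = cycles_to_ns(95, 451.037)        # AMD-K6, I$ store overhead
athlon_store_ns = cycles_to_ns(826, 1409.946)  # Athlon, I$ store overhead

print(f"K6 I$ store:     {k6_store_ns:.0f} ns")
print(f"Athlon I$ store: {athlon_store_ns:.0f} ns")
```

So despite the roughly 3x faster clock, the Athlon's store-to-text case takes
more wall-clock time than the K6's (about 586 ns against about 211 ns).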

But quite honestly, I think they made the right call, and I _expect_ that
of modern CPUs. The fact is, modern CPUs tend to need to pre-decode the
instruction stream in some way, and storing to it while running from it is
just a really really bad idea. And since it's so easy to avoid it, you
really just shouldn't do it.
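
As a rough illustration of "easy to avoid": the sketch below checks whether a
store target shares a cache line with code, and pads it out to the next line
boundary if it does. Everything here is an assumption for illustration: the
64-byte line size, the addresses, and the helper names. The granularity at
which a given CPU detects stores near its instruction stream is
model-specific, so treat this as the idea rather than a guarantee.

```python
LINE_SIZE = 64  # assumed cache line size in bytes; varies by CPU


def shares_line(a: int, b: int, line_size: int = LINE_SIZE) -> bool:
    """True if addresses a and b fall in the same cache line."""
    return a // line_size == b // line_size


def align_up(addr: int, line_size: int = LINE_SIZE) -> int:
    """Round addr up to a line boundary (unchanged if already aligned)."""
    return (addr + line_size - 1) // line_size * line_size


# Hypothetical layout: last instruction at 0x1030, data placed right after it.
last_insn = 0x1030
data_naive = 0x1038
data_padded = align_up(data_naive)

print(shares_line(last_insn, data_naive))   # data sits in the code's line
print(shares_line(last_insn, data_padded))  # padded data sits in its own line
```

In practice the same effect usually comes from keeping writable data in a
separate data section rather than next to the code that uses it.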

                        Linus
