Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)

From: Linus Torvalds (torvalds@transmeta.com)
Date: Tue Mar 18 2003 - 14:21:24 EST

Next message: Steve Lee: "RE: Linux-2.4.20 modem control"
Previous message: Henrique Gobbi: "Building a 2.4.x kernel with all options"
In reply to: Brian Gerst: "Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)"
Next in thread: Thomas Schlichter: "Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)"
Reply: Thomas Schlichter: "Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)"
Reply: Steven Cole: "Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)"
Reply: H. Peter Anvin: "Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)"

On Tue, 18 Mar 2003, Brian Gerst wrote:
>
> Here's a few more data points:

Ok, this shows the behaviour I was trying to explain:

> vendor_id : AuthenticAMD
> cpu family : 5
> model : 8
> model name : AMD-K6(tm) 3D processor
> stepping : 12
> cpu MHz : 451.037
> empty overhead=105 cycles
> load overhead=-2 cycles
> I$ load overhead=30 cycles
> I$ load overhead=90 cycles
> I$ store overhead=95 cycles

i.e. loading from the same cacheline shows bad behaviour, most likely due to
cache line exclusion. Does anybody have an original Pentium to see if I
remember that one right?

> vendor_id : AuthenticAMD
> cpu family : 6
> model : 6
> model name : AMD Athlon(tm) Processor
> stepping : 2
> cpu MHz : 1409.946
> empty overhead=11 cycles
> load overhead=5 cycles
> I$ load overhead=5 cycles
> I$ load overhead=5 cycles
> I$ store overhead=826 cycles
>
> The Athlon XP shows really bad behavior when you store to the text area.

Wow. There aren't many cases where AMD tends to show the P4-like "big
latency in rare cases" behaviour.

But quite honestly, I think they made the right call, and I _expect_ that
of modern CPUs. The fact is, modern CPUs tend to need to pre-decode the
instruction stream in some way, and storing to it while running from it is
just a really really bad idea. And since it's so easy to avoid it, you
really just shouldn't do it.
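
To make the "same cacheline" distinction concrete, here is a minimal sketch
in C of the address arithmetic involved. It is not the benchmark quoted
above. The function names and the 64-byte line size are assumptions made
for illustration; real line sizes differ between CPUs, so the size is a
parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size, for illustration only; real CPUs differ. */
#define ASSUMED_LINE_SIZE 64u

/*
 * Hypothetical helper: true when both addresses fall in the same
 * cache line. line_size must be a non-zero power of two.
 */
bool same_cache_line(uintptr_t a, uintptr_t b, uintptr_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t mask = ~(line_size - 1);
    return (a & mask) == (b & mask);
}

/*
 * Hypothetical helper: true when a store to [dst, dst + len) touches
 * any cache line that also holds code in [code, code + code_len).
 */
bool store_overlaps_code_lines(uintptr_t dst, size_t len,
                               uintptr_t code, size_t code_len,
                               uintptr_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0 || code_len == 0)
        return false;

    uintptr_t mask = ~(line_size - 1);
    uintptr_t dst_first = dst & mask;
    uintptr_t dst_last = (dst + len - 1) & mask;
    uintptr_t code_first = code & mask;
    uintptr_t code_last = (code + code_len - 1) & mask;

    /* Two ranges of lines overlap when each starts before the other ends. */
    return dst_first <= code_last && code_first <= dst_last;
}
```

For example, with the assumed 64-byte lines, a 4-byte store at 0x1040
shares a line with code occupying 0x1000..0x104f, while a store at 0x1080
does not. Keeping writable data out of the text area, which is what
toolchains normally do by placing it in separate data sections, keeps the
second check false.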

Linus


This archive was generated by hypermail 2b29 : Sun Mar 23 2003 - 22:00:24 EST