Re: 2.5.40-mm1

From: Andrew Morton (akpm@digeo.com)
Date: Wed Oct 09 2002 - 18:32:11 EST


Mala Anand wrote:
>
> ...
> P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
> Src is aligned and dst is misaligned as follows:
>
> Dst 2.5.40 2.5.40+patch 2.5.40+patch++
> Align throughout throughput throughput
> (bytes) KB/sec KB/sec KB/sec
> 0 1360071 1314783 912359
> 1 323674 340447
> 2 329202 336425
> 4 512955 693170
> 8 523223 615097 506641
> 12 517184 558701 553700
> 16 966598 872080 932736
> 32 846937 838514 845178

Note the tremendous slowdown which the P4 suffers when you're not
cacheline aligned. Even 32-byte-aligned is down a lot.

 
> I see too much variance in the test results so I ran
> each test 3 times. I tried increasing the iterations
> but it did not reduce the variance.
>
> Dst is aligned and src is misaligned as follows:
>
> Dst 2.5.40 2.5.40+patch
> Align throughout throughput
> (bytes) KB/sec KB/sec
> 0 1275372 1029815
> 1 529907 511815
> 2 534811 530850
> 4 643196 627013
> 8 568000 626676
> 12 574468 658793
> 16 631707 635979
> 32 741485 592938

This differs a little from my P4 testing - the rep;movsl approach
seemed OK for 8,16,32 alignment.

But still, that's something we can tune later.
 
>
> However I have seen using floating point registers instead of integer
> registers on Pentium IV improves performance to a greater extent on
> some alignments. I need to do more testing and then I will create a
> patch for pentium IV.

I believe there are "issues" using those registers in-kernel. Related
to the need to save/restore them, or errata; not too sure about that.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 15 2002 - 22:00:34 EST