Re: [patch] checksum P6 asm buffer overflow fix + 686 improvements

Artur Skawina (skawina@geocities.com)
Wed, 26 May 1999 23:17:32 +0200


Andrea Arcangeli wrote:
>
> I would be still more convinced if the mmx routine ran a bit
> faster. The numbers were something like 60 units of time for mmx and 63
> units of time for the current code + my patch.

<disclaimer> I'm _not_ saying all the #s I quote are 100% reliable; I'm
seeing a bit too much of what appears to be random variation (alignment?).
[yes, all are for hot caches - I only wanted to compare the different
solutions wrt each other]. At least some #s are better than none. </disclaimer>

Note that the current 686 code wasn't fully optimized - I was able
to get a 2-3% speedup simply by moving instructions around to better
fit the 4-1-1 decoder scheme (+ other minor changes). So, if I assume
correctly that the numbers you mention above don't include the fpu
manipulation cost, I doubt the mmx stuff would be a win.

> It would be interesting
> also to implement a copy-user mmx (as suggested a moment ago by
> Artur) to see if using movq instead of movl would improve performance
> a bit more.

I will look at the csum&copy routine, and see if it's possible
to squeeze a few cycles out of that too.
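
For anyone following along, here is a rough portable C sketch of what
a fused checksum-and-copy loop does: each 16-bit word is loaded once,
stored to the destination, and added to a one's-complement sum. This
is NOT the kernel's asm routine; the name csum_and_copy_sketch and its
signature are made up for illustration, and it ignores user-access
faults, alignment tricks and unrolling entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration only (not the kernel's implementation):
 * copy len bytes from src to dst and return the folded 16-bit
 * one's-complement sum of the data, computed in native byte order.
 */
static uint16_t csum_and_copy_sketch(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint64_t sum = 0;

    /* Main loop: one load feeds both the store and the sum. */
    while (len >= 2) {
        uint16_t w;
        memcpy(&w, s, 2);   /* alignment-safe 16-bit load */
        memcpy(d, &w, 2);   /* copy the word to the destination */
        sum += w;
        s += 2;
        d += 2;
        len -= 2;
    }

    /* Odd trailing byte: treat it as a word padded with a zero byte. */
    if (len) {
        uint16_t w = 0;
        memcpy(&w, s, 1);
        *d = *s;
        sum += w;
    }

    /* Fold the carries back in (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

The point of fusing the two is that the data only goes through the
load ports once; whether wider (movq) loads help on top of that is
exactly the open question above.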

[...looking at the usage peaks...] hmm, I'll have to recheck
this later (my data gathering patch missed one case), but I
suspect we can get a noticeable improvement here too.

artur
