<disclaimer> i'm _not_ saying all #s i quote are 100% reliable, I'm
seeing a bit to much what appears to be random variations (alignment?).
[yes, all are for hot caches - i only wanted to compare different
solutions wrt eachover]. at least some #s, are better than none. </disclaimer>
note that the current 686 code wasn't optimized fully - I'm was able
to get a 2-3% speedup simply by moving instructions around to better
fit the 4-1-1 scheme (+other minor changes). So, if I assume correctly
that the numbers you mention above don't include the fpu manipulation
cost, I doubt the mmx stuff would be a win.
> It would be interesting
> also to implement a copy-user mmx (as suggested some seconds ago from
> Artur) to see if using movq instead of movl would improve performances
> some bit more.
I will look at the csum© routine, and see if it's possible
to squize a few cycles out of that too.
[...looking at the usage peaks...] hmm, I'll have to recheck
this later (my data gathering patch missed one case), but I
suspect we can get a noticable improvement here too.
artur
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/