Hi,
I tuned the xor.h functions for x86-64 a bit. One of the changes I did
was to use streaming stores, which are usually faster. But it's actually
slower in the test it does a booting because that tests cache hot behaviour.
NTI will invalidate the cache line, so it's eating a lot of cache misses.
Does it make sense to test cache hot behaviour? iirc RAID checksumming
should be done without polluting caches. I'm considering to change it
to benchmark a few MB of data buffers to avoid caching effects.
Also I'm not quite sure about the logic between XOR_SELECT_TEMPLATE prefering
SSE :- if the CPU supports SSE2 then all functions including integer can be
compiled with NTA prefetches and NTI stores (on some CPUs an integer NTI
store is not a good idea though) For distribution kernels
they can be duplicated. If integer is really faster for some things
it should be used this way.
Another thing I noticed is that the prefetch distance for the SSE
functions is a bit too short. On modern CPUs 256 bytes is not long enough
(on P4 that would be only two cachelines), it needs more (3-4 cachelines at
least).
-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sun Mar 23 2003 - 22:00:38 EST