Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

From: George Spelvin
Date: Thu May 29 2014 - 19:54:38 EST


Sorry for the delay; my Ivy Bridge test machine isn't in my
office and getting to the console to tweak the BIOS is a
bit of a bother.

Anyway, i7-4930K, turbo boost & hyperthreading disabled,
$ cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance

Oddly, though, CPU speed still seems to be fluctuating:
$ grep MHz /proc/cpuinfo
cpu MHz : 1255.875
cpu MHz : 3168.375
cpu MHz : 3062.125
cpu MHz : 1468.375
cpu MHz : 1309.000
cpu MHz : 2212.125
$ grep MHz /proc/cpuinfo
cpu MHz : 1255.875
cpu MHz : 2690.250
cpu MHz : 1255.875
cpu MHz : 2530.875
cpu MHz : 2212.125
cpu MHz : 1521.500

It does this even if I set scaling_min_freq to 3400000.
Very annoying. Should I be using a different
scaling driver than intel_pstate?
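
For anyone who wants to quantify the fluctuation rather than eyeball it, here is a minimal sketch that pulls the "cpu MHz" values out of /proc/cpuinfo text and reports the spread. The function names are made up for illustration, and it only reflects what the kernel reports at the moment of the read, which may not match the clock the cores actually run at under load.

```python
import re

# Matches lines such as "cpu MHz : 1255.875" (any whitespace around the colon).
MHZ_RE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)", re.MULTILINE)

def parse_cpu_mhz(cpuinfo_text):
    """Return the reported frequency of each CPU, in MHz, in file order."""
    return [float(value) for value in MHZ_RE.findall(cpuinfo_text)]

def mhz_spread(cpuinfo_text):
    """Return (min, max) of the reported per-CPU frequencies."""
    freqs = parse_cpu_mhz(cpuinfo_text)
    if not freqs:
        raise ValueError("no 'cpu MHz' lines found")
    return min(freqs), max(freqs)

if __name__ == "__main__":
    # Hypothetical usage: take one snapshot of the running system.
    with open("/proc/cpuinfo") as f:
        low, high = mhz_spread(f.read())
    print(f"reported range: {low:.3f} to {high:.3f} MHz")
```

If the frequencies were really pinned, the two numbers should be close to each other on every snapshot.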

>> It doesn't look like a slowdown; more like a 1% speedup.
>
> You will need to throw away the first few iterations of
> the test to account for cache warming effects.

You're absolutely right; that's exactly *why* I ran it 24 times and
listed them all separately. The "1%" number was B.S. and I was not
thinking when I quoted it.

What I had legitimately noticed was that the code with the patch took
slightly fewer cycles most of the time, even after discounting the
first few. Not statistically significant, but enough to argue that it
didn't cause a noticeable slowdown.
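
A rough sketch of the comparison described above: drop the first few runs to get past cache warming, then compare what is left. The warm-up count of 3 and the use of the median are assumptions for illustration, not what was actually computed for the earlier numbers.

```python
from statistics import median

def steady_state_median(cycles, warmup=3):
    """Median cycle count after discarding the first `warmup` runs."""
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    kept = cycles[warmup:]
    if not kept:
        raise ValueError("no samples left after discarding warm-up runs")
    return median(kept)

def relative_change(old_cycles, new_cycles, warmup=3):
    """Fractional change from old to new; negative means new is faster."""
    old = steady_state_median(old_cycles, warmup)
    new = steady_state_median(new_cycles, warmup)
    return (new - old) / old
```

A median is less sensitive than a mean to the occasional run that gets hit by an interrupt, but with differences this small it still says nothing about statistical significance.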


Anyway, two iterations each of "modprobe tcrypt mode=319".

Old code:
[ 1530.513529]
[ 1530.513529] testing speed of crc32c
[ 1530.513535] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 75 cycles/operation, 4 cycles/byte
[ 1530.513537] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 413 cycles/operation, 6 cycles/byte
[ 1530.513540] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 88 cycles/operation, 1 cycles/byte
[ 1530.513542] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 1327 cycles/operation, 5 cycles/byte
[ 1530.513548] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 503 cycles/operation, 1 cycles/byte
[ 1530.513551] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 178 cycles/operation, 0 cycles/byte
[ 1530.513553] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 4972 cycles/operation, 4 cycles/byte
[ 1530.513572] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 806 cycles/operation, 0 cycles/byte
[ 1530.513576] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 370 cycles/operation, 0 cycles/byte
[ 1530.513579] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 9835 cycles/operation, 4 cycles/byte
[ 1530.513615] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 1461 cycles/operation, 0 cycles/byte
[ 1530.513622] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 847 cycles/operation, 0 cycles/byte
[ 1530.513626] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 495 cycles/operation, 0 cycles/byte
[ 1530.513630] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 19571 cycles/operation, 4 cycles/byte
[ 1530.513700] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 2758 cycles/operation, 0 cycles/byte
[ 1530.513711] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 1676 cycles/operation, 0 cycles/byte
[ 1530.513718] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 859 cycles/operation, 0 cycles/byte
[ 1530.513722] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 39012 cycles/operation, 4 cycles/byte
[ 1530.513861] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 5417 cycles/operation, 0 cycles/byte
[ 1530.513882] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 3162 cycles/operation, 0 cycles/byte
[ 1530.513894] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 1678 cycles/operation, 0 cycles/byte
[ 1530.513901] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 1653 cycles/operation, 0 cycles/byte

[ 1662.359717]
[ 1662.359717] testing speed of crc32c
[ 1662.359723] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 80 cycles/operation, 5 cycles/byte
[ 1662.359725] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 430 cycles/operation, 6 cycles/byte
[ 1662.359729] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 81 cycles/operation, 1 cycles/byte
[ 1662.359730] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 1324 cycles/operation, 5 cycles/byte
[ 1662.359736] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 503 cycles/operation, 1 cycles/byte
[ 1662.359740] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 171 cycles/operation, 0 cycles/byte
[ 1662.359741] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 4983 cycles/operation, 4 cycles/byte
[ 1662.359760] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 832 cycles/operation, 0 cycles/byte
[ 1662.359764] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 366 cycles/operation, 0 cycles/byte
[ 1662.359768] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 9839 cycles/operation, 4 cycles/byte
[ 1662.359804] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 1437 cycles/operation, 0 cycles/byte
[ 1662.359810] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 862 cycles/operation, 0 cycles/byte
[ 1662.359815] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 494 cycles/operation, 0 cycles/byte
[ 1662.359818] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 19553 cycles/operation, 4 cycles/byte
[ 1662.359901] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 2761 cycles/operation, 0 cycles/byte
[ 1662.359912] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 1715 cycles/operation, 0 cycles/byte
[ 1662.359919] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 852 cycles/operation, 0 cycles/byte
[ 1662.359928] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 39016 cycles/operation, 4 cycles/byte
[ 1662.360069] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 5538 cycles/operation, 0 cycles/byte
[ 1662.360090] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 3280 cycles/operation, 0 cycles/byte
[ 1662.360102] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 1695 cycles/operation, 0 cycles/byte
[ 1662.360110] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 1639 cycles/operation, 0 cycles/byte

New code:
[ 710.814463]
[ 710.814463] testing speed of crc32c
[ 710.814469] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 80 cycles/operation, 5 cycles/byte
[ 710.814472] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 410 cycles/operation, 6 cycles/byte
[ 710.814476] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 94 cycles/operation, 1 cycles/byte
[ 710.814477] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 1327 cycles/operation, 5 cycles/byte
[ 710.814483] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 492 cycles/operation, 1 cycles/byte
[ 710.814486] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 175 cycles/operation, 0 cycles/byte
[ 710.814488] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 4970 cycles/operation, 4 cycles/byte
[ 710.814507] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 797 cycles/operation, 0 cycles/byte
[ 710.814511] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 370 cycles/operation, 0 cycles/byte
[ 710.814514] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 9846 cycles/operation, 4 cycles/byte
[ 710.814551] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 1452 cycles/operation, 0 cycles/byte
[ 710.814557] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 840 cycles/operation, 0 cycles/byte
[ 710.814561] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 497 cycles/operation, 0 cycles/byte
[ 710.814564] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 19563 cycles/operation, 4 cycles/byte
[ 710.814635] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 2764 cycles/operation, 0 cycles/byte
[ 710.814646] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 1646 cycles/operation, 0 cycles/byte
[ 710.814653] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 858 cycles/operation, 0 cycles/byte
[ 710.814657] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 39020 cycles/operation, 4 cycles/byte
[ 710.814796] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 5422 cycles/operation, 0 cycles/byte
[ 710.814816] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 3182 cycles/operation, 0 cycles/byte
[ 710.814829] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 1669 cycles/operation, 0 cycles/byte
[ 710.814836] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 1636 cycles/operation, 0 cycles/byte

[ 1751.451733]
[ 1751.451733] testing speed of crc32c
[ 1751.451739] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 75 cycles/operation, 4 cycles/byte
[ 1751.451741] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 414 cycles/operation, 6 cycles/byte
[ 1751.451745] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 87 cycles/operation, 1 cycles/byte
[ 1751.451746] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 1329 cycles/operation, 5 cycles/byte
[ 1751.451752] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 499 cycles/operation, 1 cycles/byte
[ 1751.451756] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 170 cycles/operation, 0 cycles/byte
[ 1751.451757] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 4964 cycles/operation, 4 cycles/byte
[ 1751.451776] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 836 cycles/operation, 0 cycles/byte
[ 1751.451780] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 370 cycles/operation, 0 cycles/byte
[ 1751.451784] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 9844 cycles/operation, 4 cycles/byte
[ 1751.451820] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 1468 cycles/operation, 0 cycles/byte
[ 1751.451826] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 835 cycles/operation, 0 cycles/byte
[ 1751.451830] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 493 cycles/operation, 0 cycles/byte
[ 1751.451834] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 19564 cycles/operation, 4 cycles/byte
[ 1751.451904] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 2776 cycles/operation, 0 cycles/byte
[ 1751.451915] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 1662 cycles/operation, 0 cycles/byte
[ 1751.451922] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 858 cycles/operation, 0 cycles/byte
[ 1751.451927] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 39531 cycles/operation, 4 cycles/byte
[ 1751.452067] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 5427 cycles/operation, 0 cycles/byte
[ 1751.452088] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 3175 cycles/operation, 0 cycles/byte
[ 1751.452100] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 1666 cycles/operation, 0 cycles/byte
[ 1751.452107] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 1634 cycles/operation, 0 cycles/byte

The tests are pretty short, but there's no obvious slowdown, particularly
on the tests with > 200 bytes per update, where the modified code paths are
found.
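
To make that comparison less of an eyeball exercise, here is a minimal sketch that parses tcrypt speed lines like the ones above out of captured dmesg text and compares old against new for the tests with more than 200 bytes per update. The function names and the use of a plain mean across runs are assumptions for illustration; with only two runs per kernel the averages are thin.

```python
import re
from statistics import mean

# Matches tcrypt speed lines such as:
# "test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 178 cycles/operation, ..."
LINE_RE = re.compile(
    r"test\s+(\d+)\s+\(\s*(\d+) byte blocks,\s*(\d+) bytes per update,"
    r"\s*(\d+) updates\):\s*(\d+) cycles/operation"
)

def parse_tcrypt(log_text):
    """Return {test_number: (bytes_per_update, [cycles, ...])} over all runs."""
    results = {}
    for m in LINE_RE.finditer(log_text):
        test, _block, update, _count, cycles = map(int, m.groups())
        results.setdefault(test, (update, []))[1].append(cycles)
    return results

def compare(old_log, new_log, min_update=200):
    """Rows of (test, bytes_per_update, old_mean, new_mean, percent_change)
    for tests present in both logs with more than `min_update` bytes per update."""
    old, new = parse_tcrypt(old_log), parse_tcrypt(new_log)
    rows = []
    for test in sorted(old.keys() & new.keys()):
        update, old_cycles = old[test]
        if update <= min_update:
            continue  # small updates do not reach the modified code paths
        old_mean = mean(old_cycles)
        new_mean = mean(new[test][1])
        percent = 100.0 * (new_mean - old_mean) / old_mean
        rows.append((test, update, old_mean, new_mean, percent))
    return rows

def format_rows(rows):
    """Render the comparison as plain-text lines."""
    return [
        f"test {t:2d} ({u:4d} B/update): old {o:8.1f}  new {n:8.1f}  {p:+.2f}%"
        for t, u, o, n, p in rows
    ]
```

Feeding it the two "Old code" runs as one string and the two "New code" runs as another gives one line per test; a negative percentage means the new code took fewer cycles.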

Of course, whether the timing is valid is an interesting question.