Re: [PATCH v2] crypto: crc32c-pclmul - Shrink K_table to 32-bit words

From: George Spelvin
Date: Wed May 28 2014 - 23:27:07 EST


> Can you do a tcrypt speed measurement with and without your changes?
> Check to see if there's any slowdown. Please make sure you pin
> the frequency of your cpu when running the test.
>
> e.g.
> echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

I just now re-read your e-mail and noticed you suggested a specific tool.
Oops, I haven't run that yet. I just made up my own in user space.
As I mentioned, since the changes are to the main loop that operates on
aligned buffers in multiples of 24 bytes, I focused my benchmarking there:

#define BUFFER 6114
static unsigned char buf[BUFFER] __attribute__ ((aligned(8)));
#define ITER 24 /* Number of test iterations */

uint32_t
do_test(uint32_t crc, uint32_t (*f)(void const *, unsigned, uint32_t))
{
	int i, j;

	for (i = 0; i < BUFFER; i += 8)
		for (j = i+24; j <= BUFFER; j += 24)
			crc = f(buf+i, j-i, crc);
	return crc;
}

uint32_t
time_test(uint64_t *time, uint32_t crc,
	  uint32_t (*f)(void const *, unsigned, uint32_t))
{
	uint64_t start = rdtsc();

	crc = do_test(crc, f);
	*time = rdtsc() - start;
	return crc;
}
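
The rdtsc() helper called above is not shown in this mail. A minimal sketch of what it could look like, assuming GCC or Clang on x86 and the __rdtsc() intrinsic from <x86intrin.h>; the clock() fallback is only an assumption to keep the sketch compilable on other targets and is far too coarse for real measurements:

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Read the CPU time-stamp counter (assumed implementation). */
static inline uint64_t rdtsc(void)
{
	return __rdtsc();
}
#else
#include <time.h>

/* Hypothetical fallback for non-x86 builds: processor time, not cycles. */
static inline uint64_t rdtsc(void)
{
	return (uint64_t)clock();
}
#endif
```

Plain rdtsc is not a serializing instruction, so a stricter harness might fence around it; for runs of tens of millions of ticks, as here, that effect should be negligible.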

The actual test goes in ABBA order to reduce bias:

	for (i = 0; i < ITER; i += 2) {
		crc1 = time_test(times[i]+0, crc1, crc_pcl_1);
		crc2 = time_test(times[i]+1, crc2, crc_pcl_2);
		crc2 = time_test(times[i+1]+1, crc2, crc_pcl_2);
		crc1 = time_test(times[i+1]+0, crc1, crc_pcl_1);
	}

crc_pcl_1 is the old code, crc_pcl_2 is my revised version.


The results, in TSC ticks per do_test() pass, are as follows (the last
line, 24, is the total):

    Old code  New code  (new - old)
0: 85009953 71812457 (-13197496)
1: 57408829 63361572 (+5952743)
2: 52552399 49195266 (-3357133)
3: 43595130 45988364 (+2393234)
4: 41541760 39714198 (-1827562)
5: 36576082 38021344 (+1445262)
6: 35307854 34150656 (-1157198)
7: 32182230 33134236 (+952006)
8: 31341596 31307004 (-34592)
9: 31340900 31329408 (-11492)
10: 31344884 31329144 (-15740)
11: 31334144 31312492 (-21652)
12: 31338992 31330356 (-8636)
13: 31343744 31311344 (-32400)
14: 31339000 31340196 (+1196)
15: 31337492 31313988 (-23504)
16: 31341688 31334040 (-7648)
17: 31341804 31308936 (-32868)
18: 31339936 31332020 (-7916)
19: 31323228 31324240 (+1012)
20: 31339744 31331768 (-7976)
21: 31321536 31332688 (+11152)
22: 31340280 31335212 (-5068)
23: 31332056 31335768 (+3712)
24: 885575261 876586697 (-8988564)
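
For reference, a table in the format above could be produced by something like the following. This is only a sketch: it assumes the times array is declared as uint64_t times[ITER][2] (column 0 for the old code, column 1 for the new code, matching the time_test() calls), and the names column_total(), print_row() and print_results() are made up for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sum one column (0 = old code, 1 = new code) over n iterations. */
static uint64_t column_total(uint64_t (*times)[2], int n, int col)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += times[i][col];
	return sum;
}

/* Print one row: index, old ticks, new ticks, signed difference. */
static void print_row(int idx, uint64_t old_t, uint64_t new_t)
{
	int64_t diff = (int64_t)(new_t - old_t);

	printf("%2d: %" PRIu64 " %" PRIu64 " (%+" PRId64 ")\n",
	       idx, old_t, new_t, diff);
}

/* Print every iteration, then a final row holding the column totals. */
static void print_results(uint64_t (*times)[2], int n)
{
	for (int i = 0; i < n; i++)
		print_row(i, times[i][0], times[i][1]);
	print_row(n, column_total(times, n, 0), column_total(times, n, 1));
}
```

Calling print_results(times, ITER) after the ABBA loop would emit rows 0 through 23 followed by row 24 with the totals.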

I swapped the link order of the two .o files in case cache
placement made a difference:

0: 84305981 71483150 (-12822831)
1: 57341376 63129024 (+5787648)
2: 52361618 49240069 (-3121549)
3: 43520576 45822670 (+2302094)
4: 41500104 39684116 (-1815988)
5: 36542864 37940196 (+1397332)
6: 35281570 34144348 (-1137222)
7: 32149420 33088652 (+939232)
8: 31342368 31329056 (-13312)
9: 31338788 31313212 (-25576)
10: 31336324 31335612 (-712)
11: 31341892 31319576 (-22316)
12: 31336224 31322808 (-13416)
13: 31338560 31315084 (-23476)
14: 31338332 31332976 (-5356)
15: 31337300 31315088 (-22212)
16: 31334300 31330884 (-3416)
17: 31318660 31329916 (+11256)
18: 31334984 31327740 (-7244)
19: 31315084 31327768 (+12684)
20: 31334708 31345872 (+11164)
21: 31325988 31330948 (+4960)
22: 31333956 31339800 (+5844)
23: 31322880 31327316 (+4436)
24: 884333857 875775881 (-8557976)

It doesn't look like a slowdown; more like a 1% speedup.
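
The 1% figure comes straight from the totals in the last row of each table. As a quick check, with percent_change() being a name invented for this sketch:

```c
#include <stdint.h>

/* Relative change from old to new, in percent; negative means faster. */
static double percent_change(uint64_t old_total, uint64_t new_total)
{
	return 100.0 * ((double)new_total - (double)old_total)
		     / (double)old_total;
}
```

Fed the totals above, percent_change(885575261, 876586697) comes out at roughly -1.02 and percent_change(884333857, 875775881) at roughly -0.97. Note that most of that difference is accumulated in the first eight warm-up iterations; from iteration 8 onward the two versions differ by only a few tens of thousands of ticks out of about 31.3 million.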

I'll figure out tcrypt in a bit.