Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

From: Mathias Krause
Date: Sun Aug 14 2011 - 15:06:55 EST

Hi Max,

2011/8/8 Locktyukhin, Maxim <maxim.locktyukhin@xxxxxxxxx>:
> I'd like to note that at Intel we very much appreciate Mathias' effort to port/integrate this implementation into the Linux kernel!
> $0.02 re tcrypt perf numbers below: I believe something must be terribly broken with the tcrypt measurements
> 20 (and more) cycles per byte shown below are not reasonable numbers for SHA-1 - ~6 c/b (as can be seen in some of the results for Core2) is the expected result ... so, while the relative improvement seen is sort of consistent, the absolute performance numbers are very much off (and yes, Sandy Bridge on AVX code is expected to be faster than Core2/SSSE3 - ~5.2 c/b vs. ~5.8 c/b on the level of the sha1_update() call, to be more precise)
> this does not affect the proposed patch in any way, it looks like tcrypt's timing problem to me - I'd even venture a guess that it may be due to the use of RDTSC (that gets affected significantly by Turbo/EIST, TSC is isotropic in time but not with the core clock domain, i.e. RDTSC cannot be used to measure core cycles without at least disabling EIST and Turbo, or doing runtime adjustment of actual bus/core clock ratio vs. the standard ratio always used by TSC - I could elaborate more if someone is interested)

I found the Sandy Bridge numbers odd too, but suspected it might be
because of the laptop platform. The SSSE3 numbers on this platform
were slightly lower than the AVX numbers, and both were still way off
the ones for the Core2 system. But your explanation fits well, too. It
might be EIST or Turbo mode that tampered with the numbers. Another,
maybe more likely, cause might be the overhead Andy mentioned.
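
To illustrate the RDTSC point, here is a rough sketch of the runtime
adjustment Max describes: rescaling TSC ticks by the ratio of the actual
core clock to the fixed TSC rate. The function name, the frequencies and
the buffer size are made up for illustration; they are not taken from
tcrypt or from any measurement in this thread.

```python
def core_cycles_per_byte(tsc_ticks, nbytes, tsc_hz, core_hz):
    """Rescale TSC ticks to core clock cycles, then normalise per byte.

    tsc_hz  - fixed rate at which the TSC counts (assumed known)
    core_hz - average core clock during the measurement (assumed known,
              e.g. lowered by EIST or raised by Turbo)
    """
    if nbytes <= 0 or tsc_hz <= 0:
        raise ValueError("nbytes and tsc_hz must be positive")
    # The TSC counts at tsc_hz no matter how fast the core really runs,
    # so the elapsed core cycles are the ticks scaled by core_hz / tsc_hz.
    core_cycles = tsc_ticks * core_hz / tsc_hz
    return core_cycles / nbytes


# Hypothetical example: 20 TSC ticks per byte over an 8192 byte buffer,
# TSC counting at 2.5 GHz while the core was clocked down to 800 MHz.
ticks = 20 * 8192
print(core_cycles_per_byte(ticks, 8192, 2.5e9, 0.8e9))
```

With these invented inputs the result lands at roughly 6.4 core cycles
per byte, which shows how a downclocked core alone could inflate a
TSC-based figure by a factor of about three. It says nothing about what
actually happened on my laptop.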

> thanks again,
> -Max
