Re: [PATCH v2] tile: avoid using clocksource_cyc2ns with absolute cycle count

From: Chris Metcalf
Date: Fri Nov 18 2016 - 11:59:43 EST


On 11/18/2016 5:34 AM, Peter Zijlstra wrote:
> On Thu, Nov 17, 2016 at 03:00:14PM -0500, Chris Metcalf wrote:
>> On 11/17/2016 4:53 AM, Peter Zijlstra wrote:
>>> On Wed, Nov 16, 2016 at 03:16:59PM -0500, Chris Metcalf wrote:
>>>> PeterZ (cc'ed) then improved it to use __int128 math via
>>>> mul_u64_u32_shr(), but that doesn't help tile; we only do one multiply
>>>> instead of two, but the multiply is handled by an out-of-line call to
>>>> __multi3, and the sched_clock() function ends up about 2.5x slower as
>>>> a result.
>>> Well, only if you set CONFIG_ARCH_SUPPORTS_INT128, otherwise it reduces
>>> to 2 32x32->64 multiplications, of which one is conditional on there
>>> actually being bits set in the high word of the u64 argument.
>> I didn't notice that. It took me down an interesting rathole.
>>
>> Obviously the branch optimization won't help on cycle counter values,
>> since we blow out of the low 32 bits in the first few seconds of
>> uptime. So the conditional test won't help, but the 32x32
>> multiply optimizations should.
> Now, I don't quite remember things, but isn't it the idea to convert
> cycle deltas and accumulate in ns? That way you most always convert
> small values.
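
For reference, the non-__int128 path discussed in the quoted exchange
can be sketched in userspace C roughly as below. This is an
illustration, not the kernel source: the name sketch_mul_u64_u32_shr()
is hypothetical, and it assumes shift <= 32 and a result that fits in
64 bits. With an absolute cycle count the high word is nonzero after
2^32 cycles, so both multiplies run on every call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace sketch: compute (a * mul) >> shift using two
 * 32x32->64 multiplies instead of one 64x64->128 multiply.
 * Assumes shift <= 32 and that the result fits in 64 bits.
 */
static uint64_t sketch_mul_u64_u32_shr(uint64_t a, uint32_t mul,
                                       unsigned int shift)
{
    uint32_t al = (uint32_t)a;          /* low 32 bits of a */
    uint32_t ah = (uint32_t)(a >> 32);  /* high 32 bits of a */
    uint64_t ret;

    assert(shift <= 32);

    /* Low partial product, scaled down by shift. */
    ret = ((uint64_t)al * mul) >> shift;

    /* High partial product only matters when the high word is set. */
    if (ah)
        ret += ((uint64_t)ah * mul) << (32 - shift);

    return ret;
}
```

For small inputs such as cycle deltas, ah is zero and the second
multiply is skipped; that is the case the conditional is meant to
catch.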

I would think you would also unnecessarily accumulate small errors.

The x86 sched_clock() seems to purely scale the current TSC value,
so what tile is doing is consistent with that, at least.
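
To illustrate the error-accumulation concern, here is a minimal
userspace sketch comparing the two approaches. Everything in it is an
assumption made for illustration: the SKETCH_MULT/SKETCH_SHIFT scale
factors (roughly 0.833 ns per cycle) and the sketch_* names are
hypothetical, and it deliberately uses a naive accumulator that drops
the fractional nanoseconds of each delta.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale: ns = (cycles * 853) >> 10, about 0.833 ns/cycle. */
#define SKETCH_MULT  853u
#define SKETCH_SHIFT 10u

/* Scale a cycle count (absolute or delta) to nanoseconds, rounding down. */
static uint64_t sketch_cyc2ns(uint64_t cycles)
{
    return (cycles * SKETCH_MULT) >> SKETCH_SHIFT;
}

/*
 * Naive delta accumulation: convert each delta of 'step' cycles on its
 * own and sum the results. Each conversion rounds down, so the dropped
 * fractions add up over 'count' updates.
 */
static uint64_t sketch_accumulate(uint64_t step, unsigned int count)
{
    uint64_t ns = 0;
    unsigned int i;

    for (i = 0; i < count; i++)
        ns += sketch_cyc2ns(step);

    return ns;
}
```

With these made-up factors, sketch_accumulate(7, 1000) yields 5000 ns
while sketch_cyc2ns(7000) yields 5831 ns for the same elapsed cycles.
A delta-based scheme could avoid the drift by carrying the fractional
remainder forward, at the cost of extra state.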

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com