On Thu, Nov 17, 2016 at 03:00:14PM -0500, Chris Metcalf wrote:
> On 11/17/2016 4:53 AM, Peter Zijlstra wrote:
> > On Wed, Nov 16, 2016 at 03:16:59PM -0500, Chris Metcalf wrote:
> > > PeterZ (cc'ed) then improved it to use __int128 math via
> > > mul_u64_u32_shr(), but that doesn't help tile; we only do one multiply
> > > instead of two, but the multiply is handled by an out-of-line call to
> > > __multi3, and the sched_clock() function ends up about 2.5x slower as
> > > a result.
> > Well, only if you set CONFIG_ARCH_SUPPORTS_INT128, otherwise it reduces
> > to 2 32x32->64 multiplications, of which one is conditional on there
> > actually being bits set in the high word of the u64 argument.
>
> I didn't notice that. It took me down an interesting rathole.
>
> Obviously the branch optimization won't help on cycle counter values,
> since we blow out of the low 32 bits in the first few seconds of
> uptime. So the conditional test won't help, but the 32x32
> multiply optimizations should.

Now, I don't quite remember things, but isn't it the idea to convert
cycle deltas and accumulate in ns? That way you almost always convert
small values.
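
For illustration only, the two ideas in this exchange can be sketched
in plain C. The first function is modelled on the generic (non-__int128)
fallback described above: two 32x32->64 multiplies, the second one taken
only when the high word is non-zero. The second part shows converting
deltas and accumulating nanoseconds. struct ns_clock and
ns_clock_update() are hypothetical names made up for this sketch, not
kernel interfaces, and the sketch assumes shift <= 32.

```c
#include <stdint.h>

/*
 * Sketch modelled on the generic fallback: (a * mul) >> shift using
 * only 32x32->64 multiplies. Assumes shift <= 32; the result wraps
 * modulo 2^64 if it does not fit.
 */
static uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	uint32_t al = (uint32_t)a;		/* low 32 bits */
	uint32_t ah = (uint32_t)(a >> 32);	/* high 32 bits */
	uint64_t ret;

	ret = ((uint64_t)al * mul) >> shift;
	if (ah)		/* second multiply only when the high word is set */
		ret += ((uint64_t)ah * mul) << (32 - shift);

	return ret;
}

/* Hypothetical accumulator, for illustration only. */
struct ns_clock {
	uint64_t last_cycles;	/* counter value at the previous update */
	uint64_t ns;		/* nanoseconds accumulated so far */
	uint32_t mult;		/* ns per cycle, scaled by 2^shift */
	uint32_t shift;
};

/*
 * Convert only the cycles elapsed since the previous update. With
 * frequent updates the delta fits in 32 bits, so the branch above is
 * not taken. Each update truncates a fraction of a nanosecond; this
 * sketch does not carry the remainder forward.
 */
static uint64_t ns_clock_update(struct ns_clock *c, uint64_t now_cycles)
{
	uint64_t delta = now_cycles - c->last_cycles;

	c->last_cycles = now_cycles;
	c->ns += mul_u64_u32_shr(delta, c->mult, c->shift);
	return c->ns;
}
```

With mult = 512 and shift = 10 (i.e. a 2 GHz counter, two cycles per
nanosecond), an update at cycle 1000 yields 500 ns. Converting the
absolute counter value instead would hit the high-word path on every
call once the counter passes 2^32.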